
I would like to split a pandas DataFrame into groups in order to process each group separately. My 'value.csv' file contains the following numbers:

num tID y x height width
2   0   0   0   1   16
2   1   1   0   1   16 
5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 
6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8
2   0   0   0   1   16
2   1   1   0   1   16 
5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 
6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8

I would like to split the data every time the tID column restarts at 0, like this for the first four splits:

First:

2   0   0   0   1   16
2   1   1   0   1   16 

Second:

5   0   1   0   1   16 
5   1   0   0   1   8 
5   2   0   8   1   8 

Third:

6   0   0   0   1   16 
6   1   1   0   1   8 
6   2   1   8   1   8

Fourth:

2   0   0   0   1   16
2   1   1   0   1   16 

For this, I tried to split it with an if inside a loop, but with no success. Any efficient ideas?

    import pandas as pd

    statQuality = 'value.csv'
    df = pd.read_csv(statQuality, names=['num', 'tID', 'y', 'x', 'height', 'width'])

    df2 = df.copy()
    df2.drop(['num'], axis=1, inplace=True)

    x = []

    for index, row in df2.iterrows():
        if row['tID'] == 0:
            x = []  # a new group starts here, but the finished group is never stored
            x.append(row)
            print(x)
        else:
            x.append(row)
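Since every group begins where tID drops back to 0, one loop-free idea (a sketch, not from the thread; the inline data reproduces the first half of value.csv) is to turn that marker into group ids with a cumulative sum and feed those ids to groupby:

```python
import pandas as pd
from io import StringIO

csv = """2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8"""

df = pd.read_csv(StringIO(csv), sep=r'\s+',
                 names=['num', 'tID', 'y', 'x', 'height', 'width'])

# every row where tID == 0 starts a new group; cumsum turns the
# True/False markers into increasing group ids 1, 2, 3, ...
group_id = df['tID'].eq(0).cumsum()

groups = [v.drop('num', axis=1) for _, v in df.groupby(group_id)]
print(len(groups))   # 3 groups in this shortened sample
```

Each element of `groups` is an independent DataFrame that can then be processed on its own.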
  • I suggest using `d = dict(tuple(df.groupby('num')))` – jezrael Jul 09 '18 at 07:59
  • @jezrael Thanks, good idea, but another .csv file might contain repeated values for num, so it's not a good selector. – Cagri Jul 09 '18 at 08:07
  • So you need this solution, only it is necessary to remove the `num` column in each DataFrame? – jezrael Jul 09 '18 at 08:08
  • Do you need `d = {k: v.drop('num', axis=1) for k, v in df.groupby('num')}`? – jezrael Jul 09 '18 at 08:09
  • @jezrael Thanks, yes it is necessary to remove num because it plays no part in splitting the document. Other documents might have multiple values such as 0, 2, etc. for num. – Cagri Jul 09 '18 at 08:13
  • So `d = {k: v.drop('num', axis=1) for k, v in df.groupby('num')}` works nicely? – jezrael Jul 09 '18 at 08:18
  • @jezrael For the example above, yes, it works nicely. But when there are duplicated values for num, it does not. – Cagri Jul 09 '18 at 08:20
  • Do you think the last `6` in the sample data should be `2`, so you need two different `2` groups? – jezrael Jul 09 '18 at 08:22
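The limitation raised in these comments can be seen directly: grouping by `num` alone merges separate runs of the same value into one DataFrame (a small sketch using a few repeated rows from the sample):

```python
import pandas as pd
from io import StringIO

csv = """2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
2 0 0 0 1 16
2 1 1 0 1 16"""

df = pd.read_csv(StringIO(csv), sep=r'\s+',
                 names=['num', 'tID', 'y', 'x', 'height', 'width'])

d = dict(tuple(df.groupby('num')))
# both runs of num == 2 land in a single DataFrame, not two
print(len(d[2]))   # 4 rows
```

This is why the accepted answer below builds keys from consecutive runs rather than from `num` alone.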

1 Answer


Use:

    # create groups from consecutive runs of the same num
    s = df['num'].ne(df['num'].shift()).cumsum()
    # helper count Series so repeated nums get distinct keys like `2_0`, `2_1`, ...
    g = s.groupby(df['num']).transform(lambda x: x.factorize()[0])
    # dictionary of DataFrames keyed by `num_occurrence`
    d = {'{}_{}'.format(i, j): v.drop('num', axis=1) for (i, j), v in df.groupby(['num', g])}
    print(d)

    {'2_0':    tID  y  x  height  width
    0    0  0  0       1     16
    1    1  1  0       1     16, '2_1':    tID  y  x  height  width
    8    0  0  0       1     16
    9    1  1  0       1     16, '5_0':    tID  y  x  height  width
    2    0  1  0       1     16
    3    1  0  0       1      8
    4    2  0  8       1      8, '5_1':     tID  y  x  height  width
    10    0  1  0       1     16
    11    1  0  0       1      8
    12    2  0  8       1      8, '6_0':    tID  y  x  height  width
    5    0  0  0       1     16
    6    1  1  0       1      8
    7    2  1  8       1      8, '6_1':     tID  y  x  height  width
    13    0  0  0       1     16
    14    1  1  0       1      8
    15    2  1  8       1      8}
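To see the approach end to end, here is a self-contained rerun on the full sample data (inlined via StringIO; the helper Series is renamed to `occ` for clarity, which does not change the grouping):

```python
import pandas as pd
from io import StringIO

csv = """2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8"""

df = pd.read_csv(StringIO(csv), sep=r'\s+',
                 names=['num', 'tID', 'y', 'x', 'height', 'width'])

# run ids: increment whenever num changes from the previous row
s = df['num'].ne(df['num'].shift()).cumsum()
# per-num occurrence counter, so the second run of 2 becomes `2_1`
g = s.groupby(df['num']).transform(lambda x: x.factorize()[0]).rename('occ')
d = {'{}_{}'.format(i, j): v.drop('num', axis=1)
     for (i, j), v in df.groupby(['num', g])}

print(sorted(d))     # ['2_0', '2_1', '5_0', '5_1', '6_0', '6_1']
print(d['2_1'])      # the second run of num == 2 (original rows 8-9)
```

Each value in `d` is a plain DataFrame, so every group can be processed separately as the question asks.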