Enumerate rows for each dtaaframe group based on conditions

Question

I would like to reenumerate rows in given df using some conditions. My question is an extension of this question.

Example of df:

   ind  seq  status 
0   1   2    up
1   1   3    mid
2   1   5    down
3   2   1    up
4   2   2    mid
5   2   3    down
6   3   1    up
7   3   2    mid
8   3   3    oth

The df contains ind column which represents a group. The seq column might have some bad data. That's way I would like to add another column seq_corr to correct the seq enumerating based on some conditions:

the first value in a group in status column equals up
the last value in a group in status column equals down OR oth
in all other cases copy number from seq column.

I know the logical way to do this but I have some troubles how to convert it to Python. Especially when it comes to proper slicing and accessing the first and the last element of each group.

Below you can find my not working code:

 def new_id(x):
    if (x.loc['status',0] == 'up') and ((x.loc['status',-1]=='down') or (x['status',-1]=='oth')):
        x['ind_corr'] = np.arange(1, len(x) + 1)
    else:
        x['seq_corr']= x['seq']
    return x

 df.groupby('ind', as_index=False).apply(new_id)

Expected result:

   ind  seq  status  seq_corr
0   1   2    up       1
1   1   3    mid      2
2   1   5    down     3
3   2   1    up       1
4   2   2    mid      2
5   2   3    down     3
6   3   5    up       1
7   3   2    mid      2
8   3   7    oth      3

Hoping that someone would be able to point me out any solution.

score 2 · Accepted Answer · edited Sep 16 '17 at 11:28

2

Let's try df.groupby followed by an apply and concatenation.

vals = df.groupby('ind').apply(
       lambda g: np.where(g['status'].iloc[0] == 'up' 
                       or g['status'].iloc[-1] in {'down', 'oth'},
      np.arange(1, len(g) + 1), g['seq'])
).values

df['seq_corr'] = np.concatenate(vals)

df
   ind  seq status  seq_corr
0    1    2     up         1
1    1    3    mid         2
2    1    5   down         3
3    2    1     up         1
4    2    2    mid         2
5    2    3   down         3
6    3    1     up         1
7    3    2    mid         2
8    3    3    oth         3

edited Sep 16 '17 at 11:28

Bharath M Shetty

30,075
6
57
108

answered Sep 16 '17 at 06:10

cs95

379,657
97
704
746

Thank you it works very nice ! Just one question. What should I do if I want to copy `seq` number if the `status` does not meet any condition ? – Michal Sep 16 '17 at 10:00
@Michal its simple replace `or` with `and`. – Bharath M Shetty Sep 16 '17 at 11:09
@cᴏʟᴅsᴘᴇᴇᴅ I did a bit of editing to your answer. You can edit it in your words. – Bharath M Shetty Sep 16 '17 at 11:13
@Bharathshetty the boolean expressions are all right. There must stay or. This code reenumerates all seq numbers. – Michal Sep 16 '17 at 11:23
@Michal will my answer help? – Bharath M Shetty Sep 16 '17 at 12:05

score 1 · Answer 2 · answered Sep 16 '17 at 12:04

Another method with groupby cumcount. To select the first and last row we can use head and tail method and union their indices. I think this one might help with your second question

df['seq_corr'] = df.groupby('ind').cumcount()+1
idx = df.groupby('ind').head(1).index.union(df.groupby('ind').tail(1).index)

df.loc[idx,'seq_corr'] = np.where(~df.loc[idx,'status'].isin(['up','down','oth']),
                                    df.loc[idx,'seq'],df.loc[idx,'seq_corr'])

Sample Output:

   ind  seq status  seq_corr
0    1    2     up         1
1    1    3    mid         2
2    1    5  dance         5
3    2    1     up         1
4    2    2    mid         2
5    2    3   down         3
6    3    1     up         1
7    3    2    mid         2
8    3    3    oth         3

Thank you @Bharath shetty. Does `isin([...])` takes into account all conditions with `AND` statement ? My idea was to also have `OR`. But anyway I have merged @cᴏʟᴅsᴘᴇᴇᴅ answer with my idea and it looks like everything works fine. I will post the solution below. — Michal, Sep 16 '17 at 14:16
This will work fine based on your `OR` condition, since we are taking first and last row of grouped data. This was the desired output right? Any problem you faced? — Bharath M Shetty, Sep 16 '17 at 19:09

score 0 · Answer 3 · answered Sep 16 '17 at 14:21

Thanks to @cᴏʟᴅsᴘᴇᴇᴅ I have corrected my code. Looking at first tests, everything is ok.

 def new_id(x):
    if (x['status'].iloc[0] == 'up') and ((x['status'].iloc[-1]=='down') or (x['status'].iloc[-1]=='oth')):
    x['seq_corr'] = np.arange(1, len(x) + 1)
    else:
    x['seq_corr']= x['seq']
    return x

Enumerate rows for each dtaaframe group based on conditions

3 Answers3