I would like to re-enumerate the rows in a given df using some conditions. My question is an extension of this question.
Example of df:
   ind  seq status
0    1    2     up
1    1    3    mid
2    1    5   down
3    2    1     up
4    2    2    mid
5    2    3   down
6    3    1     up
7    3    2    mid
8    3    3    oth
The df contains an ind column, which represents a group. The seq column might have some bad data. That's why I would like to add another column, seq_corr, that corrects seq by re-enumerating the rows of a group based on some conditions:
- the first value in a group in the status column equals up
- the last value in a group in the status column equals down OR oth
- in all other cases, copy the number from the seq column.
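To make the rule concrete, here is a minimal sketch of one possible vectorized reading of the conditions above. It assumes that a group is renumbered 1..n only when both the first-row and last-row conditions hold, and that rows are already in the intended order within each group. The helper names (first_ok, last_ok, renumbered) are mine, not part of the original problem.

```python
import numpy as np
import pandas as pd

# Sample data copied from the example df above.
df = pd.DataFrame({
    "ind": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "seq": [2, 3, 5, 1, 2, 3, 1, 2, 3],
    "status": ["up", "mid", "down", "up", "mid", "down", "up", "mid", "oth"],
})

status_by_group = df.groupby("ind")["status"]

# Broadcast each group's first/last status to every row of that group.
# Note: "first"/"last" skip missing values, which is fine for this data.
first_ok = status_by_group.transform("first").eq("up")
last_ok = status_by_group.transform("last").isin(["down", "oth"])

# cumcount() numbers rows within each group starting at 0, so add 1.
renumbered = df.groupby("ind").cumcount() + 1

# Renumber where both conditions hold; otherwise keep the original seq.
df["seq_corr"] = np.where(first_ok & last_ok, renumbered, df["seq"])
print(df)
```

This avoids a per-group Python function entirely, which may matter on large frames, but it is only one way to express the rule.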
I know the logic for doing this, but I have trouble converting it to Python, especially when it comes to proper slicing and accessing the first and the last element of each group.
Below is my non-working code:
def new_id(x):
    if (x.loc['status',0] == 'up') and ((x.loc['status',-1]=='down') or (x['status',-1]=='oth')):
        x['ind_corr'] = np.arange(1, len(x) + 1)
    else:
        x['seq_corr']= x['seq']
    return x

df.groupby('ind', as_index=False).apply(new_id)
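For comparison, a hedged sketch of how the function above might be made to run. Two things stand out in the attempt: .loc expects the row label first and the column second, so x.loc['status', 0] looks for a row labelled 'status', and -1 is treated as a label rather than "the last row"; also, one branch writes to ind_corr while the other writes to seq_corr. The version below assumes positional access with .iloc was intended and that both branches should fill seq_corr. It loops over the groups and concatenates them instead of calling apply, which is my own choice to keep the behaviour independent of the pandas version.

```python
import numpy as np
import pandas as pd

# Sample data copied from the example df above.
df = pd.DataFrame({
    "ind": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "seq": [2, 3, 5, 1, 2, 3, 1, 2, 3],
    "status": ["up", "mid", "down", "up", "mid", "down", "up", "mid", "oth"],
})

def new_id(group):
    # Work on a copy so the original frame is not modified in place.
    group = group.copy()
    first = group["status"].iloc[0]   # first row of the group, by position
    last = group["status"].iloc[-1]   # last row of the group, by position
    if first == "up" and last in ("down", "oth"):
        group["seq_corr"] = np.arange(1, len(group) + 1)
    else:
        group["seq_corr"] = group["seq"]
    return group

# Iterating a groupby yields (key, sub-frame) pairs.
parts = [new_id(group) for _, group in df.groupby("ind")]
result = pd.concat(parts).sort_index()
print(result)
```

A group that does not start with up, or does not end with down or oth, falls through to the else branch and keeps its seq values unchanged.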
Expected result:
   ind  seq status  seq_corr
0    1    2     up         1
1    1    3    mid         2
2    1    5   down         3
3    2    1     up         1
4    2    2    mid         2
5    2    3   down         3
6    3    5     up         1
7    3    2    mid         2
8    3    7    oth         3
I hope someone can point me to a solution.