In this pandas dataframe:
df =
pos index data
21 36 a,b,c
21 36 a,b,c
23 36 c,d,e
25 36 f,g,h
27 36 g,h,k
29 39 a,b,c
29 39 a,b,c
31 39 .
35 39 c,k
36 41 g,h
38 41 k,l
39 41 j,k
39 41 j,k
I want to remove the repeated line that are only in the same index group and when they are in the head regions of the subframe.
So, I did:
df_grouped = df.groupby(['index'], as_index=True)
now,
for i, sub_frame in df_grouped:
subframe.apply(lamda g: ... remove one duplicate line in the head region if pos value is a repeat)
I want to apply this method because some pos
value will be repeated in the tail region which should not be removed.
Any suggestions.
Expected output:
pos index data
removed
21 36 a,b,c
23 36 c,d,e
25 36 f,g,h
27 36 g,h,k
removed
29 39 a,b,c
31 39 .
35 39 c,k
36 41 g,h
38 41 k,l
39 41 j,k
39 41 j,k