2

I have a DataFrame df filled with rows and columns where there are duplicate Id's:

Index   A   B   
0   0.00    0.00    
1   0.00    0.00    
29  0.50    105.00  
36  0.80    167.00  
37  0.80    167.00  
42  1.00    209.00  
44  0.50    105.00  
45  0.50    105.00  
46  0.50    105.00  
50  0.00    0.00    
51  0.00    0.00    
52  0.00    0.00    
53  0.00    0.00    

When I use:

    df.drop_duplicates(subset=['A'], keep='last')

I get:

Index   A   B   
37  0.80    167.00  
42  1.00    209.00  
46  0.50    105.00
53  0.00    0.00

Which makes sense, that's what the function does. However, what I actually would like to achieve is something like:

Index   A   B   
1   0.00    0.00    
29  0.50    105.00  
37  0.80    167.00  
42  1.00    209.00  
46  0.50    105.00  
53  0.00    0.00    

Basically from each subpart of column A (0,0), (0.80, 0.80), etc. To pick the last value.

It is also important the values in column A stay in this order 0; 0.5; 0.8; 1; 0.5;0 and they do not get mixed.

Adrian
  • 85
  • 1
  • 6

1 Answers1

0

Compare by not equal Series.ne with Series.shift and filter by boolean indexing:

df1 = df[df['A'].ne(df['A'].shift(-1))]
print (df1)
         A      B
Index            
1      0.0    0.0
29     0.5  105.0
37     0.8  167.0
42     1.0  209.0
46     0.5  105.0
53     0.0    0.0

Details:

print (df['A'].ne(df['A'].shift(-1)))
Index
0     False
1      True
29     True
36    False
37     True
42     True
44    False
45    False
46     True
50    False
51    False
52    False
53     True
Name: A, dtype: bool
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252