I have a pandas DataFrame. For each row, I want to check whether it has the same value as the previous row in a particular column (let's call it product_type), and if it does, delete it. In other words, out of each group of consecutive rows with the same value in that column, I want to keep only one.

For example, if column A is the one in which we don't want consecutive duplicates:

input =

    A   B    C
    0   1    1
    0   2    2
    2   1   10
    2   2   20
    0  11  100
    5   2  200

output =

    A   B    C
    0   1    1
    2   1   10
    0  11  100
    5   2  200
Baron Yugovich
  • See related: http://stackoverflow.com/questions/19463985/pandas-drop-consecutive-duplicates/19464054#19464054 – EdChum Jul 25 '14 at 07:11

1 Answer

It's a little tricky, but you could label each run of consecutive equal values in A and keep the first row of each run, something like

>>> df.groupby((df["A"] != df["A"].shift()).cumsum().values).first()
   A   B    C
1  0   1    1
2  2   1   10
3  0  11  100
4  5   2  200
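
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. It assumes the question's data with the column names A, B, C shown in the output above; the intermediate names `run_start` and `run_id` are only illustrative.

```python
import pandas as pd

# Sample frame from the question; column names A, B, C follow the output above.
df = pd.DataFrame({
    "A": [0, 0, 2, 2, 0, 5],
    "B": [1, 2, 1, 2, 11, 2],
    "C": [1, 2, 10, 20, 100, 200],
})

# True wherever A differs from the previous row, i.e. at the start of each run.
# The first row is compared with NaN, so it always counts as a run start.
run_start = df["A"] != df["A"].shift()

# The cumulative sum turns run starts into run labels: 1, 1, 2, 2, 3, 4.
run_id = run_start.cumsum()

# Group by the run label and keep the first row of every run.
result = df.groupby(run_id.values).first()
print(result)
```

Because the grouping key is the run label rather than the value of A itself, the two separate runs of 0 stay separate instead of being merged into one group.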
DSM
  • 1
    How about this: `df = df[df['A'] != df.shift(-1)['A']]`? – Baron Yugovich Jul 24 '14 at 21:58
  • 1
    @BaronYugovich I would rather do `df = df[df['A'] != df['A'].shift(-1)]` - first `['A']`, then `shift(-1)`, so that only that one column is shifted rather than the whole `df`. – furas Jul 24 '14 at 22:15
  • 2
    Does this solution only remove one consecutive duplicate? What if there are more than two consecutive rows with the same value in A? – panc Feb 21 '17 at 20:33
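
The boolean-mask variant from the comments can be sketched as below, on hypothetical data (not from the question) that contains a run of three equal values. Each row is compared only with its immediate neighbour, so a run of any length should collapse to a single row; the direction of the shift decides which row of the run survives.

```python
import pandas as pd

# Hypothetical data with a run of three equal values in A.
df = pd.DataFrame({"A": [0, 0, 0, 2, 2, 0], "B": [1, 2, 3, 4, 5, 6]})

# shift() compares each row with the previous one,
# so the first row of each run is kept.
keep_first = df[df["A"] != df["A"].shift()]

# shift(-1) compares each row with the next one,
# so the last row of each run is kept.
keep_last = df[df["A"] != df["A"].shift(-1)]

print(keep_first)
print(keep_last)
```

Unlike the `groupby` approach, the mask keeps the original index labels of the surviving rows.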