Pandas DataFrame - delete rows that have same value at a particular column as a previous row

Question

I have a pandas DataFrame. For each row, I want to check whether it has the same value in a particular column (let's call it product_type) as the previous row, and if it does, delete that row. In other words, out of each group of consecutive rows with the same value in that column, I want to keep only one.

For example, if column A is the one in which we don't want consecutive duplicates:

See related: http://stackoverflow.com/questions/19463985/pandas-drop-consecutive-duplicates/19464054#19464054 — EdChum, Jul 25 '14 at 07:11

score 5 · Accepted Answer · answered Jul 24 '14 at 21:52

It's a little tricky, but you could do something like

>>> df.groupby((df["A"] != df["A"].shift()).cumsum().values).first()
   A   B    C
1  0   1    1
2  2   1   10
3  0  11  100
4  5   2  200
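
To unpack the one-liner above, here is a minimal sketch on made-up data (the frame below is hypothetical, not the asker's original). Comparing `A` with `A.shift()` flags the first row of each run of equal values, the cumulative sum turns those flags into one label per run, and `groupby(...).first()` keeps one row per label.

```python
import pandas as pd

# Hypothetical data for illustration; not the frame from the question.
df = pd.DataFrame({
    "A": [0, 0, 0, 2, 2, 0, 5, 5],
    "B": [1, 2, 3, 4, 5, 6, 7, 8],
})

# True wherever A differs from the previous row, i.e. at the start of each run.
starts_run = df["A"] != df["A"].shift()

# The running sum of those flags gives every run its own integer label.
run_id = starts_run.cumsum()

# Group on the run labels and keep the first row of each run.
deduped = df.groupby(run_id.values).first()
print(deduped)
```

Note that the result is indexed by the run label rather than by the original row index, and that `first()` takes the first non-null value per column, so with missing data the values in one output row may come from different input rows.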

DSM

How about this: `df = df[df['A'] != df.shift(-1)['A']]`? – Baron Yugovich Jul 24 '14 at 21:58

@BaronYugovich I would rather do `df = df[df['A'] != df['A'].shift(-1)]`: select `['A']` first and then call `shift(-1)`, so that only one column is shifted rather than the whole `df`. – furas Jul 24 '14 at 22:15

Does this solution only remove one consecutive duplicate? What if there are more than two consecutive rows with the same value in A? – panc Feb 21 '17 at 20:33
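
On the question of longer runs, a quick check on made-up data (the frame below is hypothetical) suggests that the boolean-mask variants from the comments collapse a run of any length to a single row. The comparison is made row by row against the neighbouring row, so every row that matches its neighbour is dropped. The shift direction decides which row survives: `shift()` keeps the first row of each run, `shift(-1)` keeps the last.

```python
import pandas as pd

# Hypothetical data with runs of length 3, 2, 1 and 2 in column A.
df = pd.DataFrame({
    "A": [0, 0, 0, 2, 2, 0, 5, 5],
    "B": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Keep a row only if A differs from the previous row: first row of each run.
keep_first = df[df["A"] != df["A"].shift()]

# Keep a row only if A differs from the next row: last row of each run.
keep_last = df[df["A"] != df["A"].shift(-1)]

print(keep_first)
print(keep_last)
```

Unlike the `groupby` approach, both masks preserve the original row index.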
