How to find the first consecutive value in a Pandas dataframe column and delete that row?

Question

I have a pandas dataframe with multiple columns and rows. I wish to find the consecutive duplicate values in a particular column and delete the entire row of the first occurrence of that duplicate value.

I found a possible solution in another answer, but it works only with a pandas Series: a.loc[a.shift() != a]
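For reference, here is a minimal sketch of what that Series idiom does, using the column0 values from the example dataframe in this question. The variable name `a` follows the snippet; the data and index labels are copied from the example for illustration only. It keeps the first row of each run and drops the later repeats, which is the opposite of what is wanted here.

```python
import pandas as pd

# Illustrative data: the column0 values from the example dataframe
a = pd.Series([0.5, 0.5, 1.0, 1.5], index=["row0", "row1", "row2", "row3"])

# a.shift() moves every value down one row, so each row is compared
# with the row before it; a row is kept when it differs from its predecessor
kept = a.loc[a.shift() != a]
print(kept.index.tolist())
```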

To visualize, my dataframe would look something like this:

Index column0 column1 column2 column3
row0 0.5 25 26 27
row1 0.5 30 31 32
row2 1.0 35 36 37
row3 1.5 40 41 42

Index column0 column1 column2 column3
row1 0.5 30 31 32
row2 1.0 35 36 37
row3 1.5 40 41 42

This would be the expected result, with row0 deleted.

P.S. The duplicate does not occur at the beginning of my data; it occurs at random positions in column0.

`df.loc[df.column0.shift(-1) != df.column0]`? – Quang Hoang, Jul 05 '19 at 14:38
@QuangHoang That worked! Thank you! I'm such a noob. :D – Black_Pulse, Jul 05 '19 at 14:45

score 2 · Accepted Answer · answered Jul 05 '19 at 14:48

df.loc[df.iloc[:, 0].shift(-1) != df.iloc[:, 0]]

This is the answer! Thank you Quang Hoang!
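As a quick check, here is a sketch that applies the same expression to the example dataframe from the question. The intermediate names `first_col`, `keep`, and `result` are introduced here only for readability and are not part of the answer.

```python
import pandas as pd

# The example dataframe from the question
df = pd.DataFrame(
    {
        "column0": [0.5, 0.5, 1.0, 1.5],
        "column1": [25, 30, 35, 40],
        "column2": [26, 31, 36, 41],
        "column3": [27, 32, 37, 42],
    },
    index=["row0", "row1", "row2", "row3"],
)

first_col = df.iloc[:, 0]

# shift(-1) moves every value up one row, so each row is compared
# with the row after it; the last row is compared with NaN and is kept
keep = first_col.shift(-1) != first_col

result = df.loc[keep]
print(result)
```

One thing to be aware of: if the same value repeats three or more times in a row, this mask keeps only the last row of that run, because every earlier row is followed by an equal value.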

Black_Pulse

score 0 · Answer 2 · answered Jul 05 '19 at 18:14

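Here is a step-by-step solution. Note that it drops the first occurrence of every value that appears more than once in column A, whether or not the repeats are consecutive.

import pandas as pd
import numpy as np

# Example data: 10 rows of random integers in [0, 7)
df = pd.DataFrame(np.random.randint(0, 7, size=(10, 4)), columns=list('ABCD'))

# Count how many times each value occurs in column A
number_of_occurrence_on_first_column = df.groupby('A')['A'].count()

# Values that occur more than once
has_duplicates_items = number_of_occurrence_on_first_column[number_of_occurrence_on_first_column > 1].index

# All rows whose A value is one of those repeated values
all_duplicate_items = df[df.A.isin(has_duplicates_items)]

# Index labels of the first occurrence of each repeated value
need_to_delete = pd.DataFrame(all_duplicate_items['A']).drop_duplicates().index
df = df.drop(need_to_delete)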
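
The steps above count repeats anywhere in column A. For comparison, here is a hedged sketch of a variant that looks only at consecutive repeats and removes just the first row of each run. The helper name `drop_first_of_each_run` and the sample data are hypothetical and are not part of either answer.

```python
import pandas as pd


def drop_first_of_each_run(df, col):
    """Hypothetical helper: drop the first row of every run of two or
    more equal consecutive values in `col`."""
    s = df[col]
    # True where the value differs from the previous row (start of a run)
    starts_run = s != s.shift()
    # Rows in the same consecutive run share the same id
    run_id = starts_run.cumsum()
    # Length of the run each row belongs to
    run_size = run_id.map(run_id.value_counts())
    # Remove only rows that start a run longer than one row
    return df.loc[~(starts_run & (run_size > 1))]


# Illustrative data: the final 1 is a repeat, but not a consecutive one
sample = pd.DataFrame({"A": [1, 1, 1, 2, 3, 3, 1], "B": range(7)})
out = drop_first_of_each_run(sample, "A")
print(out.index.tolist())
```

In this sketch the last row is kept even though its value appeared earlier, since it is not adjacent to another equal value.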
