How to remove rows with multiple occurrences in a row with pandas

Question

I have this data:

I expect to get:

     A  
1    1 
-    -   -> (drop)
3    1  
4    2
5    2
6    1

I want to drop all rows in column 'A' whose value repeats consecutively, keeping only the first and the last row of each run.

Until now I used:

df = df.loc[df[col].shift() != df[col]]

but it also removes the last occurrence of each run.

Sorry for my bad English, thanks in advance.

score 1 · Answer 1 · edited Jul 04 '21 at 11:08


The suggested solution is:

pd.concat([
    df['A'].drop_duplicates(keep='first'),
    df['A'].drop_duplicates(keep='last'),
])

Update after clarification:

First get the boolean masks for your described criteria:

is_last = df['A'] != df['A'].shift(-1)
is_duplicate = df['A'] == df['A'].shift()

And drop the rows based on these:

df.drop(df.index[~is_last & is_duplicate]) # note the ~ to negate is_last
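As a rough check of the two masks, here is a minimal sketch on a small frame. The sample data is an assumption (the question's input was not shown); it is chosen so that only the row with index 2 sits in the interior of a run.

```python
import pandas as pd

# Hypothetical sample data; the original frame from the question is not shown.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=range(1, 7))

# True where the next row holds a different value (or there is no next row).
is_last = df["A"] != df["A"].shift(-1)
# True where the previous row holds the same value.
is_duplicate = df["A"] == df["A"].shift()

# Drop rows that repeat the previous value but are not the last row of their run.
result = df.drop(df.index[~is_last & is_duplicate])
print(result)
```

With this assumed input, only index 2 is dropped. Note that `drop` returns a new frame, so the result has to be assigned if it should be kept.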

edited Jul 04 '21 at 11:08

Mustafa Aydın

answered Jul 04 '21 at 10:34

Hagen

drop_duplicates will drop all rows with the same value anywhere in the table; I need to drop rows only when the value appears consecutively. – Eylon Koenig Jul 04 '21 at 10:44
If I run the code, the output will be: A [1, 1] [4, 2] [5, 2] [6, 1] – Eylon Koenig Jul 04 '21 at 10:51

score 0 · Accepted Answer · answered Jul 04 '21 at 10:59


Basically, you need to group runs of consecutive equal values, which can be done with diff and cumsum, and then take the first and last row of each group:

print(df.groupby(df["A"].diff().ne(0).cumsum(), as_index=False).nth([0, -1]))

   A
1  1
3  1
4  2
5  2
6  1
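To make the run grouping visible, here is a minimal sketch of a variant. It is not the code above: it builds the run labels with `shift` instead of `diff` (so it should also work for non-numeric columns) and selects rows with `duplicated` instead of `groupby(...).nth`. The sample data is an assumption, since the question's input was not shown.

```python
import pandas as pd

# Hypothetical sample data; the original frame from the question is not shown.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=range(1, 7))

# A new run starts wherever the value differs from the previous row;
# the cumulative sum turns those starts into one label per run.
run_id = df["A"].ne(df["A"].shift()).cumsum()

# Within each run label, the first and last rows are the ones that are
# not flagged as duplicates when keeping the first / last occurrence.
is_first = ~run_id.duplicated(keep="first")
is_last = ~run_id.duplicated(keep="last")

result = df[is_first | is_last]
print(result)
```

For this assumed input the run labels are 1, 1, 1, 2, 2, 3, and a run of length one is kept once because its single row is both first and last.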

answered Jul 04 '21 at 10:59

Henry Yik
