
I have this data:

     A  
1    1 
2    1 
3    1  
4    2
5    2
6    1

I expect to get:

     A  
1    1 
-    -   -> (drop)
3    1  
4    2
5    2
6    1

I want to drop every row where the value in column 'A' repeats consecutively, except for the first and the last row of each run of equal values.

So far I have used:

df = df.loc[df[col].shift() != df[col]]

but it also removes the last row of each run.
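A quick check on the sample data shows what goes wrong. This is a minimal sketch that assumes the column is named `A` and uses the 1-based index shown above: comparing each row with the previous one flags only the first row of every run, so rows 3 and 5, the last rows of their runs, are dropped along with row 2.

```python
import pandas as pd

# Sample data from the question, with the same 1-based index.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=[1, 2, 3, 4, 5, 6])

# Keep a row only if it differs from the row before it.
# This keeps the FIRST row of each run and nothing else.
first_only = df.loc[df["A"].shift() != df["A"]]
print(first_only.index.tolist())  # [1, 4, 6]
```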

Sorry for my bad English, thanks in advance.

Eylon Koenig

2 Answers

It looks like you have the same problem as this question: "Pandas drop_duplicates. Keep first AND last. Is it possible?"

The suggested solution is:

pd.concat([
    df['A'].drop_duplicates(keep='first'),
    df['A'].drop_duplicates(keep='last'),
])
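Note that `drop_duplicates` looks at distinct values across the whole column, not at consecutive runs. Running the suggestion on the sample data from the question (a sketch assuming the same column name and index) suggests it keeps rows 1, 4, 5 and 6 but misses row 3, the end of the first run of 1s. A value that occurs only once would also be returned twice, since both calls keep it.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=[1, 2, 3, 4, 5, 6])

# keep='first': first occurrence of each distinct value anywhere in the column.
# keep='last':  last occurrence of each distinct value anywhere in the column.
combined = pd.concat([
    df["A"].drop_duplicates(keep="first"),
    df["A"].drop_duplicates(keep="last"),
]).sort_index()

print(combined.index.tolist())  # [1, 4, 5, 6]; row 3 is missing
```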

Update after clarification:

First, build boolean masks for the criteria you described:

is_last = df['A'] != df['A'].shift(-1)
is_duplicate = df['A'] == df['A'].shift()

Then drop the rows that repeat the previous value but are not the last row of their run:

df.drop(df.index[~is_last & is_duplicate]) # note the ~ to negate is_last 
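Put together on the sample data, the mask approach can be sketched as follows (column name and index assumed from the question). By De Morgan's law, dropping `~is_last & is_duplicate` is the same as keeping rows where `is_last | ~is_duplicate`, i.e. rows that are the last or the first of their run; the filter form is shown as an alternative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=[1, 2, 3, 4, 5, 6])

# True where the next row holds a different value (last row of a run).
# The final row compares against NaN, so it is always True.
is_last = df["A"] != df["A"].shift(-1)
# True where the previous row holds the same value (not the first row of a run).
is_duplicate = df["A"] == df["A"].shift()

# Rows strictly inside a run: neither first nor last.
to_drop = ~is_last & is_duplicate
result = df.drop(df.index[to_drop.to_numpy()])

# Equivalent filter: keep rows that are first or last of their run.
result_alt = df[is_last | ~is_duplicate]

print(result)
```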
Mustafa Aydın
Hagen

Basically, you need to group runs of consecutive equal values, which you can do with diff and cumsum:

print(df.groupby(df["A"].diff().ne(0).cumsum(), as_index=False).nth([0, -1]))

   A
1  1
3  1
4  2
5  2
6  1
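To make the grouping step visible, here is a sketch that prints the intermediate run ids on the question's sample data (column name and index assumed). `diff()` is non-zero, or NaN on the first row, wherever a new run starts, and `cumsum()` turns those starts into one id per run. Since `diff()` needs numeric data, the last line shows an assumed alternative for, say, string columns: comparing with the shifted column gives the same run ids.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 1]}, index=[1, 2, 3, 4, 5, 6])

# ne(0) is True at the start of each run (NaN != 0 is True for row 1);
# cumsum() over the booleans numbers the runs 1, 2, 3, ...
run_id = df["A"].diff().ne(0).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 3]

# First and last row of every run; a one-row run (row 6) appears once.
result = df.groupby(run_id, as_index=False).nth([0, -1])

# Run ids without diff(), usable for non-numeric columns.
run_id_any = df["A"].ne(df["A"].shift()).cumsum()
```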
Henry Yik