
I have the following dataframe, from which I need to drop consecutive duplicate values only if they equal 0.3 or 0.4.

In [2]: df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                          data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})
    
In [3]: df
Out[3]:
                poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-03           0.4
2002-01-04           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I need the df to look like this:

2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I tried:

for var in df['poll_support']:
    if var == 0.3 or var == 0.4:
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.3]
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.4]

However, this does not produce the desired df.

I would love to hear suggestions.

arkadiy
  • Does this answer your question? [Pandas: Drop consecutive duplicates](https://stackoverflow.com/questions/19463985/pandas-drop-consecutive-duplicates) – TheFaultInOurStars Feb 27 '22 at 21:04
  • @AmirhosseinKiani No, that solution does not account for duplicates of a specific type- it tackles all consecutive duplicates – arkadiy Feb 27 '22 at 21:06

1 Answer

Boolean indexing will help. Try:

df[~((df['poll_support'] == df['poll_support'].shift()) & (df['poll_support'].isin([0.3, 0.4])))]

             poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5
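For completeness, a self-contained sketch of the same approach, split into named masks (the column name and the 0.3/0.4 values come from the question):

```python
import pandas as pd

df = pd.DataFrame(
    index=pd.date_range('20020101', periods=7, freq='D'),
    data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]},
)

# A row is dropped only when it repeats the immediately preceding value
# AND that value is one of the targeted ones (0.3 or 0.4).
is_repeat = df['poll_support'] == df['poll_support'].shift()
is_target = df['poll_support'].isin([0.3, 0.4])
result = df[~(is_repeat & is_target)]
print(result)
```

Note that `shift()` leaves a `NaN` in the first row, and `NaN == anything` is `False`, so the first row is always kept.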
wwnde