Pandas - consecutive values must be different

Question

I want to subsample the rows of a dataframe so that every pair of consecutive values in a given column is different; if two consecutive values are the same, keep, say, the first one.

Here is an example:

import pandas as pd

p = [1,1,2,1,3,3,2,4,3]
t = range(len(p))
df = pd.DataFrame({'p':p, 't':t})

df

   p  t
0  1  0
1  1  1
2  2  2
3  1  3
4  3  4
5  3  5
6  2  6
7  4  7
8  3  8



desiredDf

   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

In desiredDf, no two consecutive values in the p column are equal.
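To make the required property concrete, here is a minimal sketch of a check that no two adjacent entries of a column are equal. The helper name `has_no_consecutive_duplicates` is hypothetical (not part of pandas), and `desired` is rebuilt by hand from the row labels shown in desiredDf above.

```python
import pandas as pd

def has_no_consecutive_duplicates(s: pd.Series) -> bool:
    """Hypothetical helper: True if no two adjacent entries of s are equal."""
    values = s.to_numpy()
    # Compare each element with the element just before it, position by position.
    return bool((values[1:] != values[:-1]).all())

p = [1, 1, 2, 1, 3, 3, 2, 4, 3]
df = pd.DataFrame({'p': p, 't': range(len(p))})
# Rows taken from desiredDf in the question.
desired = df.loc[[0, 2, 3, 4, 6, 7, 8]]

print(has_no_consecutive_duplicates(df['p']))       # False (rows 0 and 1 are both 1)
print(has_no_consecutive_duplicates(desired['p']))  # True
```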

Alex Riley · Accepted Answer · 2014-10-14T16:38:45.390


How about this?

>>> df[df.p != df.p.shift()]
   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

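Explanation: df.p.shift() shifts the entries of column p down one row, so each row is lined up with the value from the row above it (the first row gets NaN). df.p != df.p.shift() then checks whether each entry of df.p differs from the previous entry, returning a boolean Series; indexing df with that Series keeps only the rows where it is True. Because NaN never compares equal to anything, the first row is always kept.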
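The intermediate values can be inspected directly. This sketch just reproduces the answer's one-liner in steps on the question's data; the names `prev` and `mask` are illustrative only.

```python
import pandas as pd

p = [1, 1, 2, 1, 3, 3, 2, 4, 3]
df = pd.DataFrame({'p': p, 't': range(len(p))})

prev = df.p.shift()   # value of the previous row; the first row has no predecessor -> NaN
mask = df.p != prev   # comparisons with NaN are never equal, so row 0 is True
print(pd.DataFrame({'p': df.p, 'prev': prev, 'keep': mask}))
# keep: True, False, True, True, True, False, True, True, True

result = df[mask]     # boolean indexing keeps only the rows where mask is True
print(result)
```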

This works for runs of any length: for example, if there is a run of three identical values, only the first value in that run is kept.
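A small sketch with a run of three identical values, on made-up data. Since the question says "keep, say, the first one", it also shows one possible variant (not from the answer) that keeps the last row of each run instead, by shifting up with `shift(-1)` so each row is compared with the row below it.

```python
import pandas as pd

# Hypothetical data containing a run of three identical values (7, 7, 7).
df = pd.DataFrame({'p': [7, 7, 7, 8, 7], 't': range(5)})

keep_first = df[df.p != df.p.shift()]   # compare with the row above: first row of each run
keep_last = df[df.p != df.p.shift(-1)]  # compare with the row below: last row of each run

print(keep_first.index.tolist())  # [0, 3, 4]
print(keep_last.index.tolist())   # [2, 3, 4]
```

Both variants produce the same sequence of p values (7, 8, 7); they differ only in which row of each run, and therefore which t value, survives.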


answered Oct 14 '14 at 16:18

Alex Riley


Will that work if you have 3 consecutive identical values? – Baron Yugovich Oct 14 '14 at 16:21
@BaronYugovich yes - it could be generalised by using `&` and changing the shift value, e.g. `df[(df.p != df.p.shift(1)) & (df.p != df.p.shift(2))]` – Alex Riley Oct 14 '14 at 16:25
Please augment your answer then, before I accept it. Currently, it does not really address my question. What I'd like is for all pairs of consecutive values in the dataframe to be different, and your current answer does not achieve that. – Baron Yugovich Oct 14 '14 at 16:28

But isn't that what his answer does? By using the above, a dataframe is returned that basically doesn't have consecutive duplicates. His result also exactly matches your desired output. What is not achieved, exactly? – WGS Oct 14 '14 at 16:30
Actually yes, @Nanashi is correct... there's no need to use other shifts. The method returns only the first entry from a run of consecutive entries. – Alex Riley Oct 14 '14 at 16:37
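To illustrate why the single shift is enough, this sketch runs both the one-shift filter and the two-shift variant from the earlier comment on the question's data. The extra `shift(2)` condition also compares each row with the row two positions back, so it drops row 3 (p = 1, same as row 1) even though row 3 differs from its immediate neighbour and appears in desiredDf.

```python
import pandas as pd

p = [1, 1, 2, 1, 3, 3, 2, 4, 3]
df = pd.DataFrame({'p': p, 't': range(len(p))})

# Filter from the accepted answer: compare only with the previous row.
single = df[df.p != df.p.shift(1)]
# Two-shift variant from the comments: also compares with the row two positions back.
double = df[(df.p != df.p.shift(1)) & (df.p != df.p.shift(2))]

print(single.index.tolist())  # [0, 2, 3, 4, 6, 7, 8]  matches desiredDf
print(double.index.tolist())  # [0, 2, 4, 6, 7, 8]     row 3 is wrongly dropped
```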
