I want to subsample the rows of a dataframe so that no two consecutive values in a given column are equal. If two consecutive values are the same, keep, say, the first one.

Here is an example

import pandas as pd

p = [1,1,2,1,3,3,2,4,3]
t = range(len(p))
df = pd.DataFrame({'t':t, 'p':p})

df

   p  t
0  1  0
1  1  1
2  2  2
3  1  3
4  3  4
5  3  5
6  2  6
7  4  7
8  3  8



desiredDf

   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

In desiredDf, no two consecutive values in the p column are equal.

Baron Yugovich

1 Answer

How about this?

>>> df[df.p != df.p.shift()]
   p  t
0  1  0
2  2  2
3  1  3
4  3  4
6  2  6
7  4  7
8  3  8

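Explanation: df.p.shift() shifts the entries of column p down one row, so each row lines up with the previous row's value (the first row gets NaN). df.p != df.p.shift() then compares each entry of df.p with the previous entry and returns a boolean Series, which is used as a mask to keep only the rows where the value changes.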
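To make the intermediate steps visible, here is a small sketch that rebuilds the example frame and lays the shifted column and the mask side by side. The names prev, mask and steps are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'t': range(9), 'p': [1, 1, 2, 1, 3, 3, 2, 4, 3]})

prev = df.p.shift()   # previous row's value; NaN in the first row
mask = df.p != prev   # boolean Series: True where p differs from the previous row

# Side-by-side view of each step (illustrative helper frame)
steps = pd.DataFrame({'p': df.p, 'prev': prev, 'keep': mask})
print(steps)
```

The first row is always kept, because comparing any value with NaN using != gives True.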

This method works for runs of any length: e.g. if there is a run of three identical values, only the first value in that run is kept.
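A quick check of the run-of-three case, using a made-up column (the frame runs below is an assumption for illustration, not data from the question):

```python
import pandas as pd

# Hypothetical data: a run of three 5s, a run of two 7s, then a single 5
runs = pd.DataFrame({'p': [5, 5, 5, 7, 7, 5]})

# Keep a row only when its value differs from the row directly above it
deduped = runs[runs.p != runs.p.shift()]
print(deduped.p.tolist())      # [5, 7, 5]
print(deduped.index.tolist())  # [0, 3, 5]
```

Each row is compared only with the row directly above it, so every row after the first in a run is dropped in a single pass. One thing to watch for: consecutive NaN values would all be kept, since NaN != NaN evaluates to True.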

Alex Riley
  • Will that work if you have 3 consecutive identical values? – Baron Yugovich Oct 14 '14 at 16:21
  • @BaronYugovich yes - it could be generalised by using `&` and changing the shift value, e.g. `df[(df.p != df.p.shift(1)) & (df.p != df.p.shift(2))]` – Alex Riley Oct 14 '14 at 16:25
  • Please augment your answer then, before I accept it. Currently, it does not really address my question. What I'd like is for all pairs of consecutive values in the dataframe to be different, and your current answer does not achieve that. – Baron Yugovich Oct 14 '14 at 16:28
  • But isn't that what his answer does? By using the above, a dataframe is returned that basically doesn't have consecutive duplicates. His result also exactly matches your desired output. What is not achieved, exactly? – WGS Oct 14 '14 at 16:30
  • Actually yes, @Nanashi is correct... there's no need to use other shifts. The method returns only the first entry from a run of consecutive entries. – Alex Riley Oct 14 '14 at 16:37