3

I have a dataframe

A B C

1 2 3

2 3 4

3 8 7

I want to take only rows where there is a sequence of 3,4 in columns C (in this scenario - first two rows)

What will be the best way to do so?

oren_isp
  • 729
  • 1
  • 7
  • 22
  • 1
    @jezrael -- I don't think that's the right duplicate, OP is looking for a sequence not anywhere `in`? – Zero Sep 05 '18 at 10:50

2 Answers2

4

You can use rolling for general solution working with any pattern:

pat = np.asarray([3,4])
N = len(pat)

mask= (df['C'].rolling(window=N , min_periods=N)
              .apply(lambda x: (x==pat).all(), raw=True)
              .mask(lambda x: x == 0) 
              .bfill(limit=N-1)
              .fillna(0)
              .astype(bool))

df = df[mask]
print (df)
   A  B  C
0  1  2  3
1  2  3  4

Explanation:

  • use rolling.apply and test pattern
  • replace 0s to NaNs by mask
  • use bfill with limit for filling first NANs values by last previous one
  • fillna NaNs to 0
  • last cast to bool by astype
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
2

Use shift

In [1085]: s = df.eq(3).any(1) & df.shift(-1).eq(4).any(1)

In [1086]: df[s | s.shift()]
Out[1086]:
   A  B  C
0  1  2  3
1  2  3  4
Zero
  • 74,117
  • 18
  • 147
  • 154