How can I filter my DataFrame df to include only data from April and May? Why does my last in statement fail? I want to filter the data for April and May and copy it into another DataFrame.

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [4, 3],
                   'day': [4, 5]})
x = pd.to_datetime(df)
4 in (4, 5)           # True
x.dt.month in (4, 5)  # fails with ValueError
#y = x.dt.month in (4, 5)
jpp
Ni_Tempe

1 Answer

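`in` checks whether the left operand is a member of the right operand. `x.dt.month` is a pandas Series, not a member of `(4, 5)`, and comparing it against each element yields another Series whose truth value is ambiguous, hence the error. You need the vectorized `isin` method: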

x.dt.month.isin((4, 5))
#0     True
#1    False
#dtype: bool
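
The question also asks how to copy the matching rows into another DataFrame. A minimal sketch, reusing the question's `df`: the names `mask` and `april_may` are illustrative, and `.copy()` is one reasonable way to get an independent frame rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [4, 3],
                   'day': [4, 5]})
x = pd.to_datetime(df)

# Boolean mask: True where the month is April (4) or May (5)
mask = x.dt.month.isin((4, 5))

# Select the matching rows; .copy() makes the result independent of df
april_may = df[mask].copy()
print(april_may)
```

Only the first row (April 2015) passes the filter here, because the second row is in March.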
Psidom
  • 1 second earlier, +1. Worth mentioning this is a good opportunity to use `set`. – jpp Jun 05 '18 at 17:02
  • @jpp could you share how to use `set`? – Ni_Tempe Jun 05 '18 at 17:09
  • `vals = {4, 5}; mask = pd.to_datetime(df).dt.month.isin(vals)` – jpp Jun 05 '18 at 17:10
  • @Psidom, I'd question how vectorised `pd.Series.isin` actually is. Combining `set` with a `numpy` array usually means it's not (but please correct me). – jpp Jun 05 '18 at 17:12
  • @jpp Good question but actually if you look at `isin` [source code](https://github.com/pandas-dev/pandas/blob/v0.23.0/pandas/core/series.py#L3530-L3588), the values are always converted to an array before using `np.in1d`, which further uses `np.argsort` and vectorized comparison to calculate the result. So for numeric type, I'd say it's vectorized to some extent. – Psidom Jun 05 '18 at 17:49
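
To make the comment thread concrete, here is a small sketch checking that a tuple and a `set` give the same mask, and that the result matches NumPy's own `np.isin` on the underlying array. The sample series `months` is made up for illustration, and the snippet only compares results; it does not show which code path pandas takes internally.

```python
import numpy as np
import pandas as pd

# Hypothetical month values, for illustration only
months = pd.Series([4, 3, 5, 12])

mask_tuple = months.isin((4, 5))   # tuple of allowed values
mask_set = months.isin({4, 5})     # set of allowed values
mask_numpy = np.isin(months.to_numpy(), [4, 5])  # plain NumPy membership test

print(mask_tuple.equals(mask_set))
print((mask_tuple.to_numpy() == mask_numpy).all())
```

Whether one form is faster than the other depends on the pandas version and the data size, so it is worth timing on your own data.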