
Given a dataframe as follows:

  city district        date  price
0   bj       cy  2019-03-01    NaN
1   bj       cy  2019-04-01    6.0
2   sh       hp  2019-03-01    4.0
3   sh       hp  2019-04-01    3.0
4   bj       hd  2019-03-01    7.0
5   bj       hd  2019-04-01    NaN
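
For reference, the example frame can be rebuilt like this (a minimal sketch, assuming `pandas` and `numpy` are available):

import pandas as pd
import numpy as np

# reconstruct the example frame shown above
df = pd.DataFrame({
    'city':     ['bj', 'bj', 'sh', 'sh', 'bj', 'bj'],
    'district': ['cy', 'cy', 'hp', 'hp', 'hd', 'hd'],
    'date':     ['2019-03-01', '2019-04-01', '2019-03-01',
                 '2019-04-01', '2019-03-01', '2019-04-01'],
    'price':    [np.nan, 6.0, 4.0, 3.0, 7.0, np.nan],
})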

I need to filter the grouped rows of `city` and `district` when both of the following conditions are met: `date` is `2019-04-01` and `price` is `NaN`.

I have tested with the following code:

df['date'] = pd.to_datetime(df['date']).dt.date.astype(str)
df.groupby(['city','district']).filter(lambda x: (x['price'].isnull() & x['date'].isin(['2019-04-01'])).any())

Out:

  city district        date  price
4   bj       hd  2019-03-01    7.0
5   bj       hd  2019-04-01    NaN

Another test:

df.groupby(['city','district']).filter(lambda x: (x['price'].isnull() & x['date']).any())

Out:

  city district        date  price
0   bj       cy  2019-03-01    NaN
1   bj       cy  2019-04-01    6.0
4   bj       hd  2019-03-01    7.0
5   bj       hd  2019-04-01    NaN

But what I need is shown below. How could I modify the code above? Thanks a lot.

  city district        date  price
0   bj       cy  2019-03-01    NaN
1   bj       cy  2019-04-01    6.0
2   sh       hp  2019-03-01    4.0
3   sh       hp  2019-04-01    3.0
  • In your desired output, I don't understand why rows 2 and 3 should be there. Didn't you say you want to filter for `price is NaN` (at least once per group, I suppose)? – Arne Apr 23 '20 at 11:19
  • It needs both: `date` is `2019-04-01` and `price` is `NaN`. – ah bon Apr 23 '20 at 11:21
  • But for the group defined by `sh, hp`, `price` is `4.0` and `3.0`, never `NaN`, so why is this group not filtered out? – Arne Apr 23 '20 at 11:38
  • To put it another way, I want to drop the groups of rows where `date` is `2019-04-01` and `price` is `NaN`. – ah bon Apr 23 '20 at 11:42
  • Thanks, I understand now. So jezrael's answer is spot on. – Arne Apr 23 '20 at 11:51

1 Answer


I think you need to invert the mask: change `&` to `|`, `isnull` to `notna`, `eq` to `ne`, and `any` to `all`:

df['date'] = pd.to_datetime(df['date'])

# keep a group only if every row either has a price or is not dated 2019-04-01
f = lambda x: (x['price'].notna() | x['date'].ne('2019-04-01')).all()
df = df.groupby(['city','district']).filter(f)
print(df)
  city district       date  price
0   bj       cy 2019-03-01    NaN
1   bj       cy 2019-04-01    6.0
2   sh       hp 2019-03-01    4.0
3   sh       hp 2019-04-01    3.0

Alternatively, you can use `not` to invert the boolean result (True to False and False to True):

# drop a group if any of its rows has date 2019-04-01 with a missing price
f = lambda x: not (x['price'].isnull() & x['date'].eq('2019-04-01')).any()
df = df.groupby(['city','district']).filter(f)
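
On large frames where `groupby.filter` is slow, an equivalent anti-join is possible. This is a sketch of my own, not part of the original answer; `bad`, `bad_groups`, and `out` are names introduced here:

# flag the offending rows, collect their group keys, then drop
# every row whose (city, district) pair appears in that list
bad = df['price'].isnull() & df['date'].eq('2019-04-01')
bad_groups = df.loc[bad, ['city', 'district']].drop_duplicates()
out = df.merge(bad_groups, on=['city', 'district'],
               how='left', indicator=True)
out = out[out['_merge'] == 'left_only'].drop(columns='_merge')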
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • It works with the test data, but with my real data it only filters the rows where `date` is `2019-04-01`; I'm confused. – ah bon Apr 23 '20 at 11:35
  • @ahbon - My solution drops groups that contain `df[df['price'].isnull() & df['date'].eq('2019-04-01')]` rows, if at least one exists. Not sure if that is what you need. – jezrael Apr 23 '20 at 11:38
  • I hope to drop the groups of rows where `date` is `2019-04-01` and `price` is `NaN`. – ah bon Apr 23 '20 at 11:44
  • @ahbon - For testing, you can use `df[df['price'].isnull() & df['date'].eq('2019-04-01')].drop_duplicates('date')` to see which groups (by the `'city','district'` columns) are removed. – jezrael Apr 23 '20 at 11:46
  • I think the problem may come from `NaN`s in the columns used for the groupby, such as `city` or `district`? – ah bon Apr 23 '20 at 12:00
  • @ahbon - If there are `NaN`s there, those groups are removed. I will try to test it. – jezrael Apr 23 '20 at 12:02
  • @ahbon - You can test it like `df = df.assign(city = lambda x: x['city'].fillna(-1),district = lambda x: x['district'].fillna(-1)).groupby(['city','district']).filter(f)`, [link](https://stackoverflow.com/questions/18429491/groupby-columns-with-nan-missing-values) ? – jezrael Apr 23 '20 at 12:12
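
To make that last comment concrete, here is a runnable sketch (the `-1` sentinel is arbitrary; `f` is the filter from the answer):

# groupby silently drops rows whose key columns are NaN,
# so fill the key columns with a sentinel value before grouping
f = lambda x: not (x['price'].isnull() & x['date'].eq('2019-04-01')).any()
df = (df.assign(city=lambda x: x['city'].fillna(-1),
                district=lambda x: x['district'].fillna(-1))
        .groupby(['city', 'district'])
        .filter(f))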