
I'm facing a problem with a filtered dataframe and groupby.
Say I have this dataframe:

    id     product   date
0   220    6647     2015-09-01 
1   220    6647     2014-09-03 
2   220    6647     2014-10-16
3   826    3380     2014-11-11
4   826    3380     2015-12-09
5   826    3380     2015-05-19
6   901    4555     2015-09-01
7   901    4555     2014-10-05
8   901    4555     2014-11-01

I'd like to first select the rows from 2015, then group by id and get the latest row for each id by date.

I've read this article, and its approach works great on the entire df,
but it doesn't seem to work if I filter the df first, like this:

import datetime

my_date = datetime.datetime(2014, 12, 31)
df = df[df.date>my_date]  # keep only rows dated after 2014-12-31

Now, if I run the following code:

df.loc[df.groupby('id').date.idxmax()]
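For context, here is a sketch of what that line does, rebuilt from the sample data above and assuming date has already been converted to datetime64. groupby('id').date.idxmax() returns, for each id, the index label (not the position) of the row with the latest date, and .loc then selects those rows. Because they are labels, this keeps working after filtering leaves gaps in the index.

```python
import datetime

import pandas as pd

# The sample data from the question, with date parsed into datetime64.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2015-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2015-12-09", "2015-05-19",
        "2015-09-01", "2014-10-05", "2014-11-01",
    ]),
})

my_date = datetime.datetime(2014, 12, 31)
df = df[df.date > my_date]  # remaining index labels: 0, 4, 5, 6

# One index label per id: the row holding that id's latest date.
labels = df.groupby("id").date.idxmax()

# .loc looks rows up by label, so the gaps in the filtered index don't matter.
latest = df.loc[labels]
print(latest)
```

Using .iloc[labels] instead would be wrong here: after filtering only four rows remain, so there is no position 6.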

it gives me the following error:

attempt to get argmax of an empty sequence
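That message appears when idxmax (which pandas implements on top of argmax) is asked for the maximum of something with no elements, so it suggests that at least one group reaching idxmax was empty. A minimal sketch reproducing the same kind of ValueError with an empty Series; the exact wording may differ between pandas/NumPy versions.

```python
import pandas as pd

# An empty Series has no maximum, so idxmax has no label to return.
empty = pd.Series([], dtype="float64")

try:
    empty.idxmax()
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```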

Any help would be appreciated :) Thanks!

Fabio

2 Answers


In df = df[date>my_date] you use only date, not df.date, so that's probably what's causing the error.

This code:

import pandas as pd
from io import StringIO
from datetime import datetime

txt = '''id     product   date
220    6647     2015-09-01
220    6647     2014-09-03
220    6647     2014-10-16
826    3380     2014-11-11
826    3380     2015-12-09
826    3380     2015-05-19
901    4555     2015-09-01
901    4555     2014-10-05
901    4555     2014-11-01'''

df = pd.read_fwf(StringIO(txt))
df['date'] = pd.to_datetime(df['date']) # convert date to datetime

my_date = datetime(2014, 12, 31)
df = df[df.date>my_date]

print(df.loc[df.groupby('id').date.idxmax()])

Prints:

    id  product       date
0  220     6647 2015-09-01
4  826     3380 2015-12-09
6  901     4555 2015-09-01
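An alternative that avoids idxmax altogether, sketched here as a different technique from the one above: sort by date, then keep the last row per id with drop_duplicates. It also filters on the calendar year (.dt.year == 2015), which matches "rows of year 2015" more literally than comparing against 2014-12-31 (that comparison would also keep any 2016+ rows). It assumes date has already been converted to datetime64.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2015-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2015-12-09", "2015-05-19",
        "2015-09-01", "2014-10-05", "2014-11-01",
    ]),
})

latest = (
    df[df["date"].dt.year == 2015]            # keep only rows from 2015
    .sort_values("date", kind="stable")       # oldest first, ties keep row order
    .drop_duplicates(subset="id", keep="last")  # last row per id = latest date
    .sort_values("id")                        # tidy output order
)
print(latest)
```

Since drop_duplicates never forms groups, an id with no 2015 rows is simply absent from the result instead of producing an empty group.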
Andrej Kesely

OK, I got it.
The example I posted is taken from another article, and it works perfectly.
My own data is a bit different: the column I group by has dtype category.
If I leave it as object, it works.
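That fits the error. When the grouping column is categorical, groupby in pandas before 3.0 defaults to observed=False, which also creates groups for categories that have no rows (pandas 2.1+ warns that this default will change). Filtering rows does not remove categories from the dtype, so idxmax can end up being called on an empty group. Below are two sketches of a fix, using a hypothetical extra id 555 whose only row is from 2014: pass observed=True, or drop the unused categories before grouping.

```python
import pandas as pd

# Hypothetical data: the question's rows plus id 555, whose only row is from 2014.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901, 555],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555, 1234],
    "date": pd.to_datetime([
        "2015-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2015-12-09", "2015-05-19",
        "2015-09-01", "2014-10-05", "2014-11-01",
        "2014-06-01",
    ]),
})
df["id"] = df["id"].astype("category")  # grouping key with dtype category

filtered = df[df["date"].dt.year == 2015]
# 555 has no rows left, but it is still one of the column's categories.

# Fix 1: only form groups for categories that actually occur in the rows.
latest = filtered.loc[filtered.groupby("id", observed=True)["date"].idxmax()]

# Fix 2: remove the unused categories first; then no group is empty even with
# observed=False (passed explicitly to avoid the default-change warning).
trimmed = filtered.assign(id=filtered["id"].cat.remove_unused_categories())
latest2 = trimmed.loc[trimmed.groupby("id", observed=False)["date"].idxmax()]

print(latest)
```

Converting the column to a plain dtype (for example with astype(int)) also avoids the problem, because a non-categorical column has no notion of unused categories.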

Fabio