My question is about a groupby on a pandas DataFrame. A sample dataset looks like this:

cust_id | date       | category
A0001   | 20/02/2016 | cat1
A0001   | 24/02/2016 | cat2
A0001   | 02/03/2016 | cat3
A0002   | 03/04/2015 | cat2

Now I want to group by cust_id, find events that occur within 30 days of each other, and compile the list of categories for those events. What I have figured out so far is to use pd.Grouper in the following manner.

df.groupby(['cust_id', pd.Grouper(key='date', freq='30D')])['category'].apply(list)

But this isn't putting [cat1, cat2, cat3] in the same list for A0001. Any help on what I'm doing wrong or how I can go about doing what I need would be most appreciated.

The results I want should look something like this:

A0001 | [cat1, cat2, cat3]
A0002 | [cat2]

Thanks in Advance

Edit:

Following Wen's answer, I tried it and it worked for this minimal example. My bad for providing a minimal example that wasn't representative. The problem can be reproduced with the example below on both the 0.20.3 and 0.23.0 versions of pandas.

    cust_id date        category
0   A0001   2015-02-02  cat5
1   A0002   2015-02-03  cat1
2   A0001   2016-02-20  cat1
3   A0001   2016-02-24  cat2
4   A0001   2016-03-02  cat3
5   A0003   2016-09-09  cat2
6   A0003   2016-08-21  cat5

The answer I get is:

cust_id
A0001          [cat5]
A0001    [cat1, cat2]
A0001          [cat3]
A0002          [cat1]
A0003          [cat5]
Name: category, dtype: object

My apologies for the initial confusion!

words_of_wisdom

1 Answer

Your code works for me:

df.date=pd.to_datetime(df.date)
df.groupby(['cust_id', pd.Grouper(key='date', freq='30D')])['category'].apply(list).reset_index(level=1,drop=True)
Out[215]: 
cust_id
A0001       [ cat1,  cat2,  cat3]
A0002                     [ cat2]
Name: category, dtype: object
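
A possible explanation for why the same call succeeds on one dataset and splits the lists on another, offered as an assumption rather than something confirmed in this thread: freq='30D' appears to cut the date column into fixed 30-day windows counted from the earliest date in the whole frame, not from each customer's first event. The sketch below does not call pd.Grouper at all; it only reproduces that arithmetic on the rows from the question's edit. The `bin` column name is hypothetical.

```python
import pandas as pd

# Rows from the question's edit.
df = pd.DataFrame({
    "cust_id": ["A0001", "A0002", "A0001", "A0001", "A0001", "A0003", "A0003"],
    "date": ["2015-02-02", "2015-02-03", "2016-02-20", "2016-02-24",
             "2016-03-02", "2016-09-09", "2016-08-21"],
    "category": ["cat5", "cat1", "cat1", "cat2", "cat3", "cat2", "cat5"],
})
df["date"] = pd.to_datetime(df["date"])

# Assumption: windows are fixed 30-day blocks counted from the earliest
# date in the whole column (2015-02-02 here), shared by all customers.
offset_days = (df["date"] - df["date"].min()).dt.days
df["bin"] = offset_days // 30  # hypothetical helper column

print(df[["cust_id", "date", "bin"]])
```

Under that assumption, 2016-02-24 and 2016-03-02 are only 7 days apart but fall into different windows (12 and 13), which is consistent with the [cat1, cat2] / [cat3] split reported in the question.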
BENY
  • Thanks for the answer. This is really weird: I tried it twice, once without resetting the index and once with, and neither gave me the answer I wanted. It keeps giving me [cat1, cat2] and then [cat3]. – words_of_wisdom Sep 25 '18 at 03:25
  • I checked it on my other laptop and it worked there. Comparing the pandas versions, it worked in '0.20.3' and didn't work in '0.23.0'. I wonder why, as I can't find anything indicating a change to pd.Grouper :( – words_of_wisdom Sep 25 '18 at 03:45
  • @words_of_wisdom you may change to time grouper – BENY Sep 25 '18 at 12:33
  • Would you be able to provide an example, please? – words_of_wisdom Sep 25 '18 at 12:59
  • @words_of_wisdom I am on pandas '0.22.0' (from pd.__version__) – BENY Sep 25 '18 at 15:34
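
Since no example followed in the thread, here is a minimal sketch of one way to get the grouping the question asks for without fixed windows. It is not the answerer's method. It assumes "within 30 days of each other" means that each event is at most 30 days after the previous event of the same customer, so a chain of close events stays together. The `gap`, `new_cluster` and `cluster` names are hypothetical.

```python
import pandas as pd

# Rows from the question's edit.
df = pd.DataFrame({
    "cust_id": ["A0001", "A0002", "A0001", "A0001", "A0001", "A0003", "A0003"],
    "date": ["2015-02-02", "2015-02-03", "2016-02-20", "2016-02-24",
             "2016-03-02", "2016-09-09", "2016-08-21"],
    "category": ["cat5", "cat1", "cat1", "cat2", "cat3", "cat2", "cat5"],
})
df["date"] = pd.to_datetime(df["date"])

# Sort so that each customer's events are adjacent and in time order.
df = df.sort_values(["cust_id", "date"])

# Time since the previous event of the same customer (NaT for the first one).
gap = df.groupby("cust_id")["date"].diff()

# A new cluster starts at a customer's first event or after a gap > 30 days.
new_cluster = gap.isna() | (gap > pd.Timedelta(days=30))

# Running count of cluster starts gives every cluster a distinct id.
df["cluster"] = new_cluster.astype(int).cumsum()

result = df.groupby(["cust_id", "cluster"])["category"].apply(list)
print(result.reset_index(level="cluster", drop=True))
```

With this definition, A0001's three events from 2016 end up in one list, while the 2015 event stays on its own. If a strict "no more than 30 days from the first event of the cluster" rule is wanted instead, the cluster boundaries would have to be computed differently.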