I have a pandas df with three columns, purchase_day, customer_name, products_purchased.

I want to return an array with the number of days on which each customer visited the store. So I used

gpd = df.groupby(by=['customer_name', 'purchase_day']).count()

which returns a table that looks like this (image not shown):

Unfortunately, I can't run groupby on the returned table because of its unusual format (customer_name and purchase_day appear in the second header row rather than the first).

Any tips on how I can count the number of distinct purchase_day values, i.e. the number of days each customer visited the store?

Alex Fung
itstoocold
  • Can you add the actual result as code instead of a drawing on a piece of paper please? I do appreciate the effort :P – miradulo Feb 21 '17 at 01:49

1 Answer

What you need to do is reset the index.

Because you called .groupby with multiple columns, the returned dataframe has a MultiIndex.

gpd = df.groupby(by=['customer_name', 'purchase_day']).count().reset_index()
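
As a minimal sketch of how this plays out, the example below builds a small made-up dataframe (the rows and customer names are assumptions, only the column names come from the question), resets the index, and then groups a second time to count the days per customer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "purchase_day": ["2017-02-01", "2017-02-01", "2017-02-03", "2017-02-02"],
    "customer_name": ["ann", "ann", "ann", "bob"],
    "products_purchased": ["milk", "eggs", "bread", "milk"],
})

# Group by both keys, then turn the MultiIndex back into ordinary columns.
gpd = df.groupby(by=["customer_name", "purchase_day"]).count().reset_index()

# Each row of gpd is now one (customer, day) pair, so counting rows per
# customer gives the number of distinct days that customer visited.
days_per_customer = gpd.groupby("customer_name")["purchase_day"].count()

# Plain NumPy array of the counts, ordered by customer_name.
days_array = days_per_customer.to_numpy()
print(days_per_customer)
```

If only the per-customer day count is needed, df.groupby("customer_name")["purchase_day"].nunique() should give the same numbers without the intermediate table.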

You can also still apply .groupby to a multi-indexed dataframe.

In the documentation for pandas.DataFrame.groupby, there is a parameter, level, which you can use to group by one or more levels of the MultiIndex.
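
A rough sketch of the level approach, again on made-up sample rows (the data is an assumption; the column names are from the question): the MultiIndex is left in place and the second groupby refers to the index level by name.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "purchase_day": ["2017-02-01", "2017-02-01", "2017-02-03", "2017-02-02"],
    "customer_name": ["ann", "ann", "ann", "bob"],
    "products_purchased": ["milk", "eggs", "bread", "milk"],
})

# No reset_index here: gpd keeps a MultiIndex whose levels are named
# customer_name and purchase_day.
gpd = df.groupby(by=["customer_name", "purchase_day"]).count()

# Group on the customer_name index level; size() counts the rows in each
# group, i.e. the number of (customer, day) pairs per customer.
days_per_customer = gpd.groupby(level="customer_name").size()
print(days_per_customer)
```

Passing level=0 instead of the level name should behave the same way here, since customer_name is the outermost level.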

There is a SO thread on this that you can check out here.

Community
Alex Fung