I'm pretty new to Python and I just encountered a problem.

mini_agg is my original pandas.DataFrame and I'm trying to group it by several columns.

trial = mini_agg.groupby(['date','product','product_type_1','product_type_2','product_type_3','product_type_4']).sum()

print(mini_agg.shape)
print(trial.shape)

output:

(2965909, 10)
(499281, 4)

Furthermore, I cannot access the keys I grouped by as columns. In R, I get those columns back when using aggregate.

Can you please help me? Thank you in advance

Kian
Tommaso Guerrini

2 Answers

How to GroupBy a Dataframe in Pandas and keep Columns

I just found the answer, which my earlier searches had missed:

trial = mini_agg.groupby(['date','product','product_type_1','product_type_2','product_type_3','product_type_4']).sum().reset_index()

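It is sufficient to add .reset_index(): groupby moves the grouping keys into the index of the result, and reset_index() turns them back into ordinary columns.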
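A minimal sketch of the difference, using a small made-up frame in place of mini_agg (the column `sales` and the values are assumptions for illustration only). It also shows `as_index=False`, a groupby option that should give the same flat result without the extra call:

```python
import pandas as pd

# Hypothetical stand-in for mini_agg: two key columns and one value column.
df = pd.DataFrame({
    'date': ['2016-01-01', '2016-01-01', '2016-01-02'],
    'product': ['a', 'a', 'b'],
    'sales': [10, 5, 7],
})

keys = ['date', 'product']

# Default: the grouping keys become a MultiIndex, so only 'sales' stays a column.
grouped = df.groupby(keys).sum()

# Option 1: move the keys back into ordinary columns afterwards.
flat = df.groupby(keys).sum().reset_index()

# Option 2: ask groupby not to put the keys in the index in the first place.
flat_alt = df.groupby(keys, as_index=False).sum()

print(grouped.shape)       # the key columns are not counted in the shape
print(list(flat.columns))  # the key columns are back
```

This is also why the shape in the question drops from 10 columns to 4: the six grouping keys are still there, but they live in the index rather than in the columns.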

Tommaso Guerrini

The question does not include sample values for mini_agg, so I will assume it is built from two one-dimensional labeled data structures (Series). As you mentioned, mini_agg is a pandas.DataFrame, and the DataFrame constructor, like Series, can take another DataFrame as input.

So, if mini_agg looks like this:

import pandas as pd
FRAME = {
    'one': pd.Series([1., 2., 3.], index=['product_type_1', 'product_type_2', 'product_type_3']),
    'two': pd.Series([1., 2., 3., 4.], index=['product_type_1', 'product_type_2', 'product_type_3', 'product_type_4']),
}
mini_agg = pd.DataFrame(FRAME)

Then a new frame can be constructed from it, reindexed to the given row and column labels:

trial = pd.DataFrame(mini_agg, index=['date','product','product_type_1','product_type_2','product_type_3','product_type_4'], columns=['A', 'B', 'C', 'D', 'E', 'F'])
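As a quick sketch of what passing an existing DataFrame together with index= and columns= does: the constructor selects those labels from the source frame, and any label the source does not have is filled with NaN. The names `subset` and `padded` below are illustrative only:

```python
import pandas as pd

# Same illustrative frame as above.
mini_agg = pd.DataFrame({
    'one': pd.Series([1., 2., 3.],
                     index=['product_type_1', 'product_type_2', 'product_type_3']),
    'two': pd.Series([1., 2., 3., 4.],
                     index=['product_type_1', 'product_type_2', 'product_type_3', 'product_type_4']),
})

# Labels that exist in mini_agg are carried over as they are.
subset = pd.DataFrame(mini_agg,
                      index=['product_type_1', 'product_type_4'],
                      columns=['one', 'two'])

# Labels that do not exist in mini_agg ('date', 'A') come out as NaN.
padded = pd.DataFrame(mini_agg,
                      index=['date', 'product_type_1'],
                      columns=['one', 'A'])

print(subset)
print(padded)
```

So with the labels 'A' to 'F', none of which are columns of this mini_agg, the resulting frame would contain only NaN. Note that this call selects and relabels; it does not group or aggregate.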
Kian