0

first of all, this post is very useful: How to pivot a dataframe

Right now I have the following target:

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                     aggfunc={'D': np.mean,
...                              'E': ['count', max, np.mean]})
flattened = pd.DataFrame(table.to_records())

with the outcome and target:

    A   C   ('D', 'mean')   ('E', 'count')  ('E', 'max')    ('E', 'mean')
0   bar     large   5.500000    2.0     9.0     7.500000
1   bar     small   5.500000    2.0     9.0     8.500000
2   foo     large   2.000000    2.0     5.0     4.500000
3   foo     small   2.333333    3.0     6.0     4.333333

Is there any aquivalent command with groupby? Something like: df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

Or this already the most efficient way to do it?

Tobias M.
  • 126
  • 1
  • 8
PV8
  • 5,799
  • 7
  • 43
  • 87

1 Answers1

2

Groupby alternative is:

df = df.groupby(['A', 'C']).agg({'D': np.mean, 'E': ['count', max, np.mean]})
print (df)
                  D     E              
               mean count max      mean
A   C                                  
bar large  5.500000     2   9  7.500000
    small  5.500000     2   9  8.500000
foo large  2.000000     2   5  4.500000
    small  2.333333     3   6  4.333333


df.columns = df.columns.map('_'.join)
print (df)
             D_mean  E_count  E_max    E_mean
A   C                                        
bar large  5.500000        2      9  7.500000
    small  5.500000        2      9  8.500000
foo large  2.000000        2      5  4.500000
    small  2.333333        3      6  4.333333
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • possible to run it in one line? and do you think it is faster? – PV8 Aug 23 '19 at 12:11
  • 1
    @PV8 - Unfortunately need second line if need flatten `MultiIndex`. It should be faster, but it depends of number of rows, number of unique values per groups. So the best test in real data. – jezrael Aug 23 '19 at 12:12
  • 1
    `df.columns.map` is a pretty neat way to flatten multiindex columns. cool – Laurens Koppenol Aug 23 '19 at 12:13