0

I am looking to summarize the following table, df1, pandas data frame:

  name subname  value1  value2  value3
0    a       b       1       1       1
1    a       b       2       2       2
2    a       b       3       3       3
3    a       b       4       4       4
4    a       b       5       5       5

to obtain in df2 format:

  name subname    type  mean  sd
0    a       b  value1     3   3
1    a       b  value2     3   3
2    a       b  value3     3   3

Reproducible input:

df1 = pd.DataFrame({'name': ['a','a','a','a','a'],
                    'subname': ['b','b','b','b','b'],
                    'value1': [1,2,3,4,5],
                    'value2': [1,2,3,4,5],
                    'value3': [1,2,3,4,5]})

df2 = pd.DataFrame({'name': ['a','a','a'],
                    'subname': ['b','b','b'],
                    'type': ['value1','value2','value3'],
                    'mean': [3,3,3],
                    'sd': [3,3,3]})
mozway
  • 194,879
  • 13
  • 39
  • 75
Tamas
  • 109
  • 8

2 Answers2

2

Use GroupBy.agg with DataFrame.stack by first level:

out = (df1.groupby(['name','subname'])
          .agg(['mean','std'])
          .rename_axis(['type', None], axis=1)
          .stack(0)
          .reset_index()
         )
print (out)
  name subname    type  mean       std
0    a       b  value1   3.0  1.581139
1    a       b  value2   3.0  1.581139
2    a       b  value3   3.0  1.581139

Another solution with DataFrame.pivot_table and stack:

out = (df1.pivot_table(index=['name','subname'],  aggfunc=['mean','std'])
          .rename_axis([None,'type'], axis=1)
          .stack()
          .reset_index())
print (out)
  name subname    type  mean       std
0    a       b  value1     3  1.581139
1    a       b  value2     3  1.581139
2    a       b  value3     3  1.581139
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can melt and groupby.agg:

out = (df1.melt(['name', 'subname'], var_name='type')
          .groupby(['name', 'subname', 'type'], as_index=False)['value']
          .agg(['mean', 'std'])
       )

Alternatively, with a pivot_table and stack:

out = (df1.pivot_table(index=['name', 'subname'], aggfunc=['mean', 'std'])
          .rename_axis(columns=[None, 'type']).stack().reset_index()
      )

Output:

  name subname    type  mean       std
0    a       b  value1   3.0  1.581139
1    a       b  value2   3.0  1.581139
2    a       b  value3   3.0  1.581139
mozway
  • 194,879
  • 13
  • 39
  • 75