Pandas mean and mode particular columns in DataFrame

Question

I have a large DataFrame along the lines of the one below, and I am looking to calculate either mean or mode on particular columns. I am looking to groupby a feature (whether it is on or off) and also an ID column (1 or 2).

MY_DF

        Feature ID  ITEM1   ITEM2   ITEM3
index
1       ON      1   2.3     544     32.2
2       OFF     1   2.55    544     23.4
3       ON      2   8.32    674     15.5
4       ON      1   1.56    544     43.4
5       OFF     2   3.55    674     32.4

I am able to do both tasks separately to find mean and mode for particular columns, but I can't work out how to join the two outputs together, but then I was also wondering if there is a way to calculate mean or mode of selected columns in one hit?

In my current method, although I am able to do both tasks separetely I am struggling to join the two outputs together.

Here's my separate methods

#as mode is not an official function in Pandas, set out method here
from scipy.stats.mstats import mode
f = lambda x: mode(x, axis=None)[0]

means = MY_DF.groupby(['Feature','ID'])[ITEM1,ITEM3].mean()
moder = MY_DF.groupby(['Feature','ID'])[ITEM2].apply(f)

My mean method outputs something like this:

        ITEM1   ITEM3
ON  1   1.93    37.8
    2   8.32    15.5
OFF 1   2.55    23.4
    2   3.55    32.4

My mode method outputs something like this:

        ITEM2   
ON  1   [544]   
    2   [674]   
OFF 1   [544]
    2   [674]

Im looking for a final output something like this if possible?

        ITEM1   ITEM3   ITEM2
ON  1   1.93    37.8    544
    2   8.32    15.5    674 
OFF 1   2.55    23.4    544
    2   3.55    32.4    674

unutbu · Accepted Answer · 2015-08-13T11:36:43.373

2

You can pass a dict to GroupBy.agg. The keys refer to columns of the DataFrame and the value specifies the aggregation function. For example,

import pandas as pd
import scipy.stats.mstats as mstats
def mode(x):
    return mstats.mode(x, axis=None)[0]

df = pd.DataFrame({'Feature': ['ON', 'OFF', 'ON', 'ON', 'OFF'],
                   'ID': [1, 1, 2, 1, 2],
                   'ITEM1': [2.3, 2.55, 8.32, 1.56, 3.55],
                   'ITEM2': [544, 544, 674, 544, 674],
                   'ITEM3': [32.2, 23.4, 15.5, 43.4, 32.4],})

df.groupby(['Feature','ID']).agg({'ITEM1':'mean','ITEM3':'mean','ITEM2':mode})
print(df)

yields

  Feature  ID  ITEM1  ITEM2  ITEM3
0      ON   1   2.30    544   32.2
1     OFF   1   2.55    544   23.4
2      ON   2   8.32    674   15.5
3      ON   1   1.56    544   43.4
4     OFF   2   3.55    674   32.4

edited Aug 13 '15 at 11:36

answered Aug 13 '15 at 10:35

unutbu

842,883
184
1,785
1,677

Fantastic, thanks unutbu, this works perfectly. One other question I had was how would I ensure any mean values in the output are limited to 2 decimal places? – MikG Aug 13 '15 at 10:48
You could use `pd.options.display.float_format = '{:.2f}'.format` to control how floats are *displayed* when you print DataFrames. If you need to control the display for only certain columns, see http://stackoverflow.com/a/23922119/190597 – unutbu Aug 13 '15 at 11:39

Pandas mean and mode particular columns in DataFrame

1 Answers1