group by with mode as aggregator

Question

I've got a set of survey responses that I'm trying to analyze with pandas. My goal is to find (for this example) the most common gender in each county in the US, so I use the following code:

import pandas as pd
from scipy import stats
file['sex'].groupby(file['county']).agg([('modeSex', stats.mode)])

The output contains, for each county, both the mode and a second value with its count.

How can I unpack this to get only the mode value, without the second value that tells how often the mode occurs?

Here is a sample of the data frame:

county | sex
-------|----
079    | 1
079    | 2
079    | 2
075    | 1
075    | 1
075    | 1
075    | 2

Desired output is:

county | modeSex
-------|--------
079    | 2
075    | 1
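For comparison, here is a minimal pandas-only sketch on the sample above. It swaps SciPy for pandas' own Series.mode, which is not what the answers below use. It assumes the county codes are stored as strings so "079" keeps its leading zero, and it passes sort=False so the counties come out in the order shown.

```python
import pandas as pd

# Sample data from the question; county codes kept as strings (an assumption)
# so that leading zeros such as "079" survive.
file = pd.DataFrame({
    "county": ["079", "079", "079", "075", "075", "075", "075"],
    "sex": [1, 2, 2, 1, 1, 1, 2],
})

# Series.mode() returns all modal values sorted ascending;
# .iloc[0] keeps the smallest one if a county has a tie.
mode_sex = (
    file.groupby("county", sort=False)["sex"]
    .agg(lambda x: x.mode().iloc[0])
    .rename("modeSex")
    .reset_index()
)
print(mode_sex)
```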

ayhan · Accepted Answer · 2016-04-08T21:33:56.787

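Pandas complains about the returned array when you use stats.mode(x)[0] (I guess a pandas cell cannot hold a NumPy array), so you can convert it to a list or a tuple. (This relies on stats.mode(x)[0] being an array; since SciPy 1.11 it is a scalar by default, so pass keepdims=True on newer versions.)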

import numpy as np
df = pd.DataFrame({"C1": np.random.randint(10, size=100), "C2": np.random.choice(["X", "Y", "Z"], size=100)})
print(df.groupby(['C2']).agg(lambda x: tuple(stats.mode(x)[0])))

Out:

     C1
C2      
X   (0,)
Y   (4,)
Z   (3,)

Note that stats.mode returns only one modal value per group (the smallest one when there is a tie), so each tuple holds a single element. If you just want that value as a scalar, extract it:

df.groupby(['C2']).agg(lambda x: stats.mode(x)[0][0])

Out:

    C1
C2    
X    0
Y    4
Z    3
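Because stats.mode keeps only one value per group, the tuple form above will not preserve multiple modes. If you do want every mode, one option is pandas' own Series.mode, which returns all of them. Here is a sketch on small made-up data (hypothetical, not from the question) where group X has a tie:

```python
import pandas as pd

# Made-up data: in group "X", 1 and 2 each appear once (a tie).
df = pd.DataFrame({"C1": [1, 2, 5, 5, 7], "C2": ["X", "X", "Y", "Y", "Y"]})

# Series.mode returns every modal value (sorted); wrapping it in a tuple
# makes each group produce a single cell value.
all_modes = df.groupby("C2")["C1"].agg(lambda x: tuple(x.mode()))
print(all_modes)
```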

`file['sex'].groupby(file['county']).agg({'modeSex': lambda x: stats.mode(x)[0][0]})` ended up being the winner... thanks! — Josh, Apr 08 '16 at 21:25
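The dict-renaming form in that comment was deprecated in pandas 0.20 and, as far as I know, raises a SpecificationError from pandas 1.0 on. A minimal sketch of the same idea with named aggregation (pandas 0.25 or later), assuming the question's sample data and SciPy 1.9 or later for the keepdims argument:

```python
import pandas as pd
from scipy import stats

# Sample data from the question (county codes as strings, an assumption).
file = pd.DataFrame({
    "county": ["079", "079", "079", "075", "075", "075", "075"],
    "sex": [1, 2, 2, 1, 1, 1, 2],
})

# Named aggregation: the keyword (modeSex) becomes the output column name.
# keepdims=True makes stats.mode return 1-element arrays,
# so [0][0] picks the modal value just like the 2016 code did.
result = file.groupby("county")["sex"].agg(
    modeSex=lambda x: stats.mode(x.to_numpy(), keepdims=True)[0][0]
)
print(result)
```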

sid · Answer 2 · 2016-04-08T20:51:32.250


scipy.stats.mode returns two arrays, the modal values and the count of each mode, so we can use stats.mode(a)[0] to get only the modal value without its count.
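To see that structure, here is a small sketch calling stats.mode on one county's values from the question. It passes keepdims=True (available from SciPy 1.9) so the result keeps the 1-element arrays that the code in this thread assumes:

```python
import numpy as np
from scipy import stats

# The sex values for county 075 in the question's sample.
sample = np.array([1, 1, 1, 2])

# ModeResult is a named tuple: (mode, count).
# keepdims=True keeps the 2016-style 1-element arrays.
res = stats.mode(sample, keepdims=True)
print(res.mode, res.count)  # [1] [3]
modal_value = res[0][0]     # same as res.mode[0]
```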

Here is the code:

import pandas as pd
from scipy import stats
# sample data frame
df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A'], 'Y' : [1, 2, 3, 4]})
# aggregate column Y with a lambda that keeps only the modal value(s)
print(df2.groupby(['X']).agg({'Y': lambda x: stats.mode(x)[0]}))

answered Apr 08 '16 at 20:45 by sid, edited Apr 08 '16 at 20:51

Makes sense conceptually, but I got this error: Exception: Must produce aggregated value – Josh Apr 08 '16 at 20:58
Can you post the code along with a sample dataframe? – sid Apr 08 '16 at 21:07
Well, with your sample dataset the code runs fine on my side. – sid Apr 08 '16 at 21:26
It's probably a version issue. I get the same error with pandas 0.18.0. – ayhan Apr 08 '16 at 21:34
Yes, maybe. I am running Python 2.7.11 and pandas 0.17. – sid Apr 08 '16 at 21:45
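Since the error depends on the pandas and SciPy versions, one version-agnostic sketch is to flatten whatever stats.mode returns and take its first element, so the aggregation function always hands agg a scalar. The helper name first_mode is hypothetical:

```python
import numpy as np
import pandas as pd
from scipy import stats

df2 = pd.DataFrame({"X": ["B", "B", "A", "A"], "Y": [1, 2, 3, 4]})


def first_mode(values):
    """Return one modal value as a scalar, whatever shape stats.mode uses."""
    # Older SciPy returns a 1-element array, SciPy >= 1.11 a scalar by default;
    # np.ravel turns either into a 1-D array, so [0] always yields a scalar.
    return np.ravel(stats.mode(np.asarray(values))[0])[0]


result = df2.groupby("X")["Y"].agg(first_mode)
print(result)
```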
