
I've got a set of survey responses that I'm trying to analyze with pandas. My goal is to find (for this example) the most common gender in each county in the US, so I use the following code:

import pandas as pd
from scipy import stats
file['sex'].groupby(file['county']).agg([('modeSex', stats.mode)])

The output (a screenshot in the original post, not reproduced here) shows both the mode and its count for each county.

How can I unpack this to only get the mode value and not the second value that tells how often the mode occurs?

Here is a sample of the data frame:

county|sex
----------
079   | 1
079   | 2
079   | 2
075   | 1
075   | 1
075   | 1
075   | 2

Desired output is:

county|modeSex
----------
079   | 2
075   | 1
Josh

2 Answers


Pandas complains about the returned array when you use stats.mode(x)[0] (I guess a pandas cell cannot hold a numpy array), so you can convert it to a list or a tuple:

import numpy as np  # pd and stats are imported as in the question
df = pd.DataFrame({"C1": np.random.randint(10, size=100), "C2": np.random.choice(["X", "Y", "Z"], size=100)})
print(df.groupby(['C2']).agg(lambda x: tuple(stats.mode(x)[0])))

Out:

     C1
C2      
X   (0,)
Y   (4,)
Z   (3,)

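Note that stats.mode reports only one mode per group (the smallest value when there is a tie); it is wrapped in an array, which is why a tuple or list is needed to store the result as-is. If you want the mode as a plain value, you can extract the first element: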

df.groupby(['C2']).agg(lambda x: stats.mode(x)[0][0])

Out:

    C1
C2    
X    0
Y    4
Z    3
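As an alternative that is not part of the answer above, pandas has its own Series.mode, which returns every tied mode in sorted order. The sketch below applies it to the sample data from the question; the names first_mode and all_modes are made up for illustration, and county is assumed to be stored as a string so the leading zeros survive.

```python
import pandas as pd

# Sample data from the question; county is assumed to be a string column.
df = pd.DataFrame({
    "county": ["079", "079", "079", "075", "075", "075", "075"],
    "sex": [1, 2, 2, 1, 1, 1, 2],
})

# Series.mode() returns all tied modes, sorted; .iloc[0] keeps the smallest one.
first_mode = (
    df.groupby("county")["sex"]
    .agg(lambda s: s.mode().iloc[0])
    .rename("modeSex")
)

# Keep every mode per group as a tuple instead of picking one.
all_modes = df.groupby("county")["sex"].agg(lambda s: tuple(s.mode()))

print(first_mode)
```

This avoids the SciPy dependency, and the result does not change with the shape of the value stats.mode returns.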
ayhan
  • `file['sex'].groupby(file['county']).agg({'modeSex': lambda x: stats.mode(x)[0][0]})` ended up being the winner... thanks! – Josh Apr 08 '16 at 21:25

scipy.stats.mode returns two things: the modal values and the count for each mode. So we can use stats.mode(a)[0] to keep only the modal values and drop the counts.

Here is the code:

import pandas as pd
from scipy import stats
# sample data frame
df2 = pd.DataFrame({'X' : ['B', 'B', 'A', 'A'], 'Y' : [1, 2, 3, 4]})
# use lambda functions
print(df2.groupby(['X']).agg({'Y': lambda x: stats.mode(x)[0]}))

output:

   Y
X   
A  3
B  1
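To make the two-part return value concrete, here is a minimal sketch that unpacks it on a small made-up array. Depending on the SciPy version, each part may come back as a one-element array or as a plain scalar, so np.atleast_1d is used here to handle both; that wrapper is an addition for illustration, not part of the answer.

```python
import numpy as np
from scipy import stats

# Made-up data: 2 occurs three times, more often than any other value.
values = np.array([1, 2, 2, 3, 2, 1])

# stats.mode returns a (mode, count) pair, which can be unpacked like a tuple.
mode_part, count_part = stats.mode(values)

# np.atleast_1d makes indexing work whether SciPy returned arrays or scalars.
mode_value = np.atleast_1d(mode_part)[0]
mode_count = np.atleast_1d(count_part)[0]

print(mode_value, mode_count)
```

Only mode_value is needed for the desired output in the question; mode_count is the "second value" the question wants to discard.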
sid