How to count sub in other subs

Question

I have a DataFrame like the following:

import pandas as pd

data = {'gender': ['female', 'female', 'female', 'male', 'male'], 'race': ['group B', 'group C', 'group B', 'group A', 'group C'], 'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college'], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard']}
df = pd.DataFrame(data)

# display(df)
   gender     race parental level of education         lunch
0  female  group B           bachelor's degree      standard
1  female  group C                some college      standard
2  female  group B             master's degree      standard
3    male  group A          associate's degree  free/reduced
4    male  group C                some college      standard

Q: I want to count how many females there are in each group (the race column) separately.

I used `groupby('gender').count()`, but that counts the males and females across the entire dataset.

`pd.crosstab(df['race'], df['gender'])` – Quang Hoang, Jan 20 '21 at 21:43
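
To illustrate the comment above, here is a minimal sketch of the `pd.crosstab` approach on the question's data (only the two relevant columns are kept; the variable names `table` and `female_per_race` are mine, not from the thread):

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male'],
    'race': ['group B', 'group C', 'group B', 'group A', 'group C'],
})

# Rows are races, columns are genders, each cell is a row count.
table = pd.crosstab(df['race'], df['gender'])
print(table)

# The 'female' column holds the number of females per race,
# including 0 for races with no female rows (group A here).
female_per_race = table['female']
print(female_per_race)
```

Unlike the groupby-based answers, the cross-tabulation also lists combinations that never occur, with a count of 0.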

score 0 · Answer 1 · edited Jan 20 '21 at 22:00

You can group by multiple columns, in this case, gender and race.

import pandas as pd
df = pd.DataFrame({'gender': ['female', 'male', 'female', 'female', 'male'],
                   'race': ['group B', 'group C', 'group A', 'group A', 'group B'],
                   # The middle values are elided here; supply one value
                   # per row (five in total) for this to run.
                   'education': ["bachelor's degree", ..., "associate's degree"]})

group = df.groupby(['gender', 'race']).count()

# print(group)
''' 
                 education
gender  race    
female  group A      2
        group B      1
male    group B      1
        group C      1
'''

Then you can access the count for each gender and race group.
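
As a rough sketch of what "access the count" could look like, assuming the same gender and race values as above: this version uses `size()` rather than `count()`, which is a swap on my part so that no extra column is needed, and the names `counts`, `n_female_a` and `female_counts` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['female', 'male', 'female', 'female', 'male'],
    'race': ['group B', 'group C', 'group A', 'group A', 'group B'],
})

# size() counts rows per (gender, race) pair and returns a Series
# indexed by a two-level MultiIndex.
counts = df.groupby(['gender', 'race']).size()

# One specific pair: index with a (gender, race) tuple.
n_female_a = counts.loc[('female', 'group A')]

# All races for one gender: index with the first level only.
female_counts = counts.loc['female']

print(n_female_a)
print(female_counts)
```

With `count()` as in the answer, the result is a DataFrame instead, so the same lookup would also name the column, for example `group.loc[('female', 'group A'), 'education']`.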

score 0 · Answer 2 · answered Jan 20 '21 at 22:23

You can group by both columns hierarchically, then take the count from an arbitrary column (here I used 'lunch'):

df.groupby(['gender', 'race']).count()['lunch']['female']

With your example:

import pandas as pd

data = {'gender': ['female', 'female', 'female', 'male', 'male'], 'race': ['group B', 'group C', 'group B', 'group A', 'group C'], 'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college'], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard']}
df = pd.DataFrame(data)

df.groupby(['gender', 'race']).count()['lunch']['female'].rename('nb female')

Out[1]: 
race
group B    2
group C    1
Name: nb female, dtype: int64
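
An alternative sketch, not part of the answer above: filter the rows for one gender first, then count the races with `value_counts()`. This avoids picking an arbitrary column, which matters because `count()` skips missing values in the column it is applied to. The names `females` and `nb_female` are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male'],
    'race': ['group B', 'group C', 'group B', 'group A', 'group C'],
    'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard'],
})

# Keep only the female rows.
females = df[df['gender'] == 'female']

# Count how often each race appears among them.
nb_female = females['race'].value_counts().rename('nb female')
print(nb_female)
```

On this data the counts match the output above: 2 for group B and 1 for group C.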
