I have a DataFrame like the following:

import pandas as pd

data = {'gender': ['female', 'female', 'female', 'male', 'male'], 'race': ['group B', 'group C', 'group B', 'group A', 'group C'], 'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college'], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard']}
df = pd.DataFrame(data)

# display(df)
   gender     race parental level of education         lunch
0  female  group B           bachelor's degree      standard
1  female  group C                some college      standard
2  female  group B             master's degree      standard
3    male  group A          associate's degree  free/reduced
4    male  group C                some college      standard

Q: I want to count how many females there are in each group (the race column) separately.

I used df.groupby('gender').count(), but that counts how many males and females there are in the entire data set.

Trenton McKinney
Omar Hossam

2 Answers

You can group by multiple columns, in this case, gender and race.

import pandas as pd
df = pd.DataFrame({'gender' : ['female', 'male', 'female', 'female', 'male'], 
                   'race' : ['group B', 'group C', 'group A', 'group A', 'group B'], 
                   # '...' stands for the elided values; the list needs five entries to match the other columns
                   'education': ["bachelor's degree", ... , "associate's degree"]})

group = df.groupby(['gender', 'race']).count()

# print(group)
''' 
                 education
gender  race    
female  group A      2
        group B      1
male    group B      1
        group C      1
'''

Then you can access the count for each gender and race group.
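
As a minimal sketch of that access step, the snippet below uses the data from the question (trimmed to two columns) rather than the abbreviated frame above. It swaps count() for size(), which counts rows per group and so needs no value column. The variable names counts, female_counts and female_group_b are illustrative only.

```python
import pandas as pd

# Data from the question, trimmed to the two columns that matter here.
df = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male'],
    'race': ['group B', 'group C', 'group B', 'group A', 'group C'],
})

# size() counts rows per (gender, race) pair and returns a Series
# indexed by a two-level MultiIndex.
counts = df.groupby(['gender', 'race']).size()

# Selecting on the outer index level gives the per-race counts for one gender.
female_counts = counts.loc['female']

# Selecting a full (gender, race) pair gives a single number.
female_group_b = counts.loc[('female', 'group B')]

print(female_counts)
print(female_group_b)
```

Pairs that never occur in the data (for example female and group A) are simply absent from the result, so looking them up with .loc raises a KeyError.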

Dharman
Colin

You can use groupby hierarchically, and then count on an arbitrary column (here, I took 'lunch'):

df.groupby(['gender', 'race']).count()['lunch']['female']

With your example:

import pandas as pd

data = {'gender': ['female', 'female', 'female', 'male', 'male'], 'race': ['group B', 'group C', 'group B', 'group A', 'group C'], 'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college'], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard']}
df = pd.DataFrame(data)

df.groupby(['gender', 'race']).count()['lunch']['female'].rename('nb female')

Out[1]: 
race
group B    2
group C    1
Name: nb female, dtype: int64
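
If only one gender is of interest, another option is to filter first and then count, which avoids building the full grouped table. The sketch below is an alternative to the groupby approach, not a replacement for it. It again uses the question's data trimmed to two columns, and the names nb_female and table are illustrative.

```python
import pandas as pd

# Data from the question, trimmed to the two columns that matter here.
df = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male'],
    'race': ['group B', 'group C', 'group B', 'group A', 'group C'],
})

# Keep only the female rows, then count how often each race value occurs.
nb_female = df.loc[df['gender'] == 'female', 'race'].value_counts()

# A cross-tabulation shows every gender at once and fills
# combinations that never occur with 0.
table = pd.crosstab(df['gender'], df['race'])

print(nb_female)
print(table)
```

The cross-tabulation is handy when a zero for a missing combination is wanted explicitly, since the groupby result leaves such combinations out.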
apaolillo