Get frequency of item occurrences in a column as percentage

Question

I want to get the percentage of each value in a df column. Say I have a df with columns (col1, col2, col3, gender), where the gender column has values of M, F, or Other. I want to get the percentage of M, F, and Other values in the df.

I have tried this, which gives me the number of M, F, and Other instances, but I want these as a percentage of the total number of values in the df.

df.groupby('gender').size()

Can someone help?

cs95 · Accepted Answer · 2021-02-14T21:47:46.273

149

Use value_counts with normalize=True:

df['gender'].value_counts(normalize=True) * 100

With normalize=True, value_counts returns each value's share as a fraction in the range (0, 1]. Multiplying by 100 converts it to a percentage.
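As a minimal sketch of the approach above, the following uses made-up sample data (the values and the missing entry are assumptions for illustration). It also shows the `dropna=False` option of `value_counts`, which keeps missing values as their own category so they count toward the total.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({"gender": ["M", "F", "F", "Other", "M", "F", None]})

# Percentage of each value among the non-missing entries.
pct = df["gender"].value_counts(normalize=True) * 100

# Same, but missing values are kept as a category and counted in the total.
pct_with_nan = df["gender"].value_counts(normalize=True, dropna=False) * 100

print(pct.round(2))
print(pct_with_nan.round(2))
```

By default missing values are ignored, so the percentages describe only the rows that have a gender value.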

edited Feb 14 '21 at 21:47

answered May 28 '18 at 03:04


score 11 · Answer 2 · edited Nov 30 '18 at 09:12

If you only need the percentages of the M and F values in the gender column, you can try using value_counts() and count() as follows:

import pandas as pd

df = pd.DataFrame({'gender': ['M', 'M', 'F', 'F', 'F']})
# Percentage calculation: per-value counts divided by the number of non-missing values
(df['gender'].value_counts() / df['gender'].count()) * 100

Result:

F    60.0
M    40.0
Name: gender, dtype: float64

Or, using groupby:

(df.groupby('gender').size()/df['gender'].count())*100
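The two expressions above should agree with `value_counts(normalize=True)`, since `count()` only counts non-missing values. The sketch below checks this on hypothetical data with one missing entry, and contrasts it with dividing by `len(df)`, which counts every row.

```python
import pandas as pd

# Hypothetical data with one missing gender value.
df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F", None]})

non_missing = df["gender"].count()  # 5: count() skips missing values

by_value_counts = df["gender"].value_counts() / non_missing * 100
by_groupby = df.groupby("gender").size() / non_missing * 100
by_normalize = df["gender"].value_counts(normalize=True) * 100

# Dividing by len(df) includes the missing row in the denominator,
# so these shares no longer sum to 100.
share_of_all_rows = df["gender"].value_counts() / len(df) * 100

print(by_value_counts)
print(share_of_all_rows.round(2))
```

Which denominator is appropriate depends on whether rows without a value should count toward the total.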

score 4 · Answer 3 · edited Sep 21 '19 at 10:57


Let's say there are 200 values, of which 120 are categorized as M and 80 as F.

1)

df['gender'].value_counts()

 output:

 M=120
 F=80

2)

df['gender'].value_counts(normalize=True)

  output:

  M=0.60
  F=0.40

3)

df['gender'].value_counts(normalize=True) * 100  # converts the fractions to percentages

  output:

  M=60
  F=40
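The three steps above can be reproduced with a small runnable sketch. The Series below is constructed to match the 200-value example (120 M, 80 F); the variable names are illustrative only.

```python
import pandas as pd

# Build a Series matching the example: 120 "M" and 80 "F" values.
gender = pd.Series(["M"] * 120 + ["F"] * 80, name="gender")

counts = gender.value_counts()                    # raw counts per value
fractions = gender.value_counts(normalize=True)   # shares as fractions of 1
percentages = fractions * 100                     # shares as percentages

print(counts)
print(fractions)
print(percentages)
```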

edited Sep 21 '19 at 10:57

Mortz


answered Sep 21 '19 at 10:49

Rohith Gunda


1

It's `normalize=True`, not `Normalize`. Docs: https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html – Akash Ranjan Mar 09 '22 at 05:13

score 0 · Answer 4 · edited May 07 '19 at 09:46

Find the percentage of each target value to check whether the classes are imbalanced.

g = data[Target_col_Y]
df = pd.concat([g.value_counts(), g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))

print(df)

   counts  percentage
0   36548   88.734583
1    4640   11.265417

Take the difference between consecutive rows and then the maximum of the last column (percentage) to see how large the imbalance is.

df1=df.diff(periods=1,axis=0)
difvalue=df1[[list(df1.columns)[-1]]].max()
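For a self-contained illustration, the sketch below builds the same counts/percentage table from a hypothetical binary target with a 90/10 split. Instead of `diff`, it measures imbalance as the gap between the largest and smallest class share, which is a swapped-in simplification rather than the answer's exact method.

```python
import pandas as pd

# Hypothetical binary target with a 90/10 class split.
target = pd.Series([0] * 90 + [1] * 10, name="y")

# Counts and percentages side by side, as in the answer above.
summary = pd.concat(
    [target.value_counts(), target.value_counts(normalize=True).mul(100)],
    axis=1,
    keys=("counts", "percentage"),
)

# Gap between the most and least frequent class, in percentage points.
gap = summary["percentage"].max() - summary["percentage"].min()

print(summary)
print(gap)
```

A gap near 0 suggests balanced classes, while a large gap suggests imbalance.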

score 0 · Answer 5 · answered Feb 07 '20 at 06:06


# Gender is encoded as 0 = male and 1 = female, so the column mean is the share of females.
print('(Gender Male=0):\n{}%'.format(round(100 - df['Gender'].mean() * 100, 2)))
print('(Gender Female=1):\n{}%'.format(round(df['Gender'].mean() * 100, 2)))
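This works because the mean of a 0/1 column equals the fraction of rows equal to 1. A minimal sketch on hypothetical data, assuming that 0/1 encoding, cross-checks the result against `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical encoding: 0 = male, 1 = female.
df = pd.DataFrame({"Gender": [0, 1, 1, 0, 1]})

# The mean of a 0/1 column is the share of 1s.
female_pct = round(df["Gender"].mean() * 100, 2)
male_pct = round(100 - female_pct, 2)

# Cross-check against value_counts with normalize=True.
shares = df["Gender"].value_counts(normalize=True) * 100

print(female_pct, male_pct)
print(shares)
```

The shortcut only applies to columns with exactly two values encoded as 0 and 1. For other encodings, `value_counts(normalize=True)` is the more general option.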

answered Feb 07 '20 at 06:06

Harshal SG

