How to get the frequency of occurrence of a column as a percent

Question

I have a dataset, df, that looks similar to this:

houses    price
ranch     300,000
ranch     350,000
ranch     400,000
condo     250,000
condo     275,000
townhome  300,000

I would like to group by the different categories within the 'houses' column and display the percentage of rows that fall in each category.

Desired output

houses      percent
ranch       50%
condo       33%
townhome    16.60%

This is what I am doing:

percent is part/whole


df1 = df.groupby('houses').sum()    # df1 gives us the sum
percent = df1['houses'] / df1

However, I am not retaining both the houses and percent columns. Any suggestions are appreciated.

You need to change the way the data is being read. `'price'` is not a number. If you're reading from a csv with `.read_csv()`, then use the `thousands=','` parameter. — Trenton McKinney, Dec 17 '20 at 19:26
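A minimal sketch of what this comment suggests, assuming the data comes from a CSV whose price field is quoted because it contains commas. The inline `csv_text` string is a stand-in for the real file, which the question does not show.

```python
import io

import pandas as pd

# Stand-in for the real file (an assumption): prices are quoted
# because they contain thousands separators.
csv_text = '''houses,price
ranch,"300,000"
ranch,"350,000"
ranch,"400,000"
condo,"250,000"
condo,"275,000"
townhome,"300,000"
'''

# thousands="," tells the parser to treat the comma as a digit-group
# separator, so 'price' is read as a number rather than a string.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(df.dtypes)
```

The percentage of rows per category does not depend on 'price' at all, but a numeric column is needed as soon as the prices themselves are summed or averaged.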

Kyle · Accepted Answer · 2020-12-17T19:29:04.730

4

You can count the unique values with value_counts and pass normalize=True to get proportions instead of raw counts:

df['houses'].value_counts(normalize=True) * 100

ranch       50.000000
condo       33.333333
townhome    16.666667
Name: houses, dtype: float64

Edit: to convert to a DataFrame:

(df['houses'].value_counts(normalize=True) * 100).to_frame()
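To get both a 'houses' and a 'percent' column as in the desired output, one option is to name the index and reset it. This is a sketch rather than the only way to do it; the sample frame is rebuilt from the question, and the `label` column is a hypothetical name added here for the formatted string.

```python
import pandas as pd

# Sample data rebuilt from the question, with prices already numeric.
df = pd.DataFrame({
    "houses": ["ranch", "ranch", "ranch", "condo", "condo", "townhome"],
    "price": [300000, 350000, 400000, 250000, 275000, 300000],
})

percent = (
    df["houses"]
    .value_counts(normalize=True)    # share of rows per category (0 to 1)
    .mul(100)                        # convert the share to a percentage
    .rename_axis("houses")           # name the index so it becomes a column
    .reset_index(name="percent")     # Series -> DataFrame with two columns
)

# Optional display column (hypothetical name): one decimal plus a percent sign.
percent["label"] = percent["percent"].map("{:.1f}%".format)

print(percent)
```

Calling rename_axis before reset_index sets the column names explicitly, so the result does not depend on what the pandas version in use names the Series returned by value_counts.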

edited Dec 17 '20 at 19:29

answered Dec 17 '20 at 19:11

Kyle

score 2 · Answer 2 · answered Dec 17 '20 at 19:16


Groupby version:

>>> df.groupby('houses').count() / len(df) * 100

              price
houses             
condo     33.333333
ranch     50.000000
townhome  16.666667
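A variant of the same idea, sketched with size() instead of count(). count() tallies the non-null values in each remaining column, which is why the result above is labelled 'price', whereas size() counts rows per group whatever the other columns contain. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question, with prices already numeric.
df = pd.DataFrame({
    "houses": ["ranch", "ranch", "ranch", "condo", "condo", "townhome"],
    "price": [300000, 350000, 400000, 250000, 275000, 300000],
})

percent = (
    df.groupby("houses")
    .size()                          # number of rows in each group
    .div(len(df))                    # part / whole
    .mul(100)                        # express as a percentage
    .reset_index(name="percent")     # keep 'houses' as a regular column
)

print(percent)
```

The groups come back sorted by key (condo, ranch, townhome) rather than by frequency; chaining sort_values("percent", ascending=False) would reorder them if needed.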

answered Dec 17 '20 at 19:16

Tom
