I have a dataset, df, that looks similar to this:

houses    price
ranch     300,000
ranch     350,000
ranch     400,000
condo     250,000
condo     275,000
townhome  300,000

I would like to group by the categories in the 'houses' column and display the percentage of rows that fall into each category.

Desired output

houses      percent
ranch       50%
condo       33%
townhome    16.67%

This is what I am doing:

percent is part/whole


df1 = df.groupby('houses').sum()    # df1 gives us the sum per category
percent = df1['houses'] / df1

However, this does not retain both the houses and percent columns. Any suggestions are appreciated.

Trenton McKinney
Lynn

2 Answers

You can count the unique values with value_counts and use the normalize parameter:

df['houses'].value_counts(normalize=True) * 100

ranch       50.000000
condo       33.333333
townhome    16.666667
Name: houses, dtype: float64

Edit: to convert to a DataFrame:

(df['houses'].value_counts(normalize=True) * 100).to_frame()
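To get a result with both a houses column and a percent column, as in the desired output, one option is to name the index and reset it. This is a minimal sketch that assumes the sample data from the question; the names result and percent_label are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "houses": ["ranch", "ranch", "ranch", "condo", "condo", "townhome"],
    "price": [300_000, 350_000, 400_000, 250_000, 275_000, 300_000],
})

result = (
    df["houses"]
    .value_counts(normalize=True)   # share of rows per category, as a fraction
    .mul(100)                       # convert the fraction to a percentage
    .rename_axis("houses")          # make sure the index is named 'houses'
    .reset_index(name="percent")    # index -> column, values -> 'percent'
)

# Optional: a string column formatted like the desired output (e.g. '16.67%')
result["percent_label"] = result["percent"].map("{:.2f}%".format)
print(result)
```

The rename_axis call is there because the name pandas gives to the value_counts index has differed between versions; setting it explicitly keeps the column name predictable.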
Kyle

Groupby version:

>>> df.groupby('houses').count() / len(df) * 100

              price
houses             
condo     33.333333
ranch     50.000000
townhome  16.666667
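Note that count() counts the non-null values in each remaining column, so the result above is labelled price and would change if price contained missing values. A possible variant, sketched here under the assumption that the percentage should be based on row counts, uses size() and returns houses and percent as regular columns. The name percent_df is illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "houses": ["ranch", "ranch", "ranch", "condo", "condo", "townhome"],
    "price": [300_000, 350_000, 400_000, 250_000, 275_000, 300_000],
})

percent_df = (
    df.groupby("houses")
    .size()                         # number of rows per category
    .div(len(df))                   # part / whole
    .mul(100)                       # as a percentage
    .reset_index(name="percent")    # keep 'houses' and 'percent' as columns
)
print(percent_df)
```

The rows come back sorted by the group key (condo, ranch, townhome); chaining .sort_values("percent", ascending=False) would order them by percentage instead.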
Tom