0

Set-up

I am scraping housing ads using Scrapy and subsequently analyse the data with pandas.

I use the pandas to compute the means and medians of several housing characteristics.

The dataframe df looks like,

district | rent | rooms | …
----------------------------
 North   | 200  |   3   | …
 South   | 300  |   1   | …
 South   | 300  |   1   | …
   ⋮         ⋮       ⋮     ⋮

Problem

I would like to compute the average rent for a n-room apartment per district.

I found an answer here which brings me close, e.g.

df.loc[df['rooms'] == 1, 'rent'].mean()

but this computes the average rent for one-bedroom apartments for the whole city.

To do it per district, I'd like to do something like,

for d in district_set:
     df.loc[df['rooms'] == 1 and df['district'] == d, 'rent'].mean()

where district_set contains all possible districts.

Any suggestions?

I'd like to obtain the following table,

district | avg rent 1R | avg rent 2R | …
----------------------------------------
 North   |     200     |     400     | …
 South   |     300     |     500     | …
   ⋮            ⋮              ⋮
Community
  • 1
  • 1
LucSpan
  • 1,831
  • 6
  • 31
  • 66

2 Answers2

1

df.groupby(['district', 'rooms'])['rent'].mean().unstack() should work. unstack() turns the MultiIndex returned by the previous expression to a table with district as rows and rooms as the columns.

Martin Valgur
  • 5,793
  • 1
  • 33
  • 45
0

You can collapse the dataframe by grouping by district and the number of rooms, then aggregating using the mean as @bigbounty recommended.

df.groupby(['rooms', 'district'])['rent'].mean()
James
  • 32,991
  • 4
  • 47
  • 70
  • Thank you for the answer James. However, this sorts the table in a different way than I want (see question). – LucSpan May 08 '17 at 12:55