Compute mean if two conditions are met

Question

Set-up

I am scraping housing ads using Scrapy and subsequently analyse the data with pandas.

I use pandas to compute the means and medians of several housing characteristics.

The dataframe df looks like,

district | rent | rooms | …
----------------------------
 North   | 200  |   3   | …
 South   | 300  |   1   | …
 South   | 300  |   1   | …
   ⋮         ⋮       ⋮     ⋮

Problem

I would like to compute the average rent for an n-room apartment per district.

I found an answer here which brings me close, e.g.

df.loc[df['rooms'] == 1, 'rent'].mean()

but this computes the average rent for one-room apartments across the whole city.

To do it per district, I'd like to do something like,

for d in district_set:
    df.loc[(df['rooms'] == 1) & (df['district'] == d), 'rent'].mean()

where district_set contains all possible districts.

Any suggestions?

I'd like to obtain the following table,

district | avg rent 1R | avg rent 2R | …
----------------------------------------
 North   |     200     |     400     | …
 South   |     300     |     500     | …
   ⋮            ⋮              ⋮

Use groupby and aggregate mean – bigbounty May 08 '17 at 11:29

Martin Valgur · Accepted Answer · 2017-05-08T12:59:16.260

1

df.groupby(['district', 'rooms'])['rent'].mean().unstack() should work. unstack() turns the MultiIndex Series returned by the preceding expression into a table with districts as rows and room counts as columns.
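As a minimal sketch of the expression above, assuming a small made-up dataframe (the rent values are invented for illustration and are not from the question's data), the result can also be relabelled to resemble the table asked for. The "avg rent nR" column labels are an assumption based on the question's desired output.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration only.
df = pd.DataFrame({
    "district": ["North", "North", "North", "South", "South", "South"],
    "rooms":    [1, 1, 2, 1, 2, 2],
    "rent":     [180, 220, 400, 300, 450, 550],
})

# Mean rent per (district, rooms) pair, then pivot the 'rooms' level
# of the resulting MultiIndex into columns.
avg_rent = df.groupby(["district", "rooms"])["rent"].mean().unstack()

# Optional: relabel the room counts to match the layout in the question.
avg_rent.columns = [f"avg rent {n}R" for n in avg_rent.columns]

print(avg_rent)
```

Any district/room combination that does not occur in the data shows up as NaN in the unstacked table.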

edited May 08 '17 at 12:59

answered May 08 '17 at 11:48


Thank you for the answer Martin. However, this sorts the table in a different way than I want (see question). – LucSpan May 08 '17 at 12:55
Ah, I missed that part. Simply add `unstack()` to convert the MultiIndex to columns. – Martin Valgur May 08 '17 at 13:00

score 0 · Answer 2 · answered May 08 '17 at 11:48


You can collapse the dataframe by grouping by district and the number of rooms, then aggregating using the mean as @bigbounty recommended.

df.groupby(['rooms', 'district'])['rent'].mean()
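A short sketch of what this returns, assuming a made-up dataframe (the values are hypothetical): the result is a long-format Series indexed by (rooms, district). If a district-by-rooms table is wanted instead, the 'rooms' level can be moved into the columns with unstack.

```python
import pandas as pd

# Hypothetical sample data; values are invented for illustration only.
ads = pd.DataFrame({
    "district": ["North", "North", "South", "South", "South"],
    "rooms":    [1, 2, 1, 2, 2],
    "rent":     [200, 400, 300, 450, 550],
})

# Long format: one mean per (rooms, district) pair in a MultiIndex Series.
by_rooms = ads.groupby(["rooms", "district"])["rent"].mean()
print(by_rooms)

# Individual values are looked up with a (rooms, district) tuple.
south_two_room = by_rooms.loc[(2, "South")]

# Wide format: districts as rows, room counts as columns.
wide = by_rooms.unstack("rooms")
print(wide)
```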


James


Thank you for the answer James. However, this sorts the table in a different way than I want (see question). – LucSpan May 08 '17 at 12:55
