Filtering and displaying values in GraphLab SFrame?

Question

So, I started working with GraphLab for my machine learning class a week ago. I am still very new to GraphLab, and although I read through the API documentation, I couldn't find the solution I was looking for. I have a dataset with multiple columns, e.g. bedrooms, bathrooms, square feet, zipcode, etc. These are the features, and my goal is to use various ML algorithms to predict the price of a house.

Now, I am supposed to find the average price of the houses with zipcode 93038. Since I am quite new to this, I broke the problem down into smaller pieces and went with my instincts. First, I tried to find a way to create a filter so that I can extract only the prices of the houses with zipcode 93038. This is what I have tried so far:

import graphlab
sf = graphlab.SFrame('home_data.gl')
sf[(sf['zipcode']=='93038')]

This showed me all the columns for the rows with zipcode 93038, but I only want to display the price and zipcode columns for those rows. I tried many different ways but couldn't figure it out.

Also, let's say I want to find the mean of the prices for zipcode 93038. How do I do that?

Thanks in advance.

Adrien Renaud · Accepted Answer · score 6 · 2016-06-24T18:02:29.753

You could try:

import graphlab as gl
sf = gl.SFrame({'price':[1,4,2],'zipcode':['93038','93038','93037']})

# Filtering
filter_sf = sf[(sf['zipcode']=='93038')] 

# Displaying
print(filter_sf[['price', 'zipcode']])

# Averaging a column
print(filter_sf['price'].mean())
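For readers without GraphLab installed, the following plain-Python (3) sketch mimics what the three steps above do on the same toy data. It assumes a column-oriented layout (a dict mapping column names to equal-length lists) as a hypothetical stand-in for an SFrame; the helper names are made up for illustration and are not part of the GraphLab API.

```python
# Hypothetical stand-in for an SFrame: column name -> list of values.
sf = {'price': [1, 4, 2], 'zipcode': ['93038', '93038', '93037']}

def equals_mask(column, value):
    """Boolean mask, like sf['zipcode'] == '93038' (one flag per row)."""
    return [v == value for v in column]

def filter_rows(frame, mask):
    """Keep the rows where mask is True, in every column (like sf[mask])."""
    return {name: [v for v, keep in zip(values, mask) if keep]
            for name, values in frame.items()}

def select(frame, names):
    """Keep only the listed columns (like sf[['price', 'zipcode']])."""
    return {name: frame[name] for name in names}

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

filter_sf = filter_rows(sf, equals_mask(sf['zipcode'], '93038'))
print(select(filter_sf, ['price', 'zipcode']))  # {'price': [1, 4], 'zipcode': ['93038', '93038']}
print(mean(filter_sf['price']))                 # 2.5
```

The filter is applied to every column at once, which is why selecting columns afterwards (or before) gives the same rows.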

edited Jun 24 '16 at 18:02

answered Jun 24 '16 at 09:08

Adrien Renaud

Thanks. I tried the above solution, but the output is None. What could be the problem? – Lesley Jun 24 '16 at 11:34
It should work, but 'price' needs to be a numerical feature to apply mean(). Which output is None? – Adrien Renaud Jun 24 '16 at 13:13
`print filter_sf['price'].mean()` outputs None. – Lesley Jun 24 '16 at 13:40
By the way, my sf is different from the one you used. I used sf = graphlab.SFrame('home_data.gl'). Could this be the reason for the error? – Lesley Jun 24 '16 at 13:43
Could you show the result of `print filter_sf[['price', 'zipcode']]` and `print filter_sf['price'].sketch_summary()`? – Adrien Renaud Jun 24 '16 at 13:50
Actually, I wrote the same program in a fresh IPython notebook and it works now. I don't know what made my previous program misbehave like that, but I really appreciate your help in sorting out this bug. Thank you very much. – Lesley Jun 24 '16 at 14:03
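The thread never pins down why the mean came back as None. One generic check, sketched here in plain Python rather than GraphLab, is to confirm that the filter matched any rows before averaging: a zipcode that does not occur in the column, or a column stored with a different type than the literal it is compared against (e.g. the integer 98039 versus the string '98039'), selects nothing. The helper `mean_price_for_zip` and the sample rows are hypothetical, and this is not a claim about what GraphLab returns for an empty selection.

```python
# Hypothetical rows whose zipcodes are stored as ints, not strings.
rows = [
    {'price': 300000, 'zipcode': 98039},
    {'price': 500000, 'zipcode': 98039},
    {'price': 250000, 'zipcode': 98001},
]

def mean_price_for_zip(rows, zipcode):
    """Return (number of matching rows, mean price or None if nothing matched)."""
    prices = [r['price'] for r in rows if r['zipcode'] == zipcode]
    if not prices:
        # Nothing matched: report it explicitly instead of averaging an empty list.
        return 0, None
    return len(prices), sum(prices) / len(prices)

print(mean_price_for_zip(rows, '98039'))  # (0, None): str literal vs int column
print(mean_price_for_zip(rows, 98039))    # (2, 400000.0)
```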

naman1994 · Answer 2 · score 1 · 2019-01-15T11:08:19.873

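Use a groupby operation and the topk() function. This computes the mean price for every zipcode, and topk(..., k=1) then returns the zipcode with the highest mean price: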

import graphlab.aggregate as agg

# sf is the house-data SFrame loaded in the question
sf_ = sf.groupby(key_columns='zipcode',
                 operations={'Mean by ZipCode': agg.MEAN('price')})
sf_.topk('Mean by ZipCode', k=1)
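Conceptually, the groupby collects the prices for each zipcode and averages them, and `topk('Mean by ZipCode', k=1)` keeps the row with the largest mean. A plain-Python (3) sketch of the same idea, using `collections.defaultdict` and `heapq.nlargest` as stand-ins for the GraphLab calls; the sample data is made up:

```python
from collections import defaultdict
import heapq

# Made-up sample data in column form.
zipcodes = ['98039', '98039', '98004', '98004', '98001']
prices = [2000000, 1000000, 1200000, 1000000, 300000]

def mean_by_key(keys, values):
    """Group values by key and return {key: mean of that key's values}."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}

means = mean_by_key(zipcodes, prices)
# Analogue of topk('Mean by ZipCode', k=1): the key with the largest mean.
top = heapq.nlargest(1, means.items(), key=lambda kv: kv[1])
print(top)  # [('98039', 1500000.0)]
```

Looking up `means['93038']` (or whichever zipcode you need) gives the single average the question asks for.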

edited Jan 15 '19 at 11:08

answered Jan 14 '19 at 05:37

naman1994

score 0 · Answer 3 · edited Dec 18 '16 at 03:53

import graphlab

# sales is the house-data SFrame (called sf in the question)
mean_by_zip = sales.groupby(key_columns=['zipcode'],
                            operations={'avg': graphlab.aggregate.MEAN('price')})

mean_by_zip.sort('avg', ascending=False)[0:3]  # top 3 zipcodes by average price
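Sorting by `avg` in descending order and slicing `[0:3]` selects the same rows as a top-k query (ties aside), but it sorts the whole grouped table first. A plain-Python (3) sketch of both routes, with a hypothetical dict of precomputed averages standing in for `mean_by_zip`:

```python
import heapq

# Hypothetical per-zipcode averages (stand-in for the groupby result).
mean_by_zip = {'98039': 1500000.0, '98004': 1100000.0, '98040': 1000000.0,
               '98112': 900000.0, '98001': 300000.0}

# Sort every entry by average, descending, then slice off the first three.
top3_sorted = sorted(mean_by_zip.items(), key=lambda kv: kv[1], reverse=True)[0:3]

# Same selection without sorting everything.
top3_heap = heapq.nlargest(3, mean_by_zip.items(), key=lambda kv: kv[1])

print(top3_sorted)  # [('98039', 1500000.0), ('98004', 1100000.0), ('98040', 1000000.0)]
```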

edited Dec 18 '16 at 03:53

thor

answered Dec 18 '16 at 03:33

Gini123

score 0 · Answer 4 · edited May 11 '17 at 07:33

Here is what I did:

- 1st option

sf[sf['zipcode']=='98039']['price'].mean()

- 2nd option

zip_codes = ['98039']  # the zipcode(s) you want to keep

m_price = sf.filter_by(zip_codes, 'zipcode')  # keep rows whose 'zipcode' is in zip_codes

print(m_price['price'].mean())  # mean price for that zipcode
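Unlike the boolean comparison in the first option, `filter_by` takes a list of values (here a one-element list) and keeps every row whose `zipcode` is in that list, much like SQL `IN`. A plain-Python (3) sketch of that membership filter; the function name `filter_by_values` and the data are hypothetical:

```python
# Hypothetical rows.
rows = [
    {'price': 2000000, 'zipcode': '98039'},
    {'price': 1000000, 'zipcode': '98039'},
    {'price': 1200000, 'zipcode': '98004'},
    {'price': 300000, 'zipcode': '98001'},
]

def filter_by_values(rows, values, column):
    """Keep rows whose `column` value is one of `values` (membership test)."""
    wanted = set(values)
    return [r for r in rows if r[column] in wanted]

one_zip = filter_by_values(rows, ['98039'], 'zipcode')
print(sum(r['price'] for r in one_zip) / len(one_zip))  # 1500000.0

# Several zipcodes at once: rows matching any of them are kept.
two_zips = filter_by_values(rows, ['98039', '98004'], 'zipcode')
print(len(two_zips))  # 3
```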

edited May 11 '17 at 07:33

cosmoonot

answered May 10 '17 at 14:42

J.Carlos
