I am currently taking a Machine Learning course on Coursera ( https://www.coursera.org/learn/ml-foundations/lecture/6wD6H/visualizing-predictions-of-simple-model-with-matplotlib ). The course uses the GraphLab Create framework for its lessons and assignments. I don't want to use GraphLab; instead I am using pandas and numpy for the assignments.

In the course, the instructor has created a regression model, and then he shows the prediction using matplotlib:

Build Regression Model

sqft_model = graphlab.linear_regression.create(train_data, target='price', features=['sqft_living'],validation_set=None)

and then the prediction code is as follows:

plt.plot(test_data['sqft_living'],test_data['price'],'.',
        test_data['sqft_living'],sqft_model.predict(test_data),'-')

The result is:

[image: prediction plot]

In the above image, the blue dots are the test data and the green line is the prediction from the simple regression. I am a complete beginner to programming and Python. I wanted to use free resources such as pandas and scikit-learn. I used the following to do the same in IPython:

Build Regression Model

from pandas.stats.api import ols
sqft_model = ols(y=train_data['price'], x=train_data['sqft_living'])

But I get the following error when I run the prediction code:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Thus, I am not able to produce the desired result as done by the instructor (i.e. the image shown above). Can anyone help me out?

The data can be downloaded from the link below:

https://onedrive.live.com/redir?resid=EDAAD532F68FDF49!1091&authkey=!AKs341lbRnuCt9w&ithint=folder%2cipynb

Ajeet
  • Hi @Drjnker... I get the following error while inputting the prediction code: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). – Ajeet Dec 14 '15 at 12:24
  • May the problem come from your `train_data` ? – MaTh Dec 14 '15 at 12:35
  • Thanks @Drjnker, I will check. Meanwhile, could you please let me know what "The truth value of a Series is ambiguous" means? – Ajeet Dec 14 '15 at 12:49
  • Somewhere a Series is being used as a condition, but Python doesn't know whether you mean "the series isn't empty", "some element of the series is true", or "all elements of the series are true". That's why it suggests `a.empty`, `a.any()`, `a.all()` and so on. It doesn't make much sense for your code, though (from where I sit). – MaTh Dec 14 '15 at 13:19
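
To illustrate the comment above, here is a minimal sketch of how this error arises and what the suggested alternatives return. The Series `s` is made up for illustration and is not taken from the course data.

```python
import pandas as pd

# Hypothetical three-element boolean Series (not the course data).
s = pd.Series([True, False, True])

try:
    # Using a multi-element Series as a condition raises ValueError,
    # because pandas cannot reduce several values to one True/False.
    if s:
        pass
except ValueError as err:
    message = str(err)

# Explicit reductions state which question is being asked.
any_true = bool(s.any())   # is at least one element True?
all_true = bool(s.all())   # are all elements True?
is_empty = s.empty         # does the Series have zero elements?
```

The same error can surface indirectly when a library function internally evaluates a Series passed to it as a condition, which may be what happens in the prediction code.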

1 Answer

I suspect the issue here is that the pandas OLS model can't understand GraphLab's SArray. Try converting the SFrames train_data and test_data into pandas DataFrames first. The following works for me:

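from pandas.stats.api import ols

# Convert the GraphLab SFrame to a pandas DataFrame before fitting
df_train = train_data.to_dataframe()
model = ols(y=df_train['price'], x=df_train['sqft_living'])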
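
As a hedged alternative, `pandas.stats.api.ols` was removed from later pandas releases, so the sketch below swaps in `numpy.polyfit` to fit the same kind of straight line and reproduces the instructor's plot call with matplotlib. It assumes `train_data` and `test_data` are already pandas DataFrames with `sqft_living` and `price` columns; the synthetic data and the `plot_predictions` helper are hypothetical and only there to make the example self-contained.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; replace with the real course DataFrames.
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 4000, size=200)
price = 280.0 * sqft - 40000.0 + rng.normal(0, 50000, size=200)
data = pd.DataFrame({"sqft_living": sqft, "price": price})
train_data, test_data = data.iloc[:160], data.iloc[160:]

# Fit price = slope * sqft_living + intercept by least squares.
slope, intercept = np.polyfit(
    train_data["sqft_living"].to_numpy(),
    train_data["price"].to_numpy(),
    deg=1,
)

# Predict on the test set; the result keeps test_data's index.
predicted = slope * test_data["sqft_living"] + intercept


def plot_predictions(test_data, predicted):
    """Plot test points as dots and the fitted line on top (hypothetical helper)."""
    import matplotlib.pyplot as plt

    # Sort by x so the line is drawn from left to right.
    ordered = test_data["sqft_living"].sort_values()
    plt.plot(test_data["sqft_living"], test_data["price"], ".",
             ordered, predicted.loc[ordered.index], "-")
    plt.show()
```

Calling `plot_predictions(test_data, predicted)` should give a figure similar to the one in the question: dots for the test data and a line for the predictions.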
papayawarrior
  • Hi, thanks for the help. When I run the above command, I get: AttributeError: 'DataFrame' object has no attribute 'to_dataframe'. Am I missing something? Thanks! – Ajeet Dec 15 '15 at 06:01
  • [This might help](http://stackoverflow.com/questions/20763012/creating-a-pandas-dataframe-from-a-numpy-array-how-do-i-specify-the-index-colum) but I don't know what type is your data. – MaTh Dec 15 '15 at 08:46
  • @Drjnker, I have added a OneDrive link (edited above) to download the data and code files I am using. Could you please check? – Ajeet Dec 15 '15 at 09:42