I've been trying to train a model based on ALS using pyspark.mllib.recommendation. Code:

from pyspark.mllib.recommendation import ALS
model = ALS.train(trainingset, rank=8, seed=0, iterations=10, lambda_=0.1)

But I am getting the following error:

invalid literal for int() with base 10: 'userId'
Grzegorz Oledzki
Ishan

1 Answer

Well, the error message means you are passing the text 'userId' where a number is expected. Without further information (such as the full error message or the stack trace) it is hard to say exactly what the problem is.
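To illustrate where the message comes from, here is a minimal sketch in plain Python (no Spark needed). It assumes the offending row looks like the header tuple quoted in the comments; the variable names are only for illustration.

```python
# Hypothetical row: the CSV header that ended up in the data.
row = ('userId', 'movieId', 'rating')

try:
    # The user id has to be converted to an integer at some point.
    user_id = int(row[0])
except ValueError as exc:
    message = str(exc)
    print(message)  # invalid literal for int() with base 10: 'userId'
```

Any row whose first field is not a numeric string will fail in the same way.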

EDIT: As mentioned in the comments, the first row of 'trainingset' turns out to be the header row from the CSV file, and that is what causes the error. You need to make sure the header row is skipped, e.g. by following How do I skip a header from CSV files in Spark?
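One way to skip the header is sketched below on a plain Python list, so that it runs without a cluster. It assumes the rows are tuples of strings as shown in the comments; `drop_header` and `to_ratings` are hypothetical helper names, not part of PySpark. The trailing comment shows what I would expect the equivalent to look like on an RDD using `first()`, `filter()` and `map()`, assuming 'trainingset' is an RDD of such tuples.

```python
def drop_header(rows):
    """Remove every row equal to the first row (assumed to be the header)."""
    rows = list(rows)
    if not rows:
        return rows
    header = rows[0]
    return [row for row in rows if row != header]


def to_ratings(rows):
    """Convert (user, movie, rating) string tuples to (int, int, float)."""
    return [(int(user), int(movie), float(rating)) for user, movie, rating in rows]


sample = [
    ('userId', 'movieId', 'rating'),  # header row that caused the error
    ('1', '16', '4.0'),
    ('1', '24', '1.5'),
]

ratings = to_ratings(drop_header(sample))
print(ratings)  # [(1, 16, 4.0), (1, 24, 1.5)]

# Assumed RDD equivalent (not executed here):
#   header = trainingset.first()
#   data = (trainingset
#           .filter(lambda row: row != header)
#           .map(lambda r: (int(r[0]), int(r[1]), float(r[2]))))
#   model = ALS.train(data, rank=8, seed=0, iterations=10, lambda_=0.1)
```

Note that this only works if the header really is the first row; if the file has no header, the first data row would be dropped instead.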

Community
Grzegorz Oledzki
  • These are first 10 rows of the 'trainingset' dataset : [('userId', 'movieId', 'rating'), ('1', '16', '4.0'), ('1', '24', '1.5'), ('1', '32', '4.0'), ('1', '47', '4.0'), ('1', '50', '4.0'), ('1', '150', '3.0'), ('1', '165', '3.0'), ('1', '260', '4.5'), ('1', '261', '1.5')] – Ishan Aug 02 '16 at 09:07
  • The "header" row shouldn't be part of the data, should it? I guess you are taking that from some kind of CSV file or something. – Grzegorz Oledzki Aug 02 '16 at 10:30
  • Oh yes yes! I didn't think about it. My mistake. And yes, I am taking the data from a CSV file. But whenever I import data from a file, it shows the headers in the first row. Can you suggest something for it? – Ishan Aug 02 '16 at 11:46
  • Please check http://stackoverflow.com/questions/27854919/how-to-skip-header-from-csv-files-in-spark – Grzegorz Oledzki Aug 02 '16 at 13:54