Alternating Least Squares error in pyspark

Question

I've been trying to train a model based on ALS using pyspark.mllib.recommendation. Code:

from pyspark.mllib.recommendation import ALS
model = ALS.train(trainingset, rank=8, seed=0, iterations=10, lambda_=0.1)

But I am getting the following error:

invalid literal for int() with base 10: 'userId'

Check the format of your trainingset. – jtitusj Aug 02 '16 at 08:30
Including a [mcve] could help to solve this problem. – J.J. Hakala Aug 02 '16 at 13:13

score 0 · Accepted Answer · edited May 23 '17 at 11:51

Well, the error message means you are passing the text 'userId' where a number is expected. Without further information (like the full error message or the stack trace) it is hard to say exactly what the problem is.
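The message itself can be reproduced without Spark. The following is a minimal sketch in plain Python, assuming the failure comes from converting a column name to an integer; it does not show where inside the ALS call the conversion happens.

```python
# Minimal sketch: int() raises ValueError when given a non-numeric string
# such as a column name.
try:
    int('userId')
except ValueError as err:
    message = str(err)

print(message)  # invalid literal for int() with base 10: 'userId'
```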

EDIT: As the comments show, the first row of your 'trainingset' data is the header row from the CSV file, and that is what causes the error. You simply need to make sure the header row is skipped, e.g. by following How do I skip a header from CSV files in Spark?
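One possible way to drop the header and convert the remaining rows is sketched below. The helper names is_data_row and to_rating are hypothetical, and the sample rows are taken from the data posted in the comments. The logic is shown on a plain Python list so it runs without Spark; the commented lines at the end are an assumption about how the same helpers might be applied to the RDD from the question.

```python
# Hypothetical helpers (not part of pyspark): filter out the header row and
# convert the string fields to (int, int, float) tuples.
HEADER = ('userId', 'movieId', 'rating')  # column names seen in the first row

def is_data_row(row):
    """Return True for every row except the header."""
    return tuple(row) != HEADER

def to_rating(row):
    """Convert ('1', '16', '4.0') into (1, 16, 4.0)."""
    user, movie, rating = row
    return (int(user), int(movie), float(rating))

# Sample rows copied from the dataset shown in the comments.
sample = [
    ('userId', 'movieId', 'rating'),
    ('1', '16', '4.0'),
    ('1', '24', '1.5'),
]

ratings = [to_rating(row) for row in sample if is_data_row(row)]
print(ratings)

# Assumed usage with the RDD from the question (requires a Spark installation):
#   ratings_rdd = trainingset.filter(is_data_row).map(to_rating)
#   model = ALS.train(ratings_rdd, rank=8, seed=0, iterations=10, lambda_=0.1)
```

Comparing each row against the known header is only one option; it assumes the header text is known in advance and never appears as a legitimate data row.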

answered Aug 02 '16 at 08:27

Grzegorz Oledzki

These are first 10 rows of the 'trainingset' dataset : [('userId', 'movieId', 'rating'), ('1', '16', '4.0'), ('1', '24', '1.5'), ('1', '32', '4.0'), ('1', '47', '4.0'), ('1', '50', '4.0'), ('1', '150', '3.0'), ('1', '165', '3.0'), ('1', '260', '4.5'), ('1', '261', '1.5')] – Ishan Aug 02 '16 at 09:07
The "header" row shouldn't be part of the data, should it? I guess you are taking that from some kind of CSV file or something. – Grzegorz Oledzki Aug 02 '16 at 10:30
Oh yes yes! I didn't think about it. My mistake. And yes, I am taking the data from a CSV file. But whenever I import data from a file, it shows the headers in its first row. Can you suggest something for it? – Ishan Aug 02 '16 at 11:46
Please check http://stackoverflow.com/questions/27854919/how-to-skip-header-from-csv-files-in-spark – Grzegorz Oledzki Aug 02 '16 at 13:54
