I am using Spark MLlib to train an ALS model on implicit ratings. My data looks like this: `user_id, item_id, number_of_purchases`. From what I have read about implicit ALS training, the resulting matrix is a preference matrix with values roughly in the (0, 1) range. My question is how to evaluate it against test data: the test matrix contains `number_of_purchases`, so if I understand correctly, RMSE can't be used.

from pyspark.mllib.recommendation import ALS, Rating

# Load and parse the data
data = sc.textFile("dataset_60k.txt")
training, test = data.randomSplit([0.8, 0.2])

train_ratings = training.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
test_ratings = test.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 10

model = ALS.trainImplicit(train_ratings, rank=rank, iterations=numIterations)

# Evaluate the model on test data
testdata = test_ratings.map(lambda p: (p[0], p[1]))

# these predictions are preference scores (roughly 0-1), not purchase counts
predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2])) 

ratesAndPreds = test_ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1]) ** 2).mean()
print("Mean Squared Error = " + str(MSE))
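Since the predicted preferences and the held-out purchase counts live on different scales, one common alternative (an assumption on my part, not something from the Spark docs) is a ranking metric such as Mean Percentile Rank, where each held-out purchase is weighted by its count and by where the item lands in the user's ranked recommendation list. A minimal plain-Python sketch on toy in-memory data, with hypothetical `predictions`/`test_counts` structures standing in for the RDDs above:

```python
# Sketch of Mean Percentile Rank (MPR) for implicit feedback.
# Lower is better; ~0.5 means the ranking is no better than random.
# Data structures here are illustrative stand-ins, not Spark RDDs.

def mean_percentile_rank(predictions, test_counts):
    """predictions: {user: {item: predicted_preference}}
    test_counts: {(user, item): number_of_purchases} held out for testing."""
    num = 0.0
    den = 0.0
    for (user, item), count in test_counts.items():
        scores = predictions[user]
        # Rank this user's items by descending predicted preference.
        ranked = sorted(scores, key=scores.get, reverse=True)
        n = len(ranked)
        # Percentile rank in [0, 1]: 0.0 = top of the list.
        pr = ranked.index(item) / (n - 1) if n > 1 else 0.0
        num += count * pr   # heavy purchases ranked low hurt the score most
        den += count
    return num / den

preds = {1: {10: 0.9, 11: 0.2, 12: 0.05}}
test = {(1, 10): 3, (1, 12): 1}
print(mean_percentile_rank(preds, test))  # → 0.25
```

In Spark this aggregation would be done per-user with `predictAll` output grouped by user, but the metric itself is the same weighted average.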
Exorcismus

0 Answers