Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but still confusing.
I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with defaults from the template (using ALS.train
), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so but the recommendations seemed totally random.
Somebody suggested that ALS.trainImplicit
did a better job, so I changed src/main/scala/ALSAlgorithm.scala
accordingly:
val m = ALS.trainImplicit( // instead of ALS.train
ratings = mllibRatings,
rank = ap.rank,
iterations = ap.numIterations,
lambda = ap.lambda,
blocks = -1,
alpha = 1.0, // also added this line
seed = seed)
Scores are much lower now (below 1.0) but the recommendations are in line with the personal ratings. Much better, but also confusing. PredictionIO defines the difference between explicit and implicit this way:
explicit preference (also referred as "explicit feedback"), such as "rating" given to item by users. implicit preference (also referred as "implicit feedback"), such as "view" and "buy" history.
and:
By default, the recommendation template uses
ALS.train()
which expects explicit rating values which the user has rated the item.
Is the documentation wrong? I still think that explicit feedback fits my use case. Maybe I need to adapt the template with ALS.train
in order to get useful recommendations? Or did I just misunderstand something?