1

Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but still confusing.


I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with defaults from the template (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so but the recommendations seemed totally random.

Somebody suggested that ALS.trainImplicit did a better job, so I changed src/main/scala/ALSAlgorithm.scala accordingly:

val m = ALS.trainImplicit(  // instead of ALS.train
  ratings = mllibRatings,
  rank = ap.rank,
  iterations = ap.numIterations,
  lambda = ap.lambda,
  blocks = -1,
  alpha = 1.0,  // also added this line
  seed = seed)

Scores are much lower now (below 1.0) but the recommendations are in line with the personal ratings. Much better, but also confusing. PredictionIO defines the difference between explicit and implicit this way:

explicit preference (also referred as "explicit feedback"), such as "rating" given to item by users. implicit preference (also referred as "implicit feedback"), such as "view" and "buy" history.

and:

By default, the recommendation template uses ALS.train() which expects explicit rating values which the user has rated the item.

source

Is the documentation wrong? I still think that explicit feedback fits my use case. Maybe I need to adapt the template with ALS.train in order to get useful recommendations? Or did I just misunderstand something?

stholzm
  • 3,395
  • 19
  • 31
  • Where do you your ratings come from? Are they calculated or do you explicitly ask users to rate items between 1 and 10? If so, then you are indeed using explicit feedback – alex9311 Jun 27 '16 at 12:06
  • @alex9311 indeed, the users rated items on a 1 to 10 scale. I have a few million ratings. – stholzm Jun 27 '16 at 20:30

2 Answers2

1

A lot of it depends on how you gathered the data. Often ratings that seem explicit can actually be implicit. For instance, assume you give the option of allowing users to rate items that they have purchased / used before. This means that the very fact that they have spent their time evaluating that particular item means that the item is of a high quality. As such, items of poor quality are not rated at all because people do not even bother to use them. As such, even though the dataset is intended to be explicit, you may get better results because if you consider the results to be implicit. Again, this varies significantly based on how the data is obtained.

Skylion
  • 2,696
  • 26
  • 50
0

The explict data (like ratings) normally comes with bias - people go and rate a product because they like it! Think about your experience shopping and then rating on Amazon.com :-)

On the contrary, implict info often can truly reflect user's favor on a product, like viewing duration, comment length, etc. Even a like/dislike is better that rating because it provides a very simple 'bad' option without bothering a user to think "if I should give a 3, 3.5, or 4?".

Z.Wei
  • 3,658
  • 2
  • 17
  • 28