
I'm using the ALS algorithm (implicitPrefs = True) in Spark (a recommendation system algorithm). Normally, after running this algorithm, the predicted values should be between 0 and 1. But I received values greater than 1:

    "usn" : 72164,
    "recommendations" : [ 
        {
            "item_code" : "C1346",
            "rating" : 0.756096363067627
        }, 
        {
            "item_code" : "C0117",
            "rating" : 0.966064214706421
        }, 
        {
            "item_code" : "I0009",
            "rating" : 1.00000607967377
        }, 
        {
            "item_code" : "C0102",
            "rating" : 0.974934458732605
        }, 
        {
            "item_code" : "I0853",
            "rating" : 1.03272235393524
        }, 
        {
            "item_code" : "C0103",
            "rating" : 0.928574025630951
        }
    ]
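For context, this is roughly the code that produces the output above (a minimal sketch: the `interactions` DataFrame, its column names, and the parameter choices are stand-ins for my real setup, and item codes like "C1346" are assumed to have been indexed to integers beforehand, since ALS requires numeric ids):

    from pyspark.ml.recommendation import ALS

    # Sketch only: `interactions` and the column names are placeholders.
    # Item codes such as "C1346" would need to be indexed to integers
    # first (e.g. with StringIndexer).
    als = ALS(
        implicitPrefs=True,       # implicit-feedback variant of ALS
        userCol="usn",
        itemCol="item_index",
        ratingCol="strength",     # implicit signal (views, clicks, ...)
        coldStartStrategy="drop",
    )
    model = als.fit(interactions)

    # Top-6 recommendations per user; this is where the "rating" values
    # shown above come from.
    user_recs = model.recommendForAllUsers(6)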

I don't understand why the rating value can be greater than 1 ("rating" : 1.00000607967377 and "rating" : 1.03272235393524).

There is a similar question, but I still don't understand: MLLib spark - ALStrainImplicit value more than 1

Can anybody help me explain these abnormal values?

Phong Nguyen

1 Answer


Don't worry about that! There is nothing wrong with ALS.

Nevertheless, the prediction scores returned by ALS with implicit feedback in Apache Spark aren't normalized to fit between [0, 1], as you saw. You might even get negative values sometimes (more on that here).
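To see why, remember that each prediction is just the dot product between a user factor vector and an item factor vector; a toy illustration in plain NumPy (not Spark code, and the factor values are made up):

    import numpy as np

    # Toy illustration: an ALS prediction is the dot product of latent
    # factors, and nothing constrains that product to [0, 1].
    user_factors = np.array([0.9, 0.7, 0.4])
    item_factors = np.array([0.8, 0.6, 0.2])
    print(user_factors @ item_factors)  # about 1.22 -- a perfectly legal score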

ALS uses stochastic gradient descent and approximations to compute (and re-compute) user and item factors at each step to minimize the cost function, which allows it to scale.
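For reference, the implicit-feedback cost function being minimized (from Hu, Koren and Volinsky's "Collaborative Filtering for Implicit Feedback Datasets", the paper Spark's implementation follows) is:

    \min_{x_*, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
                  + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

Here p_ui ∈ {0, 1} indicates whether user u interacted with item i, and c_ui = 1 + α r_ui is a confidence weight. The prediction x_u^T y_i approximates the binary preference p_ui, but the dot product itself is never constrained to [0, 1].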

As a matter of fact, normalizing those scores isn't relevant. The reason for this is that those scores don't mean much on their own.

You can't use RMSE, for example, on those scores to evaluate the performance of your recommendations. If you are interested in evaluating this type of recommender, I advise you to read my answer on How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?
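As a rough sketch of what a ranking-based evaluation can look like with Spark's RankingMetrics (here `predicted_and_actual` is an assumed RDD of per-user (predicted items, relevant items) pairs that you would build yourself from a train/test split):

    from pyspark.mllib.evaluation import RankingMetrics

    # Sketch only: `predicted_and_actual` is assumed to be an RDD of
    # ([predicted item ids], [ground-truth item ids]) pairs, one per user.
    metrics = RankingMetrics(predicted_and_actual)
    print(metrics.meanAveragePrecision)  # MAP over all users
    print(metrics.precisionAt(5))        # precision@5
    print(metrics.ndcgAt(5))             # NDCG@5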

There are many techniques used in research and/or industry to deal with this type of result; e.g. you can binarize predictions using a threshold.
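For instance, a minimal sketch of that binarization (assuming a `predictions` DataFrame from `model.transform(...)`; the 0.5 cutoff is only illustrative and should be tuned on held-out data):

    from pyspark.sql import functions as F

    # Sketch only: `predictions` is assumed to come from model.transform(test);
    # 0.5 is an arbitrary example cutoff, not a recommended value.
    THRESHOLD = 0.5
    binarized = predictions.withColumn(
        "recommend", (F.col("prediction") >= THRESHOLD).cast("int")
    )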

eliasah
  • You said: **The reason for this is that those scores don't mean much on their own**. So how do we recommend users/items if we don't use the prediction scores? At the moment, I use the ratings that ALS computes to make recommendations for users/items. – Phong Nguyen Oct 25 '17 at 03:21
  • The answer for that is in the last paragraph. – eliasah Oct 25 '17 at 07:47
  • My understanding was that rating = 0.756096363067627 indicates the user likes the item about 70%. But then rating = 1.03272235393524 would indicate the user likes the item 103%, which looks meaningless. Did I understand right? – Phong Nguyen Oct 25 '17 at 09:22
  • There is actually an implicit interpretation of that result. 0.75 means the user is most likely going to like the item, but it doesn't mean the user likes it with a probability of 70%. You need to look at it as a classification problem. You put a threshold, let's say 0.5. Above that score, you can consider that the user will like it. – eliasah Oct 25 '17 at 09:30
  • The answer I linked is very important for understanding the evaluation of recommender systems based on implicit ratings. Unfortunately there is not much literature on the subject, but the references I've linked in the other answer explain it quite well. – eliasah Oct 25 '17 at 09:32
  • In an implicit feedback recommender system the predictions only really serve to define an ordering over items for a given user, where higher scores are stronger recommendations than lower scores. The scale of the number (or, indeed, the difference between the scores of two items) does not mean anything. The correct thing to do with these scores is to use them to sort items for a given user, then pick items from the top and use them for recommendations. There is no threshold that you should use: you simply always pick some of the top items in the sorted list and show them to the user. – Maciej Kula Nov 09 '17 at 15:57
  • @MaciejKula More or less, the threshold might be bothering you but it's actually used in practice. Otherwise I don't see what your comment adds to my answer. :) – eliasah Nov 09 '17 at 16:09
  • @MaciejKula you are the lightfm guy, nice to "meet" you ! :D – eliasah Nov 09 '17 at 16:14
  • Just wanted to emphasize the ranking objective. I hadn't seen thresholds in use, but it's interesting to know they are also useful! (And nice to meet you, too.) – Maciej Kula Nov 09 '17 at 16:55
  • @MaciejKula I wasn't aware of that either until I ran into it in Agarwal's book about statistical methods for recommender systems. But you are absolutely right. – eliasah Nov 09 '17 at 16:57
  • @MaciejKula my linked answer emphasizes the ranking objective. It would be nice if you can take a look and tell me what you think – eliasah Nov 09 '17 at 16:58
  • This is beside the point of the question, but are you sure that ALS uses SGD? I thought the core idea behind ALS was that it can be optimized more efficiently than with SGD... Or did you mean that the specific Spark implementation uses SGD under the hood? – hnwoh Jun 22 '23 at 13:43