I recently came across LightFM while learning to train a recommender system. And so far what I know is that it utilizes loss functions which are logistic, BPR, WARP and k-OS WARP. I did not go through the math behind all these functions. Now what I am confused about is that how will I know that which loss function to use where?
1 Answers
From lightfm model documentation page:
logistic: useful when both positive (1) and negative (-1) interactions are present.
BPR: Bayesian Personalised Ranking 1 pairwise loss. Maximises the prediction difference between a positive example and a randomly chosen negative example. Useful when only positive interactions are present and optimising ROC AUC is desired.
WARP: Weighted Approximate-Rank Pairwise [2] loss. Maximises the rank of positive examples by repeatedly sampling negative examples until rank violating one is found. Useful when only positive interactions are present and optimising the top of the recommendation list (precision@k) is desired.
k-OS WARP: k-th order statistic loss [3]. A modification of WARP that uses the k-the positive example for any given user as a basis for pairwise updates.
Everything boils down to how your dataset is structured and what kind of user interacions you're looking at. Obviously one approach would be to include the loss function in your parameter grid when going through hyperparameter tuning (at least that's what I did) and check model accuracy. I find investingating why a given loss function performed better/worse on a dataset as a good learning exercise.

- 1,589
- 1
- 9
- 26