
I am trying to use Spark MLlib ALS for collaborative filtering to build a music recommender. The input data has several fields, including userId, songId, artist, etc., but no ratings field. ALS requires a rating as one of its inputs. I have looked around a lot but couldn't find any help. How can I proceed? Would it be fine to use listen_count (the number of times a user has listened to a particular song) as the rating?

My dataset:

user_id  song_id  songtitle  artist   language  music_director
123      1        abc        artist1  English   NULL
345      2        xyz        artist2  English   NULL
456      3        abc        artist3  English   NULL
567      4        xyz        artist4  English   NULL
678      5        xyz        artist5  English   NULL
789      6        abc        artist6  English   NULL
Sonal

1 Answer


Collaborative filtering takes a rating as input. Listening to a song doesn't necessarily mean that the user liked it, and how much a song is liked varies across users.

A rating field therefore helps distinguish users' varying reactions to different songs, and the model then predicts ratings for songs a user hasn't listened to yet.

I think you are implicitly assuming that if a song is in a user's list, the user likes it. In that case, you can add a rating column filled with a fixed value of 1 and run the code.
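
As a rough illustration of that preparation step, here is a minimal plain-Python sketch (not Spark code) that builds (user, song, rating) triples in two ways: with a fixed rating of 1, and with the listen count as the rating. The `listen_events` data and both helper names are hypothetical, made up for this example; your real data would come from your own listening logs.

```python
from collections import Counter

# Hypothetical listen events as (user_id, song_id) pairs.
# These values are illustrative only, not taken from the real dataset.
listen_events = [
    (123, 1), (123, 1), (123, 2),
    (345, 2),
    (456, 3), (456, 3), (456, 3),
]


def fixed_ratings(events, value=1.0):
    """Return one (user_id, song_id, rating) triple per distinct pair,
    with every rating set to the same fixed value."""
    return sorted((user, song, value) for user, song in set(events))


def listen_count_ratings(events):
    """Return one (user_id, song_id, listen_count) triple per distinct pair,
    using the number of listens as the rating."""
    counts = Counter(events)
    return sorted((user, song, float(n)) for (user, song), n in counts.items())


fixed = fixed_ratings(listen_events)
counted = listen_count_ratings(listen_events)
```

Either list of triples could then be loaded into a Spark DataFrame and used as the user, item, and rating columns for ALS. Spark's ALS also has an `implicitPrefs` option intended for implicit feedback such as listen counts, which may be worth trying instead of treating the counts as explicit ratings.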

pratiklodha
Does the ALS algorithm require users' dislikes as well? Suppose I have data on which topics users "like", but unlike ratings, that dataset doesn't say whether a user is not interested in the other topics or simply hasn't discovered them yet. In that case, does setting the rating to 1 work? – Abhishek Apr 24 '19 at 04:53