I have a dataframe with user_ids as columns and the ids of the movies they've liked as row values. Here's a snippet:
       15       30       50        93      100     113     1008    1028
0  3346.0  42779.0   1816.0  191319.0    138.0   183.0    171.0   283.0
1  1543.0      NaN    169.0    5319.0  34899.0   188.0  42782.0  1183.0
2  5942.0      NaN  30438.0  195514.0    169.0   172.0    187.0  5329.0
3  3249.0      NaN  32361.0     225.0     87.0   547.0   6710.0   283.0
4   794.0      NaN    187.0  195734.0   6297.0  8423.0   1289.0   222.0
I'm trying to calculate the Jaccard similarity between every pair of columns (i.e. between every pair of users, based on the movies they've liked). When I try to use sklearn's jaccard_similarity_score, Python gives the following error:
ValueError: continuous is not supported
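Jaccard similarity between two users treats each column as a set of movie ids: J(A, B) = |A ∩ B| / |A ∪ B|. As far as I can tell, sklearn's jaccard_similarity_score does something different: it compares its two arguments position by position as label vectors (y_true vs y_pred), and a float column containing NaN is classified as a "continuous" target, which is likely where the error comes from. Below is a minimal set-based sketch, assuming the five-row snippet above is the whole dataframe and that NaN simply means "no further likes"; the helper name `jaccard` is hypothetical.

```python
import numpy as np
import pandas as pd

# Reconstruction of the snippet above: user_ids as columns, liked movie ids as values.
df = pd.DataFrame({
    15: [3346.0, 1543.0, 5942.0, 3249.0, 794.0],
    30: [42779.0, np.nan, np.nan, np.nan, np.nan],
    50: [1816.0, 169.0, 30438.0, 32361.0, 187.0],
    93: [191319.0, 5319.0, 195514.0, 225.0, 195734.0],
    100: [138.0, 34899.0, 169.0, 87.0, 6297.0],
    113: [183.0, 188.0, 172.0, 547.0, 8423.0],
    1008: [171.0, 42782.0, 187.0, 6710.0, 1289.0],
    1028: [283.0, 1183.0, 5329.0, 283.0, 222.0],
})


def jaccard(a: pd.Series, b: pd.Series) -> float:
    """|A & B| / |A | B| for two columns treated as sets of movie ids (NaN dropped)."""
    set_a, set_b = set(a.dropna()), set(b.dropna())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Pairwise loop over all users: simple, but O(n_users^2) Python-level set operations.
users = df.columns
sim = pd.DataFrame(
    [[jaccard(df[u], df[v]) for v in users] for u in users],
    index=users,
    columns=users,
)
print(sim.round(3))
```

This produces the user-by-user matrix, but the nested loop is the same kind of pure-Python work that gets slow once there are many users.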
Ideally, I would like to get a matrix whose rows and columns are the user_ids and whose values are the pairwise similarity scores.
How can I compute the Jaccard similarities between these columns? I've tried using a list of dictionaries, with user IDs as keys and lists of movies as values, but it takes forever to compute.
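One faster approach, sketched under the same assumptions (the snippet above is the whole dataframe, NaN means "no like"), is to build a binary movie-by-user indicator matrix once and get every pairwise intersection from a single matrix product; the unions then follow from |A ∪ B| = |A| + |B| - |A ∩ B|. The names `long`, `X`, `inter` and `sim_fast` are illustrative, not from any library.

```python
import numpy as np
import pandas as pd

# Reconstruction of the snippet above: user_ids as columns, liked movie ids as values.
df = pd.DataFrame({
    15: [3346.0, 1543.0, 5942.0, 3249.0, 794.0],
    30: [42779.0, np.nan, np.nan, np.nan, np.nan],
    50: [1816.0, 169.0, 30438.0, 32361.0, 187.0],
    93: [191319.0, 5319.0, 195514.0, 225.0, 195734.0],
    100: [138.0, 34899.0, 169.0, 87.0, 6297.0],
    113: [183.0, 188.0, 172.0, 547.0, 8423.0],
    1008: [171.0, 42782.0, 187.0, 6710.0, 1289.0],
    1028: [283.0, 1183.0, 5329.0, 283.0, 222.0],
})

# Long format: one row per (user, movie) like, with the NaN padding removed.
long = df.melt(var_name="user", value_name="movie").dropna()

# Binary movie x user matrix: 1 if the user liked the movie (duplicate likes collapse to 1).
X = (pd.crosstab(long["movie"], long["user"]) > 0).astype(int)

# inter[u, v] = number of movies liked by both u and v; the diagonal holds |A| per user.
inter = X.T @ X
sizes = np.diag(inter.values)

# |A u B| = |A| + |B| - |A n B|, computed for all pairs at once via broadcasting.
union = sizes[:, None] + sizes[None, :] - inter.values
sim_fast = pd.DataFrame(inter.values / union, index=inter.index, columns=inter.columns)
print(sim_fast.round(3))
```

With many movies, X is mostly zeros, so building it as a scipy.sparse matrix would keep memory down. If you prefer a library call, `1 - sklearn.metrics.pairwise_distances(X.T.to_numpy(dtype=bool), metric="jaccard")` should give the same matrix.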