Apologies in advance if this question seems very easy!

Given the following small pandas DataFrame (called test_df below):

    userId  movieId     rating
0   1       169         2.5
1   1       2471        3.0
2   1       48516       5.0
3   2       2571        3.5
4   2       109487      4.0
5   2       112552      5.0
6   2       112556      4.0
7   3       356         4.0
8   3       2394        4.0
9   3       2431        5.0

I would like to extract all the movieId values that each user (identified by userId) has watched, grouped per user. For the dataset above, the output I expect is something like this:

[[169, 2471, 48516], [2571, 109487, 112552, 112556], [356, 2394, 2431]]

I have written a for loop, but its result differs from what I expected (one flat list instead of one list per user), and it seems extremely inefficient as the dataset grows:

mv_lst = []
usrID = np.unique(test_df['userId'])
for i,v in enumerate( test_df['userId'] ):
    if v in usrID:
        mv_lst.append(test_df['movieId'][i])
print(mv_lst)
# result: [169, 2471, 48516, 2571, 109487, 112552, 112556, 356, 2394, 2431]

Is there a smarter and cleaner alternative in pandas to do this? Cheers,

Farid Alijani

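One possible approach, offered as a minimal sketch rather than a definitive answer: group the rows by userId and collect each group's movieId values into a list. The frame below is rebuilt by hand from the table in the question, and the name movies_per_user is only illustrative. The loop in the question returns a flat list because every userId is trivially contained in the array of unique IDs, so the `if` never filters anything and nothing is grouped per user.

```python
import pandas as pd

# Rebuild the small example frame from the question (values copied from the table).
test_df = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "movieId": [169, 2471, 48516, 2571, 109487, 112552, 112556, 356, 2394, 2431],
    "rating":  [2.5, 3.0, 5.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0, 5.0],
})

# Group rows by user and gather each user's movieId values into a list.
# The result is a Series indexed by userId whose values are Python lists.
movies_per_user = test_df.groupby("userId")["movieId"].apply(list)

# Convert the Series of lists into a plain nested list, ordered by userId
# (groupby sorts the group keys by default).
mv_lst = movies_per_user.tolist()
print(mv_lst)
```

Keeping the intermediate Series around can be handy, since something like movies_per_user.loc[2] looks up a single user's movies directly. Because the grouping is done inside pandas rather than in a Python-level loop over rows, it would be expected to scale better on larger datasets, though that has not been measured here.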