Apologies in advance if this question seems very easy!
Given the following small pd.DataFrame():
userId movieId rating
0 1 169 2.5
1 1 2471 3.0
2 1 48516 5.0
3 2 2571 3.5
4 2 109487 4.0
5 2 112552 5.0
6 2 112556 4.0
7 3 356 4.0
8 3 2394 4.0
9 3 2431 5.0
I would like to extract, for each userId, all the movieId values that user has watched, grouped into one list per user.
The output I expect for the above dataset is something like this:
[[169, 2471, 48516], [2571, 109487, 112552, 112556], [356, 2394, 2431]]
I have written a for loop, but it gives a different result from what I expected (one flat list instead of one list per user), and it seems extremely inefficient as the size of the dataset increases:
mv_lst = []
usrID = np.unique(test_df['userId'])
for i, v in enumerate(test_df['userId']):
    if v in usrID:
        mv_lst.append(test_df['movieId'][i])
print(mv_lst)
# result: [169, 2471, 48516, 2571, 109487, 112552, 112556, 356, 2394, 2431]
Is there a smarter and cleaner alternative in pandas to do this? Cheers,
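
For reference, here is a minimal sketch of one possible approach, assuming the frame is rebuilt by hand from the sample rows above and is called test_df as in the loop. The loop returns a flat list because usrID contains every user ID, so the check `v in usrID` is always true and every movieId is appended to the same list. Grouping by userId first and collecting each group's movieId column into a list should give one list per user:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed to match test_df).
test_df = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    "movieId": [169, 2471, 48516, 2571, 109487, 112552, 112556, 356, 2394, 2431],
    "rating":  [2.5, 3.0, 5.0, 3.5, 4.0, 5.0, 4.0, 4.0, 4.0, 5.0],
})

# Group rows by user, turn each user's movieId values into a list,
# then convert the resulting Series of lists into a plain list of lists.
# groupby sorts the groups by userId by default.
mv_lst = test_df.groupby("userId")["movieId"].apply(list).tolist()
print(mv_lst)
```

If the user IDs are needed alongside the lists, dropping the final `.tolist()` leaves a Series indexed by userId.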