I have the following dataframe.
user movie rating
0 1 1 3
1 1 2 4
2 2 1 2
3 2 2 5
4 3 1 3
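To make the example reproducible, the sample frame above can be built as follows (a minimal sketch; the column names and values are copied from the printout):

```python
import pandas as pd

# One row per (user, movie) rating, matching the printout above
df = pd.DataFrame(
    {
        "user": [1, 1, 2, 2, 3],
        "movie": [1, 2, 1, 2, 1],
        "rating": [3, 4, 2, 5, 3],
    }
)
print(df)
```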
My desired output is:
movie 1 2
user
1 3 4
2 2 5
3 3 0
If a user has not rated a movie, the corresponding cell should be 0; otherwise, it should contain the rating.
Note: I was able to achieve this with pivot_table, but the pivoted result has more than 100,000 columns, and at that size I get the error "Unstacked DataFrame is too big, causing int32 overflow". I am trying groupby as an alternative to get around this error.
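One possible workaround, sketched here under assumptions rather than as a tested solution: the error comes from materializing a dense users × movies grid, so the ratings can instead be placed in a SciPy sparse matrix, where unrated cells are implicit zeros rather than stored values, and then wrapped with pandas' sparse accessor. This assumes SciPy is installed and that each (user, movie) pair appears at most once (the `(data, (row, col))` constructor sums duplicates). It has not been benchmarked at 100,000+ columns; `pd.DataFrame.sparse.from_spmatrix` builds one sparse column per movie, which may itself be slow at that width, in which case working with the SciPy matrix directly may be preferable.

```python
import pandas as pd
from scipy import sparse

df = pd.DataFrame(
    {
        "user": [1, 1, 2, 2, 3],
        "movie": [1, 2, 1, 2, 1],
        "rating": [3, 4, 2, 5, 3],
    }
)

# Map each user / movie label to a 0-based integer position
users = pd.Categorical(df["user"])
movies = pd.Categorical(df["movie"])

# Only the observed ratings are stored; every other cell is an implicit 0
mat = sparse.csr_matrix(
    (df["rating"].to_numpy(), (users.codes, movies.codes)),
    shape=(len(users.categories), len(movies.categories)),
)

# Wrap the sparse matrix in a DataFrame whose columns use a sparse dtype
wide = pd.DataFrame.sparse.from_spmatrix(
    mat, index=users.categories, columns=movies.categories
)
wide.index.name = "user"
wide.columns.name = "movie"
print(wide)
```

Calling `wide.sparse.to_dense()` gives an ordinary dense frame, but on the full dataset that would rebuild the large grid, so it is best kept for small slices.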
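I am trying the following, but it doesn't include the values from the 'rating' column of my dataframe: .size() only counts the rows in each (user, movie) group, so every rated cell comes out as 1 instead of the rating.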
df.groupby(['user', 'movie']).size().unstack('movie', fill_value=0)
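For comparison, a groupby version that does carry the ratings selects the 'rating' column and takes one value per group instead of counting rows. This sketch assumes each (user, movie) pair appears at most once, so `first()` simply picks that rating; `max()` or `mean()` would be alternatives if duplicates exist. It still ends in `unstack`, which builds the full users × movies table, so on the real data it is likely to hit the same size limit as pivot_table (depending on the pandas version, as an error or a warning).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "user": [1, 1, 2, 2, 3],
        "movie": [1, 2, 1, 2, 1],
        "rating": [3, 4, 2, 5, 3],
    }
)

# Take the rating in each (user, movie) group instead of the group size,
# then spread movies into columns, filling unrated cells with 0
wide = (
    df.groupby(["user", "movie"])["rating"]
    .first()
    .unstack("movie", fill_value=0)
)
print(wide)
```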