I have two medium-sized datasets which looks like:
books_df.head()
ISBN Book-Title Book-Author
0 0195153448 Classical Mythology Mark P. O. Morford
1 0002005018 Clara Callan Richard Bruce Wright
2 0060973129 Decision in Normandy Carlo D'Este
3 0374157065 Flu: The Story of the Great Influenza Pandemic... Gina Bari Kolata
4 0393045218 The Mummies of Urumchi E. J. W. Barber
and
ratings_df.head()
User-ID ISBN Book-Rating
0 276725 034545104X 0
1 276726 0155061224 5
2 276727 0446520802 0
3 276729 052165615X 3
4 276729 0521795028 6
And I wanna get a pivot table like this:
ISBN 1 2 3 4 5 6 7 8 9 10 ... 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952
User-ID
1 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
I've tried:
R_df = ratings_df.pivot(index = 'User-ID', columns ='ISBN', values = 'Book-Rating').fillna(0) # Memory overflow
which failed for:
MemoryError:
and this:
R_df = q_data.groupby(['User-ID', 'ISBN'])['Book-Rating'].mean().unstack()
which failed for the same.
I want to use it for singular value decomposition and matrix factorization.
Any ideas?
The dataset I'm working with is: http://www2.informatik.uni-freiburg.de/~cziegler/BX/