I have a dataframe object as follows:
'Name' 'Book' 'Rating'
'John' 'Moby Dick' 5
'John' 'The Alchemist' 3
'John' 'The Da Vinci Code' 4
'Peter' 'Moby Dick' 4
'Peter' 'Lolita' 5
'Mike' 'The Alchemist' 4
'Mike' 'The Da Vinci Code' 3
I want to create a 2-d array from this data of size [len('Name')][len('Book')] and the values of the array should be the ratings. If a person has not rated a book, value should be zero.
5 3 0 4
4 0 5 0
0 4 0 3
Here, my first row corresponds to John, next Peter and last row Mike and 1st column corresponds to Moby Dick, 2nd to The Alchemist, 3rd to Lolita and 4th to The Da Vinci Code.
I have tried the following where I scan the entire object (twice I believe, to get a match for each column).
import pandas
import numpy
df = read_from_file
uName = df['Name'].unique().tolist()
uBook = df['Book'].unique().tolist()
m = numpy.zeros((len(uName), len(uBook)))
for i in range(0, len(uName)):
for j in range(0, len(uBook)):
x = df.loc[(df['Name'] == uName[i]) & (df['Book'] == uBook[j])]
if x.empty:
m[i][j] = 0
else:
m[i][j] = x.Rating
It gives me the right results but it's extremely inefficient. My dataframe is huge - ~50,000 rows, and it takes forever to run this piece of code. Can you help me with a more efficient way to achieve this? Thanks.