Create 2-dimensional array from dataframe object

Question

I have a dataframe object as follows:

'Name' 'Book' 'Rating'

'John' 'Moby Dick' 5

'John' 'The Alchemist' 3

'John' 'The Da Vinci Code' 4

'Peter' 'Moby Dick' 4

'Peter' 'Lolita' 5

'Mike' 'The Alchemist' 4

'Mike' 'The Da Vinci Code' 3

I want to create a 2-d array from this data of size [len('Name')][len('Book')] and the values of the array should be the ratings. If a person has not rated a book, value should be zero.

Here, my first row corresponds to John, next Peter and last row Mike and 1st column corresponds to Moby Dick, 2nd to The Alchemist, 3rd to Lolita and 4th to The Da Vinci Code.

I have tried the following where I scan the entire object (twice I believe, to get a match for each column).

import pandas
import numpy

df = read_from_file

uName = df['Name'].unique().tolist()
uBook = df['Book'].unique().tolist()
m = numpy.zeros((len(uName), len(uBook)))

for i in range(0, len(uName)):
    for j in range(0, len(uBook)):
        x = df.loc[(df['Name'] == uName[i]) & (df['Book'] == uBook[j])]
        if x.empty:
            m[i][j] = 0
        else:
            m[i][j] = x.Rating

It gives me the right results but it's extremely inefficient. My dataframe is huge - ~50,000 rows, and it takes forever to run this piece of code. Can you help me with a more efficient way to achieve this? Thanks.

score 0 · Answer 1 · answered Apr 18 '19 at 06:21

>>> import pandas as pd
>>> data = pd.DataFrame({'x': ['A', 'A', 'B','B', 'C'], 'y':['q','p','q', 'p', 'q'], 'r': [1, 2, 3, 4, 5]})
>>> data.pivot(values='r', columns='Name', index='Book')
x    A    B    C
y               
p  2.0  4.0  NaN
q  1.0  3.0  5.0

or

>>> data.pivot(values='r', columns='x', index='y').fillna(0).values
array([[2., 4., 0.],
       [1., 3., 5.]])

In case of your data frame the code should be something like this:

data.pivot(values='Rating', columns='x', index='y').fillna(0).values

score 0 · Answer 2 · answered Apr 18 '19 at 06:24

data(temp.csv):

Name,Book,Rating
John,Moby Dick, 5
John,The Alchemist, 3
John,The Da Vinci Code, 4
Peter,Moby Dick, 4
Peter,Lolita, 5
Mike,The Alchemist, 4
Mike,The Da Vinci Code, 3

code:

df = pd.read_csv('temp.csv')
print(df)
print(df.pivot(values='Rating', columns='Book', index='Name').fillna(0))

output

Book   Lolita  Moby Dick  The Alchemist  The Da Vinci Code
Name                                                      
John      0.0        5.0            3.0                4.0
Mike      0.0        0.0            4.0                3.0
Peter     5.0        4.0            0.0                0.0

Create 2-dimensional array from dataframe object

2 Answers2