create a matrix-like dataframe in Pandas from column values

Question

I have an event log dataframe, with each row being an event (like viewing an item) with columns user_id, item_id, and the rating the user assigns the item. I'd like to create a dataframe representing the all user-item interactions: represented as a 2d matrix, each (i,j) represents the score for user i and item j (screenshot of diagram below). If the user hasn't seen that product yet, assign NaN.

I've tried doing this with loops but as expected is takes too long to run:

collab = pd.DataFrame(columns=log.item_id.unique(), index=log.user_id.unique())
for c in collab.columns:
    for u in collab.index:
        try:
            collab[c].loc[u] = log[(log.item_id == c) & (log.user_id == u)].score
        except:
            collab[c].loc[u] = np.nan

Is there a faster way around this?

as per requested in comments, the head of my dataframe. event_type is like the aforementioned score.

{'item_id': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
 'user_id': {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
 'event_type': {0: 2, 1: 2, 2: 5, 3: 7, 4: 6},
 'create_timestamp': {0: Timestamp('2020-02-03 15:47:25.273977'),
  1: Timestamp('2020-02-04 20:19:31.040304'),
  2: Timestamp('2020-02-04 20:19:00.110416'),
  3: Timestamp('2020-02-04 20:54:31.595305'),
  4: Timestamp('2020-02-04 20:20:15.918646')}}

Will you please show a sample of your `log` dataframe by adding the output of running `print(log.head().to_dict())` to the question? — , Dec 10 '21 at 19:49

score 2 · Answer 1 · answered Dec 10 '21 at 20:07

You can use df.pivot():

import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2, 2],
                   'item_id': [1, 2, 3, 1, 2, 4],
                   'rating': [1, 2, 3, 4, 5, 6]})

df.pivot('user_id', 'item_id')

rating
  item_id   1     2       3       4
user_id                 
1         1.0   2.0     3.0     NaN
2         4.0   5.0     NaN     6.0

create a matrix-like dataframe in Pandas from column values

1 Answers1