I have an event log dataframe, with each row being an event (like viewing an item) with columns user_id
, item_id
, and the rating
the user assigns the item. I'd like to create a dataframe representing the all user-item interactions: represented as a 2d matrix, each (i,j) represents the score for user i and item j (screenshot of diagram below). If the user hasn't seen that product yet, assign NaN.
I've tried doing this with loops but as expected is takes too long to run:
collab = pd.DataFrame(columns=log.item_id.unique(), index=log.user_id.unique())
for c in collab.columns:
for u in collab.index:
try:
collab[c].loc[u] = log[(log.item_id == c) & (log.user_id == u)].score
except:
collab[c].loc[u] = np.nan
Is there a faster way around this?
as per requested in comments, the head of my dataframe. event_type
is like the aforementioned score.
{'item_id': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'user_id': {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
'event_type': {0: 2, 1: 2, 2: 5, 3: 7, 4: 6},
'create_timestamp': {0: Timestamp('2020-02-03 15:47:25.273977'),
1: Timestamp('2020-02-04 20:19:31.040304'),
2: Timestamp('2020-02-04 20:19:00.110416'),
3: Timestamp('2020-02-04 20:54:31.595305'),
4: Timestamp('2020-02-04 20:20:15.918646')}}