I am very new to Python; I started learning it a few days ago. I recently got a project to implement collaborative filtering (user-to-album recommendation) in Python. For this exercise I have data in multiple CSV files, which I was able to read into a data frame, but I am stuck on how to convert it into a frequency/boolean matrix.

AlbumId ArtistId
A1      R1
A1      R2
A2      R2
A2      R4
A2      R3
A3      R3
A3      R2
A4      R4
A4      R1

Now I want to convert the data frame above into the frequency matrix below.

     R1 R2 R3 R4
A1   1  1  0  0
A2   0  1  1  1
A3   0  1  1  0
A4   1  0  0  1

Can you help me with this conversion? I want to use this matrix in my subsequent cosine similarity calculations.

  • Note that if you really want a frequency matrix and there can be duplicates in your data, what you want is `pivot_table()`, not `get_dummies()`. Add a value column of all 1's (`df['val'] = 1`), then use pivot_table to expand it out: `pd.pivot_table(df, 'val', 'AlbumId', 'ArtistId', aggfunc='sum', fill_value=0)`. – Ian Kent May 17 '18 at 19:56
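
A minimal sketch of the `pivot_table()` approach from the comment above, using the sample data from the question. The variable names (`freq`, `boolean`, `sim`) are placeholders chosen for illustration, and the cosine similarity step at the end is only one possible way to compute it with NumPy, not something the question or comment prescribes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (AlbumId / ArtistId pairs).
df = pd.DataFrame({
    "AlbumId":  ["A1", "A1", "A2", "A2", "A2", "A3", "A3", "A4", "A4"],
    "ArtistId": ["R1", "R2", "R2", "R4", "R3", "R3", "R2", "R4", "R1"],
})

# Frequency matrix: add a column of 1's and sum it per (album, artist) pair.
df["val"] = 1
freq = pd.pivot_table(df, values="val", index="AlbumId", columns="ArtistId",
                      aggfunc="sum", fill_value=0)

# Boolean (0/1) matrix: any count above zero becomes 1, so duplicate rows
# in the input do not inflate the entries.
boolean = (freq > 0).astype(int)

# Assumed follow-up step: cosine similarity between albums (rows).
X = boolean.to_numpy(dtype=float)
norms = np.linalg.norm(X, axis=1, keepdims=True)
sim = (X @ X.T) / (norms * norms.T)

print(freq)
```

With the sample data there are no duplicate pairs, so `freq` and `boolean` hold the same values. They differ only when the same album/artist pair appears more than once in the input.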
