
I have a dataset of users, books and ratings. I want to find the users who rated a particular book highly and then, for those users, find which other books they also liked.

My data looks like:

df.sample(5)

    User-ID     ISBN    Book-Rating
49064   102967  0449244741  8
60600   251150  0452264464  9
376698  52853   0373710720  7
454056  224764  0590416413  7
54148   25409   0312421273  9
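A minimal sketch for rebuilding the five sample rows above as a DataFrame, so the snippets in this thread can be run without the full dataset. It assumes ISBN is stored as a string (the leading zeros in the printout suggest this) and keeps the original row labels as the index.

```python
import pandas as pd

# The five rows printed by df.sample(5); ISBN kept as str so leading zeros survive
df = pd.DataFrame(
    {
        'User-ID': [102967, 251150, 52853, 224764, 25409],
        'ISBN': ['0449244741', '0452264464', '0373710720', '0590416413', '0312421273'],
        'Book-Rating': [8, 9, 7, 7, 9],
    },
    index=[49064, 60600, 376698, 454056, 54148],
)
```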

What I have done so far:

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.ix['0345339703'] # Lord of the Rings Part 1
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr['User-ID']

The last line fails with:

KeyError: 'User-ID'

I want to get the users who rated LOTR above 7 and then, for those users, find which other books they liked from the matrix.

Help would be appreciated. Thanks.

hpaulj
Stanislav Jirák

1 Answer


In your like_lotr dataframe, 'User-ID' is the name of the index, not a column, so you cannot select it like a normal column. That is why the line users = like_lotr['User-ID'] raises a KeyError.
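A quick way to confirm this is to inspect the index name and the columns before selecting. This sketch rebuilds the five sample rows by hand, with ISBN read as integers (an assumption matching the integer label used in this answer), and uses another ISBN from the sample in place of LOTR.

```python
import pandas as pd

# Sample rows from the question; ISBN parsed as int, so leading zeros are dropped
df = pd.DataFrame({
    'User-ID': [102967, 251150, 52853, 224764, 25409],
    'ISBN': [449244741, 452264464, 373710720, 590416413, 312421273],
    'Book-Rating': [8, 9, 7, 7, 9],
})

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
book = df_p.loc[452264464]        # one row of the matrix: a Series indexed by User-ID
like = book[book > 7].to_frame()  # the single column is named after the ISBN

print(like.index.name)            # prints User-ID: it is the index name...
print(like.columns.tolist())      # ...while the only column label is the ISBN
try:
    like['User-ID']               # column lookup, so this fails
except KeyError as err:
    print('KeyError:', err)
```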

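Moreover, ix is deprecated (and was removed in pandas 1.0), so use loc instead. Also, the label you pass to loc must have the same type as the ISBN index. When your sample is parsed with ISBN as integers the leading zero is dropped, so the label becomes the integer 452264464, without quotes. If your ISBNs are actually stored as strings, keep the quotes.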
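The label type has to match the index dtype. This sketch assumes the ISBNs were read as integers and shows two options: look the row up with an integer label, or turn ISBN back into zero-padded strings with `zfill(10)` (assuming every ISBN is a 10-digit ISBN-10) and keep the quotes. Real ISBN-10s can end in 'X', so storing them as strings is usually the safer choice.

```python
import pandas as pd

# ISBN parsed as integers: '0452264464' became 452264464
df = pd.DataFrame({
    'User-ID': [102967, 251150, 52853, 224764, 25409],
    'ISBN': [449244741, 452264464, 373710720, 590416413, 312421273],
    'Book-Rating': [8, 9, 7, 7, 9],
})

# Option 1: integer index, so use an integer label (no quotes)
df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
by_int = df_p.loc[452264464]

# Option 2: restore 10-character string ISBNs, then use a quoted label
df['ISBN'] = df['ISBN'].astype(str).str.zfill(10)
df_s = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
by_str = df_s.loc['0452264464']
```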

Try this:

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
lotr = df_p.loc[452264464]  # ISBN 0452264464 from your sample (read as an integer), used to test this code
like_lotr = lotr[lotr > 7].to_frame()
users = like_lotr.index.tolist()  # the User-ID values live in the index

users is now a list with the IDs you want.

Using your small sample above and the ISBN I used to test, users is [251150].
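Putting these steps together on the five sample rows (rebuilt by hand, with ISBN read as integers) reproduces that result. `to_frame()` is not strictly needed just to collect the IDs: filtering the Series already keeps the User-ID index.

```python
import pandas as pd

df = pd.DataFrame({
    'User-ID': [102967, 251150, 52853, 224764, 25409],
    'ISBN': [449244741, 452264464, 373710720, 590416413, 312421273],
    'Book-Rating': [8, 9, 7, 7, 9],
})

# Rows are books, columns are users; unrated cells become 0
df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
book = df_p.loc[452264464]             # ratings of one book, indexed by User-ID
users = book[book > 7].index.tolist()  # users who rated it above 7
print(users)  # [251150]
```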


An alternative is to use reset_index. The last two lines would then be:

like_lotr = lotr[lotr > 7].to_frame().reset_index()
users = like_lotr['User-ID']

reset_index moves the index back into the columns, so 'User-ID' becomes an ordinary column again.
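A sketch of the reset_index route on the same hand-built sample (ISBN read as integers). Note that `users` is a Series here rather than a list; call `.tolist()` if you need a plain list.

```python
import pandas as pd

df = pd.DataFrame({
    'User-ID': [102967, 251150, 52853, 224764, 25409],
    'ISBN': [449244741, 452264464, 373710720, 590416413, 312421273],
    'Book-Rating': [8, 9, 7, 7, 9],
})

df_p = df.pivot_table(index='ISBN', columns='User-ID', values='Book-Rating').fillna(0)
book = df_p.loc[452264464]

# reset_index moves the 'User-ID' index into a regular column
liked = book[book > 7].to_frame().reset_index()
users = liked['User-ID']      # a Series, selectable now that it is a column
print(users.tolist())         # [251150]
```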

Valentino
  • Yes, that's it! But how do I now filter the dataset/matrix to keep only those users and find which books they rated highly? – Stanislav Jirák Aug 05 '19 at 17:52
  • That would be another question. However, once you have the IDs in `users`, you can go back to your original dataframe and do: `df.loc[df['User-ID'].isin(users)]`. This selects all the ratings made by those users, and from there you can get the other data you want. – Valentino Aug 05 '19 at 18:23
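A sketch of that follow-up step, working on the original long-format ratings rather than the pivot: keep the users who rated the seed book above 7, drop the seed book itself, keep their other ratings above 7, and count how many of those users liked each remaining ISBN. The helper name `books_liked_by_fans`, the threshold of 7, the count-based ranking and the toy ratings are all hypothetical, chosen only for illustration.

```python
import pandas as pd

def books_liked_by_fans(ratings: pd.DataFrame, isbn, threshold: int = 7) -> pd.Series:
    """Hypothetical helper: count, per ISBN, how many fans of `isbn` rated it above `threshold`.

    A 'fan' is any user who rated `isbn` above `threshold`.
    """
    seed = ratings[(ratings['ISBN'] == isbn) & (ratings['Book-Rating'] > threshold)]
    fans = seed['User-ID'].unique()
    # Same idea as df.loc[df['User-ID'].isin(users)], plus the rating filter
    others = ratings[
        ratings['User-ID'].isin(fans)
        & (ratings['ISBN'] != isbn)
        & (ratings['Book-Rating'] > threshold)
    ]
    # value_counts sorts descending, so the most widely liked books come first
    return others['ISBN'].value_counts()

# Hypothetical toy ratings, not taken from the question
toy = pd.DataFrame({
    'User-ID':     [1, 1, 2, 2, 3, 3],
    'ISBN':        ['A', 'B', 'A', 'B', 'A', 'C'],
    'Book-Rating': [9, 8, 10, 9, 5, 10],
})
print(books_liked_by_fans(toy, 'A'))
```

Ranking by count favours books many fans agree on; averaging the fans' ratings with `others.groupby('ISBN')['Book-Rating'].mean()` would be another reasonable choice.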