I have a table with three columns: user_id, book_id and rating. So, one row shows what rating a user gave to a book.
I'm trying to remove the rows that belong to users who rated fewer than 10 books. I did something similar to what is described in the answers to the question "Remove low frequency values from pandas.dataframe". Here is my code:
threshold = 10
value_counts = ratings['user_id'].value_counts()
to_remove = value_counts[value_counts <= threshold].index
ratings.drop(to_remove, axis=0, inplace=True)
When I run it, I get an error on the last line:
ValueError: labels [40518 21743 30824 <...> 47178 46308 30460] not contained in axis
The table has 979478 rows, so the rows with these indices should exist. What am I doing wrong?
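A note on what the code above appears to do, offered as a sketch rather than a confirmed diagnosis: the index of ratings['user_id'].value_counts() holds user_id values, whereas drop with axis=0 looks those values up as row labels of ratings. The two only coincide by accident, which would explain labels "not contained in axis". Filtering on the column with isin avoids the mismatch. The toy data and the small threshold below are hypothetical stand-ins for the real table, and the comparison uses < rather than <= so that it matches "fewer than" the threshold.

```python
import pandas as pd

# Hypothetical toy data standing in for the real ratings table.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "book_id": [10, 11, 12, 10, 13, 14],
    "rating":  [5, 4, 3, 2, 5, 1],
})

threshold = 3  # small value for the toy data; the question uses 10

# Number of ratings per user; the index here contains user_id values,
# not row labels of `ratings`.
value_counts = ratings["user_id"].value_counts()

# Users with fewer than `threshold` ratings.
to_remove = value_counts[value_counts < threshold].index

# Keep only the rows whose user_id is not in the removal set.
filtered = ratings[~ratings["user_id"].isin(to_remove)]
```

With the toy data, only the user with three ratings should remain in filtered. Assigning the result to a new name leaves the original ratings table untouched, in contrast to the inplace=True call in the question.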