I have a table with three columns: user_id, book_id and rating. So, one row shows what rating a user gave to a book.

I'm trying to remove the rows that correspond to users who rated fewer than 10 books. I did something similar to what is described in the answers to the question "Remove low frequency values from pandas.dataframe". Here is my code:

threshold = 10
value_counts = ratings['user_id'].value_counts()
to_remove = value_counts[value_counts <= threshold].index
ratings.drop(to_remove, axis=0, inplace=True)

When I run it, I get an error in the last line:

ValueError: labels [40518 21743 30824 <...> 47178 46308 30460] not contained in axis

The table has 979478 rows, so the rows with these indices should exist. What am I doing wrong?

lawful_neutral

1 Answer

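Use isin. Because user_id is a column and not the index, .drop cannot be used here: with axis=0 it looks up the given labels in the index, and the user_id values in to_remove are not row labels.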

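threshold = 10
value_counts = ratings['user_id'].value_counts()
# user_id values (not row labels) of users with fewer than `threshold` ratings
to_remove = value_counts[value_counts < threshold].index
# keep only the rows whose user_id is not in to_remove, and assign the result back
ratings = ratings.loc[~ratings['user_id'].isin(to_remove), :]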
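
For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same idea. The toy DataFrame and the small threshold of 2 are made up for illustration, and the comparison uses a strict < so that it matches "fewer than threshold". The last statement shows one possible alternative using groupby/transform, which should select the same rows.

```python
import pandas as pd

# Hypothetical toy data: user 1 rated 3 books, user 2 rated 1, user 3 rated 2.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "book_id": [10, 11, 12, 10, 11, 13],
    "rating":  [5, 3, 4, 2, 5, 1],
})

threshold = 2  # assumption: small value so the toy data shows the effect

# Number of ratings per user; the index of this Series holds user_id values.
counts = ratings["user_id"].value_counts()

# user_id values of users with fewer than `threshold` ratings.
to_remove = counts[counts < threshold].index

# Boolean mask on the user_id column; rows of removed users are filtered out.
filtered = ratings[~ratings["user_id"].isin(to_remove)]

# Alternative sketch: broadcast each user's rating count back onto the rows.
per_row_count = ratings.groupby("user_id")["user_id"].transform("size")
filtered_alt = ratings[per_row_count >= threshold]

print(filtered)
```

Both variants return a new DataFrame and leave the original row labels untouched; call reset_index(drop=True) on the result if a contiguous index is needed.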
BENY