I'm trying to convert some code from a book that uses Pandas 1.x to current Pandas, but the level argument of DataFrame.count() seems to have been removed.
Here is the code:
MovieLens.set_index(["title", "rating"]).count(level="rating")["user_id"]
IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
print("Average movie reviews per user: ", IndUsers.mean())
IndMovies = MovieLens.set_index(["user_id", "title"]).count(level="title")["movie_id"]
print("\nNumber of Reviews Per Movie\n")
print(IndMovies)
The data structure (MovieLens) and the resulting output are as follows:
         user_id  movie_id  rating  timestamp  gender  age  occupation    zip  \
0              1      1193       5  978300760       F    1          10  48067
1              2      1193       5  978298413       M   56          16  70072
2             12      1193       4  978220179       M   25          12  32793
3             15      1193       4  978199279       M   25           7  22903
4             17      1193       5  978158471       M   50           1  95350
...          ...       ...     ...        ...     ...  ...         ...    ...
1000204     5949      2198       5  958846401       M   18          17  47901
1000205     5675      2703       3  976029116       M   35          14  30030
1000206     5780      2845       1  958153068       M   18          17  92886
1000207     5851      3607       5  957756608       F   18          20  55410
1000208     5938      2909       4  957273353       M   25           1  35401

                                               title                genres
0             One Flew Over the Cuckoo's Nest (1975)                 Drama
1             One Flew Over the Cuckoo's Nest (1975)                 Drama
2             One Flew Over the Cuckoo's Nest (1975)                 Drama
3             One Flew Over the Cuckoo's Nest (1975)                 Drama
4             One Flew Over the Cuckoo's Nest (1975)                 Drama
...                                              ...                   ...
1000204                           Modulations (1998)           Documentary
1000205                        Broken Vessels (1998)                 Drama
1000206                            White Boys (1999)                 Drama
1000207                     One Little Indian (1973)  Comedy|Drama|Western
1000208  Five Wives, Three Secretaries and Me (1998)           Documentary

[1000209 rows x 10 columns]
TypeError Traceback (most recent call last)
Cell In[16], line 29
26 MovieLens = pd.merge(pd.merge(ratings, users), movies)
28 print(MovieLens)
---> 29 MovieLens.set_index(["title", "rating"]).count(level="rating")["user_id"]
30 IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
31 #MovieLens.set_index(["title", "rating"]).count()["user_id"]
32 #IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count()["title"]
TypeError: DataFrame.count() got an unexpected keyword argument 'level'
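For context on the traceback above: the level keyword of DataFrame.count() was deprecated in pandas 1.3 and removed in pandas 2.0, and the deprecation note suggested groupby(level=...) in its place. Below is a minimal sketch of that substitution, not a confirmed fix for the book's code. The toy frame and its values are made up as a small stand-in for MovieLens, and the names ind_users and ind_movies are only illustrative.

```python
import pandas as pd

# Hypothetical toy stand-in for the MovieLens frame (values are made up).
toy = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3, 3],
    "movie_id": [10, 20, 10, 10, 20, 30],
    "rating":   [5, 4, 5, 3, 4, 4],
    "title":    ["A (1990)", "B (1991)", "A (1990)",
                 "A (1990)", "B (1991)", "C (1992)"],
})

# Old (pandas 1.x): .count(level="user_id")
# Sketch for pandas 2.x: group on the index level, then count.
ind_users = (
    toy.set_index(["movie_id", "user_id"])
       .groupby(level="user_id")
       .count()["title"]
)

# Same pattern for the per-movie counts.
ind_movies = (
    toy.set_index(["user_id", "title"])
       .groupby(level="title")
       .count()["movie_id"]
)

print("Average movie reviews per user: ", ind_users.mean())
print("\nNumber of Reviews Per Movie\n")
print(ind_movies)
```

In this toy frame, users 1, 2, and 3 have 2, 1, and 3 reviews, so the printed average is 2.0.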
Expected Output (I am typing this out by hand since copy-paste is blocked by the book):
Average Movie Reviews per User: 165.597...
Number of Reviews per Movie
title
$1,000,000 Duck (1971) 37
[...additional movies]
eXistenZ (1999) 410
I don't see a simple method within Pandas 2.0.2 to replace this.
-> Code generates an error
-> Code without level doesn't differentiate between users (i.e. it ignores user_id and assumes all users were the same)
-> Other count options in Pandas 2.0.2 don't provide the desired function
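On the point that count() without level ignores user_id: a plain count() only reports the number of non-null cells per column over the whole frame, so the per-user split has to come from somewhere else. One possibility, sketched here on made-up data rather than the real MovieLens frame, is to skip set_index and group on the column directly. The toy frame and the names total, per_user, and per_rating are hypothetical.

```python
import pandas as pd

# Hypothetical toy stand-in for the MovieLens frame (values are made up).
toy = pd.DataFrame({
    "user_id":  [1, 1, 2, 3, 3, 3],
    "movie_id": [10, 20, 10, 10, 20, 30],
    "rating":   [5, 4, 5, 3, 4, 4],
    "title":    ["A (1990)", "B (1991)", "A (1990)",
                 "A (1990)", "B (1991)", "C (1992)"],
})

# Plain count(): a single number per column, with all users lumped together.
total = toy.set_index(["movie_id", "user_id"]).count()["title"]

# Grouping on the column keeps one count per user.
per_user = toy.groupby("user_id")["title"].count()

# Same idea for the per-rating breakdown in the first line of the original code.
per_rating = toy.groupby("rating")["user_id"].count()

print(total)
print(per_user)
print("Average movie reviews per user: ", per_user.mean())
print(per_rating)
```

If the counted column can contain missing values, .size() counts rows while .count() counts only non-null entries, so the two can give different numbers.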