
I'm trying to convert some code in a book that uses Pandas 1.x to current Pandas, but the level parameter of the count method seems to have been deprecated and removed.

Here is the code:

MovieLens.set_index(["title", "rating"]).count(level="rating")["user_id"]
IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
print("Average movie reviews per user: ", IndUsers.mean())

IndMovies = MovieLens.set_index(["user_id", "title"]).count(level="title")["movie_id"]
print("\nNumber of Reviews Per Movie\n")
print(IndMovies)

The data structure (MovieLens) and output are as follows:

         user_id  movie_id  rating  timestamp  gender  age  occupation    zip  \
0              1      1193       5  978300760       F    1          10  48067
1              2      1193       5  978298413       M   56          16  70072
2             12      1193       4  978220179       M   25          12  32793
3             15      1193       4  978199279       M   25           7  22903
4             17      1193       5  978158471       M   50           1  95350
...          ...       ...     ...        ...     ...  ...         ...    ...
1000204     5949      2198       5  958846401       M   18          17  47901
1000205     5675      2703       3  976029116       M   35          14  30030
1000206     5780      2845       1  958153068       M   18          17  92886
1000207     5851      3607       5  957756608       F   18          20  55410
1000208     5938      2909       4  957273353       M   25           1  35401

                                               title                genres
0             One Flew Over the Cuckoo's Nest (1975)                 Drama
1             One Flew Over the Cuckoo's Nest (1975)                 Drama
2             One Flew Over the Cuckoo's Nest (1975)                 Drama
3             One Flew Over the Cuckoo's Nest (1975)                 Drama
4             One Flew Over the Cuckoo's Nest (1975)                 Drama
...                                              ...                   ...
1000204                           Modulations (1998)           Documentary
1000205                        Broken Vessels (1998)                 Drama
1000206                            White Boys (1999)                 Drama
1000207                     One Little Indian (1973)  Comedy|Drama|Western
1000208  Five Wives, Three Secretaries and Me (1998)           Documentary

[1000209 rows x 10 columns]


TypeError                                 Traceback (most recent call last)
Cell In[16], line 29
     26 MovieLens = pd.merge(pd.merge(ratings, users), movies)
     28 print(MovieLens)
---> 29 MovieLens.set_index(["title", "rating"]).count(level="rating")["user_id"]
     30 IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count(level="user_id")["title"]
     31 #MovieLens.set_index(["title", "rating"]).count()["user_id"]
     32 #IndUsers = MovieLens.set_index(["movie_id", "user_id"]).count()["title"]

TypeError: DataFrame.count() got an unexpected keyword argument 'level'

Expected output (I am typing this out by hand since copy-paste is blocked by the book):

Average Movie Reviews per User: 165.597...

Number of Reviews per Movie

title
$1,000,000 Duck (1971)     37
[...additional movies]
eXistenZ (1999)           410

I don't see a simple method within Pandas 2.0.2 to replace this.

-> The code generates an error.
-> The code without level doesn't differentiate between users (i.e. it ignores user_id and treats all users as the same).
-> Other count options in Pandas 2.0.2 don't provide the desired functionality.
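
To illustrate the second point, here is a minimal sketch on a tiny, made-up DataFrame (the column names mirror MovieLens, but the values are invented). With level dropped, count() reduces over all rows at once and returns one non-null total per column, so the per-user breakdown disappears.

```python
import pandas as pd

# Tiny, made-up stand-in for MovieLens: user 1 rated two movies, user 2 rated one.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 20, 10],
    "title": ["A (2000)", "B (2001)", "A (2000)"],
})

# Without level=, count() collapses every row into a single total per column,
# indexed by column name rather than by user_id.
collapsed = ratings.set_index(["movie_id", "user_id"]).count()
print(collapsed)
```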

Hawerchuk
  • As presented, your question doesn't make any sense. Please edit your question to show a validly formatted minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Jun 27 '23 at 18:42
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jun 27 '23 at 19:44

1 Answer


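In Pandas 2.0.2, DataFrame.count no longer accepts the level parameter (it was deprecated in pandas 1.3 and removed in 2.0). To get the same result, group by the column or index level you were counting over and count within each group, for example with the groupby method followed by the size method.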
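
As a minimal sketch of the mechanical translation, assuming invented data in the MovieLens layout: the old df.count(level="user_id") becomes df.groupby(level="user_id").count(), and grouping on the plain column with size() gives the same numbers when there are no missing values. The variable names here are illustrative only.

```python
import pandas as pd

# Made-up data in the MovieLens layout (values are invented).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [10, 20, 10, 30],
    "rating": [5, 4, 5, 3],
    "title": ["A (2000)", "B (2001)", "A (2000)", "C (1999)"],
})

indexed = ratings.set_index(["movie_id", "user_id"])

# pandas 1.x: indexed.count(level="user_id")["title"]
# pandas 2.x: group on the index level explicitly, then count non-null values.
per_user_level = indexed.groupby(level="user_id").count()["title"]

# Column-based alternative: group on the column and count rows per group.
per_user_size = ratings.groupby("user_id").size()

print(per_user_level)
print("Average movie reviews per user:", per_user_level.mean())
```

The level-based form keeps the book's set_index calls untouched, which can make porting easier; the column-based form skips the set_index step entirely.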

Here's how you can modify your code to work with Pandas 2.0.2:

import pandas as pd

# Assuming you have already loaded the MovieLens data into the DataFrame 'MovieLens'

# Counting the number of ratings for each rating value
# (pandas 1.x: set_index(["title", "rating"]).count(level="rating")["user_id"])
rating_counts = MovieLens.groupby("rating").size()
print(rating_counts)

# Counting the number of ratings made by each user
IndUsers = MovieLens.groupby("user_id").size()
print("Average movie reviews per user:", IndUsers.mean())

# Counting the number of reviews for each movie
IndMovies = MovieLens.groupby("title").size()
print("\nNumber of Reviews Per Movie\n")
print(IndMovies)

In the modified code, groupby groups the rows by the key you previously passed as level, and size returns the number of rows in each group, giving a Series indexed by that key. Note that size counts rows, whereas the old count(level=...)["col"] counted non-null values of col; the two agree as long as that column has no missing values. If it might, use groupby(...)["col"].count() instead.
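
A minimal sketch of where size() and count() diverge, using a made-up frame with one deliberately missing title (the names rows_per_user and titles_per_user are illustrative):

```python
import numpy as np
import pandas as pd

# Made-up example: user 1 has two rows, but one of them has no title.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "title": ["A (2000)", np.nan, "A (2000)"],
})

# size() counts every row in the group, missing values included.
rows_per_user = ratings.groupby("user_id").size()

# count() on a column counts only the non-null values in that column,
# which is what the old count(level=...)["title"] did.
titles_per_user = ratings.groupby("user_id")["title"].count()

print(rows_per_user)
print(titles_per_user)
```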

Please note that this code assumes you have loaded the MovieLens data into the DataFrame 'MovieLens' before applying these transformations. Make sure to adjust the code accordingly if you have a different DataFrame name or data structure.
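
As a sanity check on the average, the per-user counts always add up to the total number of rows, so their mean equals rows divided by distinct users. If the data is the MovieLens 1M release (6,040 users), that gives 1,000,209 / 6,040 ≈ 165.597, in line with the book's expected output. A minimal sketch of the check on made-up data:

```python
import pandas as pd

# Made-up ratings: 5 rows from 3 distinct users.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "title": ["A (2000)", "B (2001)", "A (2000)", "B (2001)", "C (1999)"],
})

IndUsers = ratings.groupby("user_id").size()

# The per-user counts sum to the number of rows, so their mean is
# simply rows / distinct users.
expected_mean = len(ratings) / ratings["user_id"].nunique()
print(IndUsers.mean(), expected_mean)
```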

VAHAB
  • This is forbidden AI-generated text that you did not write. Delete this plagiarism and stop doing it. – tchrist Jun 27 '23 at 19:19
  • Wonderful, thank you! – Hawerchuk Jun 27 '23 at 19:19
  • Welcome to Stack Overflow! All of your three answers here appear likely to have been entirely or partially written by AI (e.g., ChatGPT). Please be aware that [posting AI-generated content is not allowed here](//meta.stackoverflow.com/q/421831). If you used an AI tool to assist with any answer, I would encourage you to delete it. We do hope you'll stick around and become a valuable part of our community by posting *your own* quality content. Thanks! – NotTheDr01ds Jul 03 '23 at 19:33
  • **Readers should review this answer carefully and critically, as AI-generated information often contains fundamental errors and misinformation.** If you observe quality issues and/or have reason to believe that this answer was generated by AI, please leave feedback accordingly. The moderation team can use your help to identify quality issues. – NotTheDr01ds Jul 03 '23 at 19:33