
I have a Pandas dataframe that contains genres of rated movies. Some movies fall under multiple genres, each genre separated by a "|". You can see examples of this in the code below.


import pandas as pd
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames, engine='python')

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames, engine='python')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames, engine='python')

df = pd.merge(pd.merge(ratings, users), movies)
df["genres"].value_counts()

[Screenshot of the value_counts() output]

As you can see, the value_counts() method isn't an effective way of counting how many times each unique genre is rated, because it counts each combined string (for example "Comedy|Romance") as a separate value. Is there a pandas method that counts how many times each individual genre "word" appears, or do I need to use loops to separate all the combined genres first?
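To make the problem reproducible without the MovieLens files, here is a minimal sketch using a small hypothetical sample DataFrame (the titles and genres are made up for illustration). It shows that value_counts() treats each pipe-joined string as one value, and what the loop-based alternative mentioned in the question could look like with collections.Counter.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data shaped like the MovieLens "genres" column
sample = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "genres": ["Comedy|Romance", "Drama", "Comedy|Drama", "Drama|Romance"],
})

# value_counts() counts each whole string, so "Comedy|Romance" is one value
combined_counts = sample["genres"].value_counts()
print(combined_counts)

# Loop-based approach: split every string on "|" and tally the pieces
genre_counter = Counter()
for genres in sample["genres"]:
    genre_counter.update(genres.split("|"))
print(genre_counter)
```

This works with only the standard library, but on the full merged ratings table a Python-level loop is likely to be slower than a vectorized pandas approach.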

Nova
  • Please don’t post images of the code and data as we can’t test them. Instead, post the code, a sample of the DataFrame and expected output directly inside a code block. This allows us to easily reproduce your problem and help you. Take the time to read [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), and revise your question accordingly. – Rodalm May 17 '22 at 22:45

1 Answer


You could use the regex r'\s*\|\s*' or even ' *[|] *' to split your genre column, then explode the column and count the values. Note that \s matches any whitespace character (space, tab, newline, etc.), and since | is a metacharacter, you need to escape it with a backslash or place it inside a character class, i.e. [].
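A quick check with Python's re module shows that both patterns split the same way when spaces surround the pipe, while an unescaped | does not. The input string is a made-up example with uneven spacing, and the unescaped case assumes Python 3.7 or later (earlier versions handle empty matches in re.split differently).

```python
import re

text = "Comedy | Romance|Drama"  # hypothetical genre string with uneven spacing

# \s* absorbs any whitespace; \| is an escaped, literal pipe
escaped = re.split(r"\s*\|\s*", text)

# " *" absorbs only space characters; [|] is a literal pipe inside a class
char_class = re.split(r" *[|] *", text)

# An unescaped | means "empty pattern OR empty pattern", so it matches the
# empty string at every position instead of the pipe character
unescaped = re.split(r"|", text)

print(escaped)     # ['Comedy', 'Romance', 'Drama']
print(char_class)  # ['Comedy', 'Romance', 'Drama']
print(unescaped)
```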

df['genres'].str.split(' *[|] *').explode().value_counts()

Drama                                      4
Comedy                                     3
Romance                                    3
Western                                    1
Children's                                 1
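If the genres column uses a bare | with no surrounding spaces, as the question's description suggests, a plain split on "|" may be enough: when no regex flag is passed, pandas treats a one-character pattern as a literal string, so no escaping is needed. Another built-in option is Series.str.get_dummies(sep="|"), which turns each genre into a 0/1 indicator column that can then be summed. A minimal sketch comparing the two on a hypothetical sample (the real column would be df["genres"]):

```python
import pandas as pd

# Hypothetical sample data; in the question the column is df["genres"]
sample = pd.DataFrame({
    "genres": ["Comedy|Romance", "Drama", "Comedy|Drama", "Drama|Romance"],
})

# Option 1: split on the literal "|", give each genre its own row, then count
split_counts = sample["genres"].str.split("|").explode().value_counts()

# Option 2: one 0/1 indicator column per genre, summed down the rows
dummy_counts = sample["genres"].str.get_dummies(sep="|").sum()

print(split_counts)
print(dummy_counts)
```

Both count a genre once per row, which matches "number of times each genre is rated" on the merged table. get_dummies builds a rows-by-genres matrix, so on a table of roughly a million ratings it will use more memory than split and explode.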
Onyambu
  • Why is there a space between the "]" and the "*"? Accepted answer due to the regex and method explanations. Thank you! – Nova May 17 '22 at 23:11
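On the question in the comment: in ' *[|] *' the * is a quantifier that applies to the character immediately before it, so ' *' means "zero or more spaces". The space is part of the pattern, not a typo. A small check with the re module (the test strings are made up):

```python
import re

# " *" = zero or more spaces on each side of a literal | (in a character class)
pattern = re.compile(r" *[|] *")

# The optional spaces are absorbed into the match, so all of these split cleanly
results = [pattern.split(text) for text in ["Comedy|Drama", "Comedy | Drama", "Comedy   |Drama"]]
print(results)  # [['Comedy', 'Drama'], ['Comedy', 'Drama'], ['Comedy', 'Drama']]

# Dropping the spaces leaves "[|]", which matches only the pipe,
# so the surrounding spaces stay in the pieces
no_space = re.compile(r"[|]").split("Comedy | Drama")
print(no_space)  # ['Comedy ', ' Drama']
```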