I have a pandas DataFrame that contains the genres of rated movies. Some movies fall under multiple genres, with the genres separated by a "|". You can see examples of this in the output of the code below.
import pandas as pd
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ml-1m/users.dat', sep='::', header=None, names=unames, engine='python')
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ml-1m/ratings.dat', sep='::', header=None, names=rnames, engine='python')
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ml-1m/movies.dat', sep='::', header=None, names=mnames, engine='python')
df = pd.merge(pd.merge(ratings, users), movies)
df["genres"].value_counts()
As you can see, the value_counts() method isn't an effective way of counting the number of times each unique genre is rated, because it treats every combined string as a single value. Is there a pandas method that would let me count the number of times each unique genre "word" appears, or do I need to use loops to separate out all the combined genres?
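To make the goal concrete, here is a minimal sketch of the kind of per-genre count being asked for, run on a toy frame rather than the real MovieLens data. The `toy` frame and its rows are made up for illustration, and the approach assumes a pandas version that provides `Series.explode` (0.25 or later); it is one possible way to get the counts, not necessarily the best one for the full merged frame.

```python
import pandas as pd

# Toy stand-in for the merged frame; these rows are invented for illustration.
toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Action|Comedy", "Comedy", "Action|Drama|Comedy"],
})

# value_counts() on the raw column counts each combined string as one value.
combined_counts = toy["genres"].value_counts()

# Split each string on the literal "|" into a list, expand every list element
# onto its own row with explode(), then count the individual genre words.
genre_counts = toy["genres"].str.split("|").explode().value_counts()

print(genre_counts)
```

On the real data, the same chain would be applied to `df["genres"]`; since `df` has one row per rating, the result would count ratings per genre rather than movies per genre.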