I'm doing analysis on movies. Each movie has a genre attribute that may list several specific genres, such as drama and comedy. The data looks like this:
movie_list = [
    {'name': 'Movie 1', 'genre': 'Action, Fantasy, Horror'},
    {'name': 'Movie 2', 'genre': 'Action, Comedy, Family'},
    {'name': 'Movie 3', 'genre': 'Biography, Drama'},
    {'name': 'Movie 4', 'genre': 'Biography, Drama, Romance'},
    {'name': 'Movie 5', 'genre': 'Drama'},
    {'name': 'Movie 6', 'genre': 'Documentary'},
]
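A possible first step, assuming the data is loaded into a pandas DataFrame (the df used later in the question suggests this), is to keep the original string and add a list version of it. The column name genres is a hypothetical choice for illustration, and the split assumes genres are always separated by commas.

```python
import pandas as pd

movie_list = [
    {'name': 'Movie 1', 'genre': 'Action, Fantasy, Horror'},
    {'name': 'Movie 2', 'genre': 'Action, Comedy, Family'},
    {'name': 'Movie 3', 'genre': 'Biography, Drama'},
    {'name': 'Movie 4', 'genre': 'Biography, Drama, Romance'},
    {'name': 'Movie 5', 'genre': 'Drama'},
    {'name': 'Movie 6', 'genre': 'Documentary'},
]

df = pd.DataFrame(movie_list)

# Hypothetical 'genres' column: split the comma-separated string into a list
# and strip the spaces around each genre name.
df['genres'] = df['genre'].str.split(',').apply(lambda gs: [g.strip() for g in gs])

print(df[['name', 'genres']])
```

Keeping genre as a plain string still works for substring searches, while the list column makes per-genre operations (counting, exact matching) straightforward.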
The problem is: how do I analyze this? For example, how do I find out how many action movies there are, and how do I query for the Action category? Specifically:
1. How do I get all the genres in this list, so that I know how many movies each one contains?
2. How do I query for a certain kind of movie, such as action?
3. Do I need to turn genre into an array?
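A minimal sketch for counting movies per genre, assuming pandas and comma-separated genre strings: split each string into a list, use Series.explode to get one row per (movie, genre) pair, then count with value_counts. A movie with three genres therefore contributes once to each of its three genres.

```python
import pandas as pd

movie_list = [
    {'name': 'Movie 1', 'genre': 'Action, Fantasy, Horror'},
    {'name': 'Movie 2', 'genre': 'Action, Comedy, Family'},
    {'name': 'Movie 3', 'genre': 'Biography, Drama'},
    {'name': 'Movie 4', 'genre': 'Biography, Drama, Romance'},
    {'name': 'Movie 5', 'genre': 'Drama'},
    {'name': 'Movie 6', 'genre': 'Documentary'},
]
df = pd.DataFrame(movie_list)

# Split into lists, explode to one genre per row (index repeats per movie),
# then strip surrounding whitespace from each genre name.
genres = df['genre'].str.split(',').explode().str.strip()

# All distinct genres in the data set.
all_genres = sorted(genres.unique())

# Number of movies tagged with each genre.
counts = genres.value_counts()

print(all_genres)
print(counts)
```

The exploded index still points back to the original rows, so it can be joined to df if other columns are needed per genre.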
Currently I can handle the second question with df[df['genre'].str.contains("Action")].describe(), but is there a better syntax?
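Note that str.contains("Action") is a substring match, so it would also match a hypothetical genre such as "Action-Adventure". A sketch of an exact-match query, assuming comma-separated genres, builds a set of genre names per movie and filters on membership:

```python
import pandas as pd

movie_list = [
    {'name': 'Movie 1', 'genre': 'Action, Fantasy, Horror'},
    {'name': 'Movie 2', 'genre': 'Action, Comedy, Family'},
    {'name': 'Movie 3', 'genre': 'Biography, Drama'},
    {'name': 'Movie 4', 'genre': 'Biography, Drama, Romance'},
    {'name': 'Movie 5', 'genre': 'Drama'},
    {'name': 'Movie 6', 'genre': 'Documentary'},
]
df = pd.DataFrame(movie_list)

# One set of clean genre names per movie.
genre_sets = df['genre'].str.split(',').apply(lambda gs: {g.strip() for g in gs})


def movies_with_genre(genre):
    """Hypothetical helper: rows whose genre list contains `genre` exactly."""
    return df[genre_sets.apply(lambda s: genre in s)]


action_movies = movies_with_genre('Action')
print(action_movies)
print(action_movies.describe())
```

Whole-genre matching avoids false positives from partial names; for the data shown here it returns the same rows as str.contains.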