1

I have a dataframe with a 'genre' column in which each entry holds several values. For example, the movie 'Harry Potter' could have fantasy,adventure in the genre column. I am doing exploratory data analysis and do not know how to represent a multi-valued column like this so that it shows relationships between movies and/or genres.

I have thought of using graph analysis to show the relationships, but I would like to know what other approaches I could consider.

[image: sample data]

This is a sample of the tmdb_movies dataset.

Zoozoo
  • Your question would be much improved if you were able to provide a small sample of data and some desired output. Unfortunately, we can't decide what approach best suits your needs. This is opinion-based and highly dependent on your goal. – jpp Jun 03 '18 at 11:08
  • I totally agree with you. At this point, I would like to get opinions from experts in the field on how to approach this dataset. – Zoozoo Jun 03 '18 at 11:25
  • Jezrael provided a great sample of the data. Thanks @jezrael. – Zoozoo Jun 03 '18 at 11:26

1 Answer

5

You can use str.get_dummies to create one new indicator column per genre:

import pandas as pd

df = pd.DataFrame({'Movies': ['Harry Potter', 'Toy Story'],
                   'Genres': ['fantasy,adventure',
                              'adventure,animation,children,comedy,fantasy']})

# Split each comma-separated string and build one 0/1 indicator column per genre
df = df.set_index('Movies')['Genres'].str.get_dummies(',')
print(df)
              adventure  animation  children  comedy  fantasy
Movies                                                       
Harry Potter          1          0         0       0        1
Toy Story             1          1         1       1        1
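
To get from the indicator table to relationships between genres, one possible next step (a sketch, not part of the original answer) is a genre-by-genre co-occurrence matrix, obtained by multiplying the transposed indicator table by itself. The names `dummies` and `cooccurrence` are illustrative, and the data is the same two-movie sample as above:

```python
import pandas as pd

df = pd.DataFrame({'Movies': ['Harry Potter', 'Toy Story'],
                   'Genres': ['fantasy,adventure',
                              'adventure,animation,children,comedy,fantasy']})

# One 0/1 indicator column per genre, indexed by movie
# (`dummies` is an illustrative name, not from the answer)
dummies = df.set_index('Movies')['Genres'].str.get_dummies(',')

# Entry (a, b) counts the movies tagged with both genre a and genre b;
# the diagonal holds the number of movies per genre
cooccurrence = dummies.T.dot(dummies)
print(cooccurrence)
```

The resulting square table can be read directly, or used as the weights of a genre graph if you go the graph-analysis route.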
jezrael
  • Thanks @jezrael for your answer. Perhaps I should have mentioned that I did consider this method. However, I can't figure out how to visualize the relationships between genres for each movie. I considered putting all the movies on the x-axis with color-coded points for each genre, but what would y be? – Zoozoo Jun 03 '18 at 11:24
  • @Zoozoo - Not so easy, but I'm working on it. – jezrael Jun 03 '18 at 11:58
  • @Zoozoo - I think you need one of these [solutions](https://stackoverflow.com/a/12286958/2901002) – jezrael Jun 03 '18 at 12:09
  • @Zoozoo - It still depends on the number of movies, though; with larger data it may be slow or run into other performance problems. – jezrael Jun 03 '18 at 12:10
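
On the "what would y be?" question raised in the comments, one option (a hedged sketch, not something proposed in the thread) is to reshape the data to long format, with one row per (movie, genre) pair. The movie can then go on one axis and the genre on the other, or the pairs can be aggregated per genre. This assumes pandas 0.25 or later for `explode`; `pairs` and `genre_counts` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Movies': ['Harry Potter', 'Toy Story'],
                   'Genres': ['fantasy,adventure',
                              'adventure,animation,children,comedy,fantasy']})

# Split the comma-separated string into a list, then emit one row per list element
pairs = (
    df.assign(Genre=df['Genres'].str.split(','))
      .explode('Genre')[['Movies', 'Genre']]
      .reset_index(drop=True)
)
print(pairs)

# Number of movies per genre, e.g. as input for a bar chart
genre_counts = pairs['Genre'].value_counts()
print(genre_counts)
```

With many movies, plotting every (movie, genre) point gets crowded, so an aggregate such as `genre_counts` may be a more readable starting point.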