I am trying to create a mapping of the list element values to the index. For example, given a pandas dataframe like this:

>>> book_df
    name                  genre
0   Harry Potter          ["fantasy", "young adult"]
1   Lord of the Rings     ["fantasy", "adventure", "classics"]
2   I, Robot              ["science fiction", "classics"]
3   Animal Farm           ["classics", "fantasy"]
4   A Monster Calls       ["fantasy", "young adult"]

I want to generate a dict that maps each genre to the list of books in that genre.

So, what I'm trying to get is something like this:

>>> genre_to_book_map
{
    "fantasy": ["Harry Potter", "Lord of the Rings", "Animal Farm", "A Monster Calls"],
    "young adult": ["Harry Potter", "A Monster Calls"],
    "classics": ["Lord of the Rings", "I, Robot", "Animal Farm"],
    "science fiction": ["I, Robot"],
    "adventure": ["Lord of the Rings"]
}

I've managed to do this in a rather long-winded way by exploding the lists and then building a dictionary from the result (based on the answers to "Pandas column of lists, create a row for each list element" and "Pandas groupby two columns then get dict for values"), like so:

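import numpy as np
import pandas as pd

# Repeat each book name once per genre in its list, then flatten the lists.
exploded_genres = pd.DataFrame({
    "name": np.repeat(book_df["name"].values, book_df["genre"].str.len())
}).assign(**{"genre": np.concatenate(book_df["genre"].values)})

# Collect the names per genre and convert the resulting Series to a dict.
genre_to_name_map = exploded_genres.groupby("genre")["name"].apply(lambda x: x.tolist()).to_dict()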

This works, but I'd like to know whether there is a more efficient way of doing it, as it seems like a relatively simple thing to do.

m_cheah
  • https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows/32470490#32470490 – Alexander Aug 31 '19 at 21:07

3 Answers

3

Since pandas 0.25, you can use DataFrame.explode to expand each list into one row per element.

book_df.explode('genre').groupby('genre')['name'].apply(list).to_dict()
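
For reference, here is a self-contained sketch of the same one-liner split into steps. The sample frame is rebuilt from the question's data, and the intermediate name exploded is only there for illustration.

```python
import pandas as pd

# Sample data mirroring the question's book_df.
book_df = pd.DataFrame({
    "name": ["Harry Potter", "Lord of the Rings", "I, Robot",
             "Animal Farm", "A Monster Calls"],
    "genre": [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
        ["classics", "fantasy"],
        ["fantasy", "young adult"],
    ],
})

# explode() emits one row per list element, repeating "name" for each genre.
exploded = book_df.explode("genre")

# Group the book names by genre and collect each group into a plain list.
genre_to_book_map = exploded.groupby("genre")["name"].apply(list).to_dict()

print(genre_to_book_map["young adult"])  # ['Harry Potter', 'A Monster Calls']
```

One thing to watch for: a book with an empty genre list explodes to a NaN genre, and groupby drops NaN keys by default, so that book would not appear in the result.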
Mark Wang
  • @James well...apart from the explode, we are pretty much the same – Mark Wang Aug 31 '19 at 21:21
  • Accepted this solution because it's really neat. But I did find @James solution useful too. Thanks to both of you! – m_cheah Aug 31 '19 at 21:28
  • I had a different use case wherein I just wanted one value in the dictionary; and had multiple values in the key. For me, RomanPerekhrest's version didn't work but this did. Thanks, @MarkWang – Chintan Mehta Jan 25 '22 at 07:57
3

With the good old collections.defaultdict object:

In [410]: from collections import defaultdict

In [411]: d = defaultdict(list)

In [412]: for idx, row in df.iterrows():
     ...:     for g in row['genre']:
     ...:         d[g].append(row['name'])
     ...:

In [413]: dict(d)
Out[413]: 
{'fantasy': ['Harry Potter',
  'Lord of the Rings',
  'Animal Farm',
  'A Monster Calls'],
 'young adult': ['Harry Potter', 'A Monster Calls'],
 'adventure': ['Lord of the Rings'],
 'classics': ['Lord of the Rings', 'I, Robot', 'Animal Farm'],
 'science fiction': ['I, Robot']}
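
As a plain script, the same idea can be sketched with zip over the two columns instead of iterrows. This is a variation on the loop above, not the original answer's code; it is usually faster because iterrows builds a Series for every row. The three-row frame and the names genre_to_books and result are just for illustration.

```python
from collections import defaultdict

import pandas as pd

# A shortened sample of the question's data (first three books only).
df = pd.DataFrame({
    "name": ["Harry Potter", "Lord of the Rings", "I, Robot"],
    "genre": [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
    ],
})

genre_to_books = defaultdict(list)

# Walk the two columns in parallel: one (name, genre list) pair per book.
for name, genres in zip(df["name"], df["genre"]):
    for g in genres:
        genre_to_books[g].append(name)

# Convert to a regular dict so missing keys raise KeyError again.
result = dict(genre_to_books)
print(result["classics"])  # ['Lord of the Rings', 'I, Robot']
```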
RomanPerekhrest
  • I like this solution too because it allows me to be more flexible with how I want to "explode" the information in the columns, e.g. if the genres were a list of dicts like `[{"type": "classics"}, {"type": "fantasy"}, {"type": "adventure"}]` – m_cheah Aug 31 '19 at 23:45
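
A minimal sketch of the list-of-dicts case from the comment above, assuming every genre entry is a dict with a "type" key. The two-row frame is made up for illustration; the loop is the same as in the answer, with only the key lookup changed.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical variant of the data: each genre is a dict with a "type" key.
df = pd.DataFrame({
    "name": ["Lord of the Rings", "Animal Farm"],
    "genre": [
        [{"type": "fantasy"}, {"type": "adventure"}, {"type": "classics"}],
        [{"type": "classics"}, {"type": "fantasy"}],
    ],
})

d = defaultdict(list)
for idx, row in df.iterrows():
    for g in row["genre"]:
        # Pull the genre label out of the dict before using it as the key.
        d[g["type"]].append(row["name"])

genre_map = dict(d)
print(genre_map["adventure"])  # ['Lord of the Rings']
```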
2

You need to flatten the lists into one row per genre, then group by genre and output the result to a dictionary.

import pandas as pd

df = pd.DataFrame({
    'name': [
        'Harry Potter',
        'Lord of the Rings',
        'I, Robot',
        'Animal Farm',
        'A Monster Calls',
    ],
    'genre': [
        ["fantasy", "young adult"],
        ["fantasy", "adventure", "classics"],
        ["science fiction", "classics"],
        ["classics", "fantasy"],
        ["fantasy", "young adult"],
    ],
})

# create a Series object, give it a name.
s = df.genre.apply(pd.Series).stack().reset_index(level=-1, drop=True)
s.name = 'genres'

# merge on the index, group by genre, and output to a dict.
d = (
    df.loc[:,['name']]
      .merge(s, left_index=True, right_index=True)
      .groupby('genres')['name']
      .apply(list)
      .to_dict()
)

James