Convert pandas DataFrame to dict and preserve duplicated indexes

Question

vagrant@ubuntu-xenial:~/lb/f5/v12$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
>>> df = pd.DataFrame(data)
>>> df.set_index(keys='name', drop=False, inplace=True)
>>> df
      age name
name
bob    20  bob
jim    25  jim
bob    30  bob
>>> df.to_dict(orient='index')
{'bob': {'age': 30, 'name': 'bob'}, 'jim': {'age': 25, 'name': 'jim'}}
>>>

If we convert the dataframe to a dictionary, the duplicate entry (bob, age 20) is removed. Is there any possible way to produce a dictionary whose values are a list of dictionaries? Something that looks like this?

{'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}], 'jim': [{'age': 25, 'name': 'jim'}]}

A possible solution can be found here: https://stackoverflow.com/questions/10664856/make-dictionary-with-duplicate-keys-in-python — cors, Jan 10 '19 at 22:45

score 11 · Accepted Answer · edited Jun 20 '20 at 09:12

It should be possible to do this if you group on the index.

`groupby` Comprehension

{k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}
# {'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}],
#  'jim': [{'age': 25, 'name': 'jim'}]}
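For anyone who wants to paste this into a fresh interpreter, here is a minimal self-contained sketch. It rebuilds the frame from the question and wraps the comprehension in a helper. The name `index_to_records` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Rebuild the frame from the question: 'name' is both the index and a column.
data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)


def index_to_records(frame):
    """Map each index label to the list of row dicts that share it.

    `index_to_records` is a hypothetical helper name, not a pandas function.
    """
    return {key: group.to_dict(orient='records')
            for key, group in frame.groupby(level=0)}


result = index_to_records(df)
print(result['bob'])  # both 'bob' rows survive as separate dicts
```

Grouping with `level=0` uses the index itself, so this works whether or not the `name` column was also kept in the frame.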

Details
`groupby` lets us partition the data by unique key:

for k, g in df.groupby(level=0):
    print(g, end='\n\n')

      age name
name          
bob    20  bob
bob    30  bob

      age name
name          
jim    25  jim

For each group, convert this into a dictionary using the "records" orient:

for k, g in df.groupby(level=0):
    print(g.to_dict('records'))  # spell out 'records'; pandas 2.0+ rejects the short 'r'

[{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}]
[{'age': 25, 'name': 'jim'}]

The comprehension then stores each of these lists under its group key `k`, so every index label maps to all of its rows.
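Written as an explicit loop, the steps above might look like the sketch below. It assumes the same frame as in the question; the variable names `key`, `group`, and `result` are only illustrative.

```python
import pandas as pd

# Same data as in the question.
data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)

result = {}
for key, group in df.groupby(level=0):
    # key is one index label; group is the sub-frame of rows carrying that label
    result[key] = group.to_dict(orient='records')

# Look up every row stored under a duplicated label.
for row in result['bob']:
    print(row['age'])
```

The loop and the comprehension should build the same dictionary; the loop is just easier to step through while debugging.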

`GroupBy.apply` + `to_dict`

df.groupby(level=0).apply(lambda x: x.to_dict('records')).to_dict()
# {'bob': [{'age': 20, 'name': 'bob'}, {'age': 30, 'name': 'bob'}],
#  'jim': [{'age': 25, 'name': 'jim'}]}

`apply` does the same thing as the dictionary comprehension: it iterates over each group. The only difference is that `apply` returns a Series keyed by group, so it needs one final `to_dict` call to turn that Series into a dictionary.
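To see where that last `to_dict` call comes in, here is a small sketch that keeps the intermediate result of `apply` in a variable and compares both approaches. It assumes the frame from the question; `per_key`, `via_apply`, and `via_comprehension` are illustrative names.

```python
import pandas as pd

# Same data as in the question.
data = [{'name': 'bob', 'age': 20}, {'name': 'jim', 'age': 25}, {'name': 'bob', 'age': 30}]
df = pd.DataFrame(data).set_index('name', drop=False)

# apply collects one list of row dicts per group, indexed by the group key
per_key = df.groupby(level=0).apply(lambda g: g.to_dict(orient='records'))

# the trailing to_dict turns that key -> list mapping into a plain dict
via_apply = per_key.to_dict()

via_comprehension = {k: g.to_dict(orient='records') for k, g in df.groupby(level=0)}

print(via_apply == via_comprehension)
```

Since both give the same mapping, the choice is mostly a matter of taste; the comprehension skips the intermediate Series.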

You sir, are a wizard and scholar. Thank you so much! I had been struggling with this for hours. Could I get a breakdown of these two approaches? — Nolan, Jan 10 '19 at 22:47
