Reshaping a Python dict to a Pandas dataframe

Question

I gave a Python dictionary that looks something like:

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

I would like to convert this to a Pandas dataframe that has the following structure:

  key val
0   B   f
1   B   g
2   B   h
3   A   a
4   A   b
5   A   c
6   A   d
7   A   e

I can't use the standard method to enter a dictionary into a Pandas dataframe but I have come up with a method that works but seems a but clunky. Basically, I create 2 lists containing the keys and the values, convert to a another dict and import that dict into a Pandas dataframe. I'm sure I could combine some of the lines using list or dictionary comprehensions but the final command would be unreadable. I've kept each line separate for the moment to make reading the code easier.

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Column of keys
keys = [[k]*len(v) for k,v in myDict.items()]

# Flatten list
keys = [item for sublist in keys for item in sublist]

# Column of values
values = [v for k,v in myDict.items()]

# Flatten list
values = [item for sublist in values for item in sublist]

key = 'key'
value = 'val'

df = pd.DataFrame({key:keys,value:values})

print(df)

Perhaps I've been staring at this too long but my question is whether there is a simpler, built-in command to achieve the same goal.

score 4 · Answer 1 · answered Jan 25 '19 at 15:48

4

You can simply iterate through the dict and then subsequently iterate through each of the lists:

df = pd.DataFrame(
    data=[(k, v) for k, l in myDict.items() for v in l],
    columns=['key', 'value']
)

answered Jan 25 '19 at 15:48

a_guest

34,165
12
64
118

IMO this is the cleanest and easiest solution here. Readability :+1: – Wesley Bowman Jan 25 '19 at 16:28

score 1 · Accepted Answer · answered Jan 25 '19 at 15:39

1

You can do this using a one-liner:

pd.DataFrame.from_dict(myDict, orient='index')
    .stack()
    .reset_index()
    .drop('level_1', axis=1)
    .rename(columns={'level_0': 'key', 0: 'value'})

  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h

answered Jan 25 '19 at 15:39

gold_cy

13,648
3
23
45

Lots of great answers all vastly superior to my original. Whilst this solution is, perhaps, less readable than @a_guest answer, it does seem more robust when starting dict contains a single integer value e.g. dict = {'A':1,'B':2}. Other solutions fail because an integer is not iterable. Whilst this wasn't a requirement of the original question, I do prefer this solution for that reason. – user1718097 Jan 25 '19 at 16:58

score 0 · Answer 3 · answered Jan 25 '19 at 15:35

0

Using unnesting

df=unnesting(pd.Series(myDict).to_frame(),[0]).reset_index()
df.columns=['key','value']
df
Out[402]: 
  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h

answered Jan 25 '19 at 15:35

BENY

317,841
20
164
234

1

@aws_apprentice less work and easy to understand . `pd.Series(myDict).apply(pd.Series).stack().reset_index(level=0) ` – BENY Jan 25 '19 at 15:39
don't forget to rename columns as well – gold_cy Jan 25 '19 at 15:40
@aws_apprentice ok – BENY Jan 25 '19 at 15:41

score 0 · Answer 4 · answered Jan 25 '19 at 15:43

One option that operates outside pandas but simplifies the operations a bit:

my_dict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

my_vals=[list(zip(k*len(v), v)) for k,v in my_dict.items()]

my_rows=[item for sublist in my_vals for item in sublist ]

pd.DataFrame(my_rows, columns=['key','val'])

    key val
0   A   a
1   A   b
2   A   c
3   A   d
4   A   e
5   B   f
6   B   g
7   B   h

score 0 · Answer 5 · answered Jan 25 '19 at 15:47

You can also try using:

import pandas as pd

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Creating dataframe from the dictionary 
df = pd.DataFrame.from_dict(myDict, orient='index')

# Transpose to change keys to column names
# stack creates level by pivoting columns
# reset index and only select two columns
df = df.T.stack().reset_index().iloc[:,1:]

# renaming columns
df.columns = ['key', 'value']

Reshaping a Python dict to a Pandas dataframe

5 Answers5