2

I gave a Python dictionary that looks something like:

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

I would like to convert this to a Pandas dataframe that has the following structure:

  key val
0   B   f
1   B   g
2   B   h
3   A   a
4   A   b
5   A   c
6   A   d
7   A   e

I can't use the standard method to enter a dictionary into a Pandas dataframe but I have come up with a method that works but seems a but clunky. Basically, I create 2 lists containing the keys and the values, convert to a another dict and import that dict into a Pandas dataframe. I'm sure I could combine some of the lines using list or dictionary comprehensions but the final command would be unreadable. I've kept each line separate for the moment to make reading the code easier.

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Column of keys
keys = [[k]*len(v) for k,v in myDict.items()]

# Flatten list
keys = [item for sublist in keys for item in sublist]

# Column of values
values = [v for k,v in myDict.items()]

# Flatten list
values = [item for sublist in values for item in sublist]

key = 'key'
value = 'val'

df = pd.DataFrame({key:keys,value:values})

print(df)

Perhaps I've been staring at this too long but my question is whether there is a simpler, built-in command to achieve the same goal.

user1718097
  • 4,090
  • 11
  • 48
  • 63

5 Answers5

4

You can simply iterate through the dict and then subsequently iterate through each of the lists:

df = pd.DataFrame(
    data=[(k, v) for k, l in myDict.items() for v in l],
    columns=['key', 'value']
)
a_guest
  • 34,165
  • 12
  • 64
  • 118
1

You can do this using a one-liner:

pd.DataFrame.from_dict(myDict, orient='index')
    .stack()
    .reset_index()
    .drop('level_1', axis=1)
    .rename(columns={'level_0': 'key', 0: 'value'})

  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h
gold_cy
  • 13,648
  • 3
  • 23
  • 45
  • Lots of great answers all vastly superior to my original. Whilst this solution is, perhaps, less readable than @a_guest answer, it does seem more robust when starting dict contains a single integer value e.g. dict = {'A':1,'B':2}. Other solutions fail because an integer is not iterable. Whilst this wasn't a requirement of the original question, I do prefer this solution for that reason. – user1718097 Jan 25 '19 at 16:58
0

Using unnesting

df=unnesting(pd.Series(myDict).to_frame(),[0]).reset_index()
df.columns=['key','value']
df
Out[402]: 
  key value
0   A     a
1   A     b
2   A     c
3   A     d
4   A     e
5   B     f
6   B     g
7   B     h
BENY
  • 317,841
  • 20
  • 164
  • 234
0

One option that operates outside pandas but simplifies the operations a bit:

my_dict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

my_vals=[list(zip(k*len(v), v)) for k,v in my_dict.items()]

my_rows=[item for sublist in my_vals for item in sublist ]

pd.DataFrame(my_rows, columns=['key','val'])

    key val
0   A   a
1   A   b
2   A   c
3   A   d
4   A   e
5   B   f
6   B   g
7   B   h
G. Anderson
  • 5,815
  • 2
  • 14
  • 21
0

You can also try using:

import pandas as pd

myDict = {'A':['a','b','c','d','e'],
          'B':['f','g','h']}

# Creating dataframe from the dictionary 
df = pd.DataFrame.from_dict(myDict, orient='index')

# Transpose to change keys to column names
# stack creates level by pivoting columns
# reset index and only select two columns
df = df.T.stack().reset_index().iloc[:,1:]

# renaming columns
df.columns = ['key', 'value']
niraj
  • 17,498
  • 4
  • 33
  • 48