Creating a dataframe from a dict where keys are tuples

Question

I have the following dict, with keys as tuples:

d = {('first', 'row'): 3, ('second', 'row'): 1}

I'd like to create a dataframe with 3 columns: Col1, Col2 and Col3 which should look like this:

Col1   Col2  Col3
first  row   3
second row   1

I can't figure out how to split the tuples other than parsing the dict pair by pair.

I'd suggest accepting @ayhan's answer - it's much more elegant! – MaxU - stand with Ukraine, May 16 '17 at 22:04
Done. I honestly didn't understand it completely at first, but I agree: it is more elegant. – BogdanC, May 16 '17 at 22:11

ayhan · Accepted Answer · 2017-05-23T07:54:45.720

35

Construct a Series first; resetting its index then gives you a DataFrame:

pd.Series(d).reset_index()
Out: 
  level_0 level_1  0
0   first     row  3
1  second     row  1

You can rename columns afterwards:

df = pd.Series(d).reset_index()   
df.columns = ['Col1', 'Col2', 'Col3']   
df
Out: 
     Col1 Col2  Col3
0   first  row     3
1  second  row     1

Or in one line, naming the MultiIndex levels first:

pd.Series(d).rename_axis(['Col1', 'Col2']).reset_index(name='Col3')
Out[7]: 
     Col1 Col2  Col3
0   first  row     3
1  second  row     1
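As a minimal sketch of why this works (assuming a reasonably recent pandas), a dict with tuple keys becomes a Series indexed by a two-level MultiIndex, and `reset_index` then moves each level into its own column:

```python
import pandas as pd

d = {('first', 'row'): 3, ('second', 'row'): 1}

# Tuple keys are turned into a MultiIndex with one level per tuple position.
ser = pd.Series(d)
print(type(ser.index).__name__)
print(ser.index.nlevels)

# Name the two levels, then move them into ordinary columns;
# `name=` sets the label of the column that holds the values.
df = ser.rename_axis(['Col1', 'Col2']).reset_index(name='Col3')
print(df.columns.tolist())
```

Without `rename_axis`, the unnamed levels come out as `level_0` and `level_1`, as in the first output above.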

edited May 23 '17 at 07:54

answered May 16 '17 at 21:47


this is interesting! I didn't know we can create Series directly from such a dict... – MaxU - stand with Ukraine May 16 '17 at 21:49
I didn't know that a Series could become a dataframe... think I have some docs to read. – elPastor May 17 '17 at 02:30
@pshep123 Yes, normally you can use `ser.to_frame('name_of_the_column')` to convert a Series to a single-column DataFrame. `reset_index` by default converts the index to column(s), and since a Series cannot have more than one column, the result is a DataFrame. – ayhan May 17 '17 at 04:45
This was a good answer. I tried passing the dict directly to a DataFrame and did not get the expected result: `pd.DataFrame.from_dict(d,orient="index").reset_index()` did not work directly. – Jon May 17 '17 at 16:44
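The difference between `to_frame` and `reset_index` described in the comment above can be checked with a small sketch. It reuses the dict from the question; the variable names `framed` and `flat` are only illustrative:

```python
import pandas as pd

d = {('first', 'row'): 3, ('second', 'row'): 1}
ser = pd.Series(d)

# to_frame keeps the MultiIndex as the index and yields one data column.
framed = ser.to_frame('Col3')
print(framed.shape)

# reset_index moves both index levels into ordinary columns,
# so the result has three columns and a default integer index.
flat = ser.reset_index()
print(flat.shape)
```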

MaxU - stand with Ukraine · Answer 2 · 2017-05-16T21:53:32.043

5

Not as elegant as @ayhan's solution:

In [21]: pd.DataFrame(list(d), columns=['Col1','Col2']).assign(Col3=d.values())
Out[21]:
     Col1 Col2  Col3
0   first  row     3
1  second  row     1

or a straightforward one:

In [27]: pd.DataFrame([[k[0],k[1],v] for k,v in d.items()]) \
           .rename(columns={0:'Col1',1:'Col2',2:'Col3'})
Out[27]:
     Col1 Col2  Col3
0   first  row     3
1  second  row     1

edited May 16 '17 at 21:53

answered May 16 '17 at 21:47


score 5 · Answer 3 · answered May 17 '17 at 17:08

I was curious whether it was possible to use a MultiIndex, so I made an attempt. This may have its benefits if you want to specify the levels. Simply by following the pandas documentation example for MultiIndex, I came up with an alternative solution.

First I created a dictionary of random data

s = {(1,2):"a", (4,5):"b", (1,5):"w", (2, 3):"z", (4,1):"p"}

Then I used pd.MultiIndex to create a hierarchical index from the dictionary's keys.

index = pd.MultiIndex.from_tuples(s.keys())


index
Out[3]: 
MultiIndex(levels=[[1, 2, 4], [1, 2, 3, 5]],
        labels=[[0, 2, 2, 1, 0], [1, 3, 0, 2, 3]])

Then I passed the dictionary's values directly to a pandas Series and explicitly set its index to the MultiIndex object created above.

pd.Series(s.values(), index=index)
Out[4]: 
1  2    a
4  5    b
   1    p
2  3    z
1  5    w
dtype: object

Lastly, I reset the index to get the solution requested by OP

pd.Series(s.values(), index=index).reset_index()
Out[5]: 
   level_0  level_1  0
0        1        2  a
1        4        5  b
2        4        1  p
3        2        3  z
4        1        5  w

This was a bit more involved, so @ayhan's answer may still be preferable, but I think it gives you an idea of what pandas may be doing in the background, or at least gives anyone the opportunity to tinker with pandas' mechanics a bit more.
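One possible benefit of building the index explicitly is that the level names can be set up front. The following is a sketch under that assumption, using the same sample dictionary; the `Col1`/`Col2`/`Col3` names are taken from the question, and the keys and values are wrapped in `list(...)` so the code does not depend on how a given pandas version treats dict views:

```python
import pandas as pd

s = {(1, 2): "a", (4, 5): "b", (1, 5): "w", (2, 3): "z", (4, 1): "p"}

# Build the index from the keys and name its levels at creation time.
index = pd.MultiIndex.from_tuples(list(s), names=['Col1', 'Col2'])

# Naming the Series determines the label of the value column
# once reset_index turns the index levels into columns.
df = pd.Series(list(s.values()), index=index, name='Col3').reset_index()
print(df)
```

On Python 3.7+ dicts keep insertion order, so the rows appear in the order the dictionary was written rather than in the order shown in the outputs above.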

score 2 · Answer 4 · answered May 27 '17 at 15:36

You can easily create a data frame from a dict:

import pandas as pd

d = {('first', 'row'): 3, ('second', 'row'): 1}

df = pd.DataFrame.from_dict({'col': d}, orient='columns')

df

        |     | col |
 ------ | --- | --- |
 first  | row |   3 |
 second | row |   1 |

Now for cosmetic purposes, you can get your output dataframe with:

df = df.reset_index()
df.columns = 'Col1 Col2 Col3'.split()

score 2 · Answer 5 · answered Mar 12 '22 at 01:56

One option is to do the wrangling in plain Python before creating the dataframe:

outcome = [(*key, val) for key, val in d.items()]

pd.DataFrame(outcome, columns = ['Col1', 'Col2', 'Col3'])

     Col1 Col2  Col3
0   first  row     3
1  second  row     1

You can generate the columns as well:

columns = [f"Col{num}" for num in range(1, len(outcome[0]) + 1)]

pd.DataFrame(outcome, columns = columns)

You could also build the DataFrame from a dictionary of columns:


outcome = {f"Col{num+1}": [*arr] 
           for num, arr 
           in enumerate(zip(*outcome))}

pd.DataFrame(outcome)

     Col1 Col2  Col3
0   first  row     3
1  second  row     1
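The steps above could be wrapped in a small helper. This is only a sketch: `tuple_dict_to_frame` and its `prefix` parameter are hypothetical names, not part of pandas, and the function assumes every key is a tuple of the same length:

```python
import pandas as pd

def tuple_dict_to_frame(mapping, prefix="Col"):
    """Flatten {tuple_key: value} into one row per item: (*key, value)."""
    rows = [(*key, val) for key, val in mapping.items()]
    if not rows:
        # Nothing to infer column names from, so return an empty frame.
        return pd.DataFrame()
    # One column per key position plus one for the value, numbered from 1.
    columns = [f"{prefix}{num}" for num in range(1, len(rows[0]) + 1)]
    return pd.DataFrame(rows, columns=columns)

d = {('first', 'row'): 3, ('second', 'row'): 1}
df = tuple_dict_to_frame(d)
print(df)
```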
