Convert Python dict into a dataframe

Question

I have a Python dictionary like the following:

{u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

The keys are Unicode dates and the values are integers. I would like to convert this into a pandas dataframe by having the dates and their corresponding values as two separate columns. Example: col1: Dates col2: DateValue (the dates are still Unicode and datevalues are still integers)

     Date         DateValue
0    2012-07-01    391
1    2012-07-02    392
2    2012-07-03    392
.    2012-07-04    392
.    ...           ...
.    ...           ...

Any help in this direction would be much appreciated. I am unable to find resources on the pandas docs to help me with this.

I know one solution might be to convert each key-value pair in this dict, into a dict so the entire structure becomes a dict of dicts, and then we can add each row individually to the dataframe. But I want to know if there is an easier way and a more direct way to do this.

So far I have tried converting the dict into a series object but this doesn't seem to maintain the relationship between the columns:

s  = Series(my_dict,index=my_dict.keys())

I have tried converting the dict into a series object with the dates as index but that didn't match up the dates with the corresponding values for some reason. — anonuser0428, Sep 16 '13 at 21:04
the code has been posted. I want to inquire whether there is a way to create a dataframe without creating a dict-of-dicts and then adding each row separately. — anonuser0428, Sep 16 '13 at 21:08
What is a "Unicode date"? Do you mean an [ISO 8601](http://en.wikipedia.org/wiki/ISO_8601) date? — Peter Mortensen, Nov 16 '15 at 21:04

score 833 · Accepted Answer · edited Jan 31 '20 at 15:30

The error here arises because the DataFrame constructor is being called with scalar values (where it expects the values to be a list/dict/... i.e. to have multiple columns):

pd.DataFrame(d)
ValueError: If using all scalar values, you must must pass an index

You could take the items from the dictionary (i.e. the key-value pairs):

In [11]: pd.DataFrame(d.items())  # or list(d.items()) in python 3
Out[11]:
             0    1
0   2012-07-02  392
1   2012-07-06  392
2   2012-06-29  391
3   2012-06-28  391
...

In [12]: pd.DataFrame(d.items(), columns=['Date', 'DateValue'])
Out[12]:
          Date  DateValue
0   2012-07-02        392
1   2012-07-06        392
2   2012-06-29        391

But I think it makes more sense to use the Series constructor:

In [21]: s = pd.Series(d, name='DateValue')

In [22]: s
Out[22]:
2012-06-08    388
2012-06-09    388
2012-06-10    388

In [23]: s.index.name = 'Date'

In [24]: s.reset_index()
Out[24]:
          Date  DateValue
0   2012-06-08        388
1   2012-06-09        388
2   2012-06-10        388
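
For Python 3 and a recent pandas, the Series route above can be written as one chain. This is only a minimal sketch: the four-entry dictionary is a stand-in for the data in the question, and `rename_axis` is used here in place of assigning `s.index.name`.

```python
import pandas as pd

# Small stand-in for the dictionary in the question (assumed sample data).
d = {"2012-06-08": 388, "2012-06-09": 388, "2012-06-10": 388, "2012-06-11": 389}

# Build a Series keyed by date, name its index, then move the index
# into an ordinary column so the result has two columns.
df = (
    pd.Series(d, name="DateValue")
    .rename_axis("Date")
    .reset_index()
)
print(df)
```

The dates stay as plain strings and the values stay as integers, which is what the question asked for.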

edited Jan 31 '20 at 15:30 by Nick is tired
answered Sep 16 '13 at 21:12 by Andy Hayden

8

@user1009091 I realised what the error means now, it's basically saying "What I'm seeing is a Series, so use Series constructor". – Andy Hayden Sep 16 '13 at 21:16
1

Thanks - very helpful. Could you perhaps explain what's the difference between using this method and using DataFrame.from_dict() ? Your method (which I used) returns type = pandas.core.frame.DataFrame, while the other returns type = class 'pandas.core.frame.DataFrame'. Any chance you could explain the difference and when each method is appropriate? Thanks in advance :) – Optimesh Jan 04 '15 at 10:01
1

they are both similar, [`from_dict`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html) has an orient kwarg, so I might use it if I wanted to avoid transposing. There are few options with `from_dict`, under the hood it's not really different from using DataFrame constructor. – Andy Hayden Jan 04 '15 at 18:49
57

I'm seeing `pandas.core.common.PandasError: DataFrame constructor not properly called!` from the first example – allthesignals Mar 29 '16 at 17:44
19

@allthesignals adding list() around d.items works: pd.DataFrame(list(d.items()), columns=['Date', 'DateValue']) – sigurdb Feb 22 '18 at 19:48
2

@AndyHayden why did you advocate the series over the item solution? Is it because the OP had a bazillion entries? First option worked for me, so thanks were given. – Vaidøtas I. Aug 11 '19 at 19:00

score 362 · Answer 2 · edited Dec 22 '22 at 09:53

When converting a dictionary into a pandas dataframe where you want the keys to be the columns of said dataframe and the values to be the row values, you can simply put brackets around the dictionary like this:

>>> dict_ = {'key 1': 'value 1', 'key 2': 'value 2', 'key 3': 'value 3'}
>>> pd.DataFrame([dict_])
 
    key 1     key 2     key 3
0   value 1   value 2   value 3

EDIT: In the pandas docs one option for the data parameter in the DataFrame constructor is a list of dictionaries. Here we're passing a list with one dictionary in it.
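
To make the "list of dictionaries" idea concrete, here is a rough sketch. The `records` list is hypothetical data made up for illustration; each dict in the list becomes one row. The last two lines show what a single wrapped dict looks like before and after transposing with `.T`.

```python
import pandas as pd

# Hypothetical records: each dict becomes one row, its keys become columns.
records = [
    {"Date": "2012-06-08", "DateValue": 388},
    {"Date": "2012-06-09", "DateValue": 388},
    {"Date": "2012-06-11", "DateValue": 389},
]
df = pd.DataFrame(records)

# A single flat dict wrapped in a list gives a one-row frame.
one_row = pd.DataFrame([{"key 1": "value 1", "key 2": "value 2"}])

# Transposing turns the keys into the index and the values into one column.
tall = one_row.T

print(df)
print(one_row)
print(tall)
```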

edited Dec 22 '22 at 09:53 by starball
answered Oct 05 '17 at 03:53 by cheevahagadog

14

Yes I also did this but added .T to transpose. – Anton vBR Feb 14 '18 at 20:50
1

It works fine but don't know why we have to do it like this. – hui chen Jun 12 '19 at 13:21
1

what if i want one these column to be used as index – om tripathi Sep 18 '19 at 11:08

score 176 · Answer 3 · edited Nov 14 '19 at 07:26

As explained in another answer, using pandas.DataFrame() directly here will not behave as you might expect.

What you can do is use pandas.DataFrame.from_dict with orient='index':

In[7]: pandas.DataFrame.from_dict({u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 .....
 u'2012-07-05': 392,
 u'2012-07-06': 392}, orient='index', columns=['foo'])
Out[7]: 
            foo
2012-06-08  388
2012-06-09  388
2012-06-10  388
2012-06-11  389
2012-06-12  389
........
2012-07-05  392
2012-07-06  392
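
If you also want a name on the index, or want the dates as a regular column, something like the following should work. It is a sketch on a shortened, assumed version of the dictionary; `rename_axis` and `reset_index` are standard DataFrame methods, and the column names are just examples.

```python
import pandas as pd

# Shortened stand-in for the dictionary in the question.
d = {"2012-06-08": 388, "2012-06-09": 388, "2012-07-06": 392}

# Keys become the index; `columns` names the single value column.
df = (
    pd.DataFrame.from_dict(d, orient="index", columns=["DateValue"])
    .rename_axis("Date")  # give the index a name
)

# Move the named index into an ordinary column.
flat = df.reset_index()
print(flat)
```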

answered Sep 02 '15 at 03:07 by ntg

2

can we chain this with any `rename` method to also set the names of the index **and** columns in one go ? – Ciprian Tomoiagă Jan 29 '17 at 16:28
5

good point. One example would be: ...., orient='index').rename(columns={0:'foobar'}) – ntg Feb 21 '17 at 15:59
1

You can also specify pandas.DataFrame.from_dict(..., orient = 'index', columns = ['foo', 'bar']), this is from the [source listed above](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_dict.html). – spen.smith Nov 13 '19 at 17:51
good point, this is true from pandas .22 which was after the original answer... Updated my answer... – ntg Nov 14 '19 at 07:18

score 85 · Answer 4 · edited Oct 11 '17 at 00:26

Pass the items of the dictionary to the DataFrame constructor, and give the column names. After that parse the Date column to get Timestamp values.

Note the difference between python 2.x and 3.x:

In python 2.x:

df = pd.DataFrame(data.items(), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])

In Python 3.x: (requiring an additional 'list')

df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])
df['Date'] = pd.to_datetime(df['Date'])
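
As a quick check of what the parsing step buys you, here is a small sketch on made-up, deliberately unordered data. The explicit `format` argument and the sort are my additions, not part of the answer above.

```python
import pandas as pd

# Assumed sample data, deliberately out of order.
data = {"2012-07-02": 392, "2012-06-29": 391, "2012-06-08": 388}

df = pd.DataFrame(list(data.items()), columns=["Date", "DateValue"])

# Parse the strings into Timestamps; an explicit format avoids guessing.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# With real datetimes, sorting is chronological rather than lexical.
df = df.sort_values("Date").reset_index(drop=True)
print(df.dtypes)
print(df)
```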

edited Oct 11 '17 at 00:26 by Peter Lustig
answered Sep 16 '13 at 21:11 by Viktor Kerkez

3

This gives me: `PandasError: DataFrame constructor not properly called!` – Chris Nielsen Nov 17 '16 at 22:35
18

@ChrisNielsen You are probably using python3. You should try: `df = pd.DataFrame(list(data.items()), columns=['Date', 'DateValue'])` – Viktor Kerkez Nov 22 '16 at 11:11
This is the better answer because it shows what must be done in Python 3. – ifly6 May 04 '18 at 20:05

score 65 · Answer 5 · answered Aug 04 '19 at 01:12

P.S. In particular, I've found row-oriented examples helpful, since that is often how records are stored externally.

https://pbpython.com/pandas-list-dict.html

answered Aug 04 '19 at 01:12 by Neil

score 16 · Answer 6 · answered Dec 20 '17 at 10:07

Pandas has a built-in function for converting a dict to a DataFrame.

pd.DataFrame.from_dict(dictionaryObject,orient='index')

For your data you can convert it like below:

import pandas as pd
your_dict={u'2012-06-08': 388,
 u'2012-06-09': 388,
 u'2012-06-10': 388,
 u'2012-06-11': 389,
 u'2012-06-12': 389,
 u'2012-06-13': 389,
 u'2012-06-14': 389,
 u'2012-06-15': 389,
 u'2012-06-16': 389,
 u'2012-06-17': 389,
 u'2012-06-18': 390,
 u'2012-06-19': 390,
 u'2012-06-20': 390,
 u'2012-06-21': 390,
 u'2012-06-22': 390,
 u'2012-06-23': 390,
 u'2012-06-24': 390,
 u'2012-06-25': 391,
 u'2012-06-26': 391,
 u'2012-06-27': 391,
 u'2012-06-28': 391,
 u'2012-06-29': 391,
 u'2012-06-30': 391,
 u'2012-07-01': 391,
 u'2012-07-02': 392,
 u'2012-07-03': 392,
 u'2012-07-04': 392,
 u'2012-07-05': 392,
 u'2012-07-06': 392}

your_df_from_dict=pd.DataFrame.from_dict(your_dict,orient='index')
print(your_df_from_dict)

That is really bad solution, since is saves dictionary keys as index. — An economist, Aug 24 '18 at 14:55
It's not a bad solution, maybe someone wants the dict key as the index. If you want the dict key as a regular column and not an index, then you can do extra steps, see https://stackoverflow.com/questions/18837262/convert-python-dict-into-a-dataframe/58520351#58520351 — wisbucky, Sep 15 '22 at 15:42

score 16 · Answer 7 · answered Oct 23 '19 at 10:03

This is what worked for me, since I wanted to have a separate index column

df = pd.DataFrame.from_dict(some_dict, orient="index").reset_index()
df.columns = ['A', 'B']

answered Oct 23 '19 at 10:03 by Abercrombie

This fixed so index was corrected for me – user1564762 Jan 13 '22 at 16:26

score 13 · Answer 8 · answered Sep 02 '15 at 05:45

pd.DataFrame({'date': list(dict_dates.keys()), 'date_value': list(dict_dates.values())})

answered Sep 02 '15 at 05:45 by Nader Hisham

score 10 · Answer 9 · answered Jun 04 '21 at 13:55

The simplest way I found is to create an empty dataframe and append the dict. You need to tell pandas not to care about the index, otherwise you'll get the error: TypeError: Can only append a dict if ignore_index=True

import pandas as pd
mydict = {'foo': 'bar'}
df = pd.DataFrame()
df = df.append(mydict, ignore_index=True)
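
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet above only runs on older versions. A rough equivalent for current pandas, assuming you are happy to start from a one-row frame rather than an empty one, could look like this (the `another` dict is a hypothetical second row added for illustration):

```python
import pandas as pd

mydict = {"foo": "bar"}
another = {"foo": "baz"}  # hypothetical second row, for illustration only

# Start from the first dict as a one-row frame instead of an empty frame.
df = pd.DataFrame([mydict])

# Add further rows by concatenating one-row frames;
# ignore_index=True renumbers the rows 0, 1, 2, ...
df = pd.concat([df, pd.DataFrame([another])], ignore_index=True)
print(df)
```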

score 9 · Answer 10 · edited Mar 11 '19 at 01:44

This is how it worked for me:

df = pd.DataFrame([list(d.keys()), list(d.values())]).T
df.columns = ['keys', 'values']  # call them whatever you like

I hope this helps

answered Feb 26 '19 at 22:11 by smerllo

score 9 · Answer 11 · edited Dec 22 '22 at 22:50

The point is how to put each element in a DataFrame.

Row-wise:

pd.DataFrame(list(dic.items()), columns=['Date', 'Value'])

or columns-wise:

pd.DataFrame([dic])

edited Dec 22 '22 at 22:50 by Mario
answered Mar 06 '22 at 07:16 by msbeigi

score 6 · Answer 12 · answered Jan 03 '17 at 08:09

You can also just pass the keys and values of the dictionary to the new dataframe, like so:

import pandas as pd

myDict = {<the_dict_from_your_example>}
df = pd.DataFrame()
df['Date'] = list(myDict.keys())
df['DateValue'] = list(myDict.values())

answered Jan 03 '17 at 08:09 by Blairg23

Artem Zaika · Answer 13 · 2018-03-20T05:46:22.147

In my case I wanted the keys and values of a dict to be the columns and values of a DataFrame. So the only thing that worked for me was:

import numpy as np
import pandas as pd

data = {'adjust_power': 'y', 'af_policy_r_submix_prio_adjust': '[null]', 'af_rf_info': '[null]', 'bat_ac': '3500', 'bat_capacity': '75'}

columns = list(data.keys())
values = list(data.values())
arr_len = len(values)

pd.DataFrame(np.array(values, dtype=object).reshape(1, arr_len), columns=columns)
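
For what it's worth, the same one-row layout can probably be had without NumPy by wrapping the dict in a list, as another answer in this thread does. A minimal sketch on a shortened, assumed version of the dict:

```python
import pandas as pd

# Shortened stand-in for the dict above.
data = {"adjust_power": "y", "bat_ac": "3500", "bat_capacity": "75"}

# Wrapping the dict in a list makes it a single record:
# keys become the columns and the values fill row 0.
wide = pd.DataFrame([data])
print(wide)
```

The column dtypes may differ from the NumPy version, which forces `dtype=object`, so check them if that matters downstream.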

score 5 · Answer 14 · edited Aug 21 '15 at 20:25

Accepts a dict as argument and returns a dataframe with the keys of the dict as index and values as a column.

def dict_to_df(d):
    # wrap in list() so dict views on Python 3 are accepted
    df = pd.DataFrame(list(d.items()))
    df.set_index(0, inplace=True)
    return df

answered Aug 19 '15 at 18:47 by firstly

take a dict, returns a data frame – firstly Aug 21 '15 at 20:23

score 1 · Answer 15 · edited May 06 '18 at 12:55

I think you can change your data format when you create the dictionary; then you can easily convert it to a DataFrame:

input:

a={'Dates':['2012-06-08','2012-06-10'],'Date_value':[388,389]}

output:

{'Date_value': [388, 389], 'Dates': ['2012-06-08', '2012-06-10']}

input:

from pandas import DataFrame
aframe = DataFrame(a)

output: will be your DataFrame

You just need to do some text editing in something like Sublime or maybe Excel.

score 1 · Answer 16 · answered Jan 30 '19 at 06:39

d = {'Date': list(yourDict.keys()),'Date_Values': list(yourDict.values())}
df = pandas.DataFrame(data=d)

If you don't wrap yourDict.keys() in list(), then you will end up with all of your keys and values being placed in every row of every column. Like this:

                                                Date \
0  (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
1  (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
2  (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
3  (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...
4  (2012-06-08, 2012-06-09, 2012-06-10, 2012-06-1...

But by adding list() then the result looks like this:

         Date  Date_Values
0  2012-06-08          388
1  2012-06-09          388
2  2012-06-10          388
3  2012-06-11          389
4  2012-06-12          389
...

score 1 · Answer 17 · answered Nov 25 '22 at 02:14

Running %timeit on a common dictionary shows that pd.DataFrame.from_dict() is the clear winner.

%timeit cols_df = pd.DataFrame.from_dict(clu_meta,orient='index',columns=['Columns_fromUser'])
214 µs ± 9.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.DataFrame([clu_meta])
943 µs ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit pd.DataFrame(clu_meta.items(), columns=['Default_colNames', 'Columns_fromUser'])
285 µs ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
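
If you want to repeat this comparison outside IPython, a rough sketch with the standard-library `timeit` module follows. The `sample` dict is a hypothetical stand-in for `clu_meta`, which is not shown here, and the column names are placeholders. Absolute numbers will vary with machine, pandas version and dictionary size.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for clu_meta, which the answer does not show.
sample = {f"key_{i}": i for i in range(100)}

# The three constructions timed above, as zero-argument callables.
candidates = {
    "from_dict": lambda: pd.DataFrame.from_dict(
        sample, orient="index", columns=["value"]
    ),
    "list_of_dict": lambda: pd.DataFrame([sample]),
    "items": lambda: pd.DataFrame(list(sample.items()), columns=["key", "value"]),
}

runs = 200
# Average seconds per call for each candidate.
timings = {
    name: timeit.timeit(fn, number=runs) / runs for name, fn in candidates.items()
}

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds * 1e6:8.1f} µs per call")
```

Keep in mind that the three calls do not build the same frame: `pd.DataFrame([sample])` produces one wide row, while the other two produce one row per key.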

score 0 · Answer 18 · answered Apr 04 '17 at 13:33

I have run into this several times. Here is an example dictionary returned by a function I wrote, get_max_path():

{2: 0.3097502930247044, 3: 0.4413177909384636, 4: 0.5197224051562838, 5: 0.5717654946470984, 6: 0.6063959031223476, 7: 0.6365209824708223, 8: 0.655918861281035, 9: 0.680844386645206}

To convert this to a dataframe, I ran the following:

df = pd.DataFrame.from_dict(get_max_path(2), orient = 'index').reset_index()

This returns a simple two-column dataframe with a separate index:

   index         0
0      2  0.309750
1      3  0.441318

Then just rename the columns using df.rename(columns={'index': 'Column1', 0: 'Column2'}, inplace=True)
