How to create a Pandas DataFrame from a list of OrderedDicts?

Question

I have the following list:

o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
           (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
           (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]

As the title says, I am trying to turn this list into a pandas DataFrame whose columns are 'StreetNamePreType' and 'StreetName' and whose rows contain the corresponding values for each key in the OrderedDict.

I searched StackOverflow for guidance on how to create such a dataframe (see here), but I get an error when I run the code below, which tries to replicate what is going on in that response.

from collections import Counter, OrderedDict
import pandas as pd

col = Counter()
for k in o_dict_list:
    col.update(k)

df = pd.DataFrame([k.values() for k in o_dict_list], columns = col.keys())

When I run this code, the error I get is: TypeError: unhashable type: 'OrderedDict'

I looked up this error (here) and I understand that there is a problem with the data types, but unfortunately I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own.

I suspect that my list of OrderedDicts is not structured the same way as the one in the linked example, which is why my code does not work. More specifically, I believe I have a list of sets, and each element contains an OrderedDict. The example that I linked to seems to be a true list of OrderedDicts.

Again, I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own and am looking for help.

score 4 · Accepted Answer · answered Oct 20 '18 at 04:17

I would use a list comprehension to do this, as follows.

pd.DataFrame([item[0] for item in o_dict_list])

See the output below.

  StreetNamePreType  StreetName
0              ROAD      Coffee
1            AVENUE  Washington
2              ROAD      Quartz
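
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes every element of `o_dict_list` is a two-item tuple of (OrderedDict, label); the names `records` and `_label` are introduced here only for illustration.

```python
from collections import OrderedDict

import pandas as pd

o_dict_list = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

# Each element is a tuple, so unpack it and keep only the OrderedDict.
# Assumption: every element really has exactly two items.
records = [od for od, _label in o_dict_list]

# pandas builds one row per dict and one column per key.
df = pd.DataFrame(records)
print(df)
```

Tuple unpacking makes the structure of each element explicit, so a malformed element fails loudly instead of being indexed silently.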

Samuel Nde

Interestingly, if the length of `o_dict_list` is large, like 22K, I get `IndexError: string index out of range`. The solution works on small lists, but not on large ones. Any idea how to modify it so that it works on large lists? I was not aware that there were limitations on the size of `o_dict_list`. – grantaguinaldo Oct 20 '18 at 04:27
@grantaguinaldo That is quite strange. I just tried on a 30k ordered list and did not get an `IndexError`. I am using python 3. Are you also using python 3? – Samuel Nde Oct 20 '18 at 04:31
I am using `Python 3.6.5 :: Anaconda, Inc.` so yeah, I am using Python 3. – grantaguinaldo Oct 20 '18 at 04:39
@grantaguinaldo We have the same environment but I cannot figure out why you are having that problem. Do you want to post a question about this? Meanwhile, I am reaching out to my mentor for help. Please let me know when you get an answer. I will keep you posted too. – Samuel Nde Oct 20 '18 at 04:43
I am also using a Windows 10 machine and `pandas==0.23.0`, but I'll go ahead and post a question about the `IndexError`. – grantaguinaldo Oct 20 '18 at 04:48
Same here. and my `numpy` version is `1.15.2`. You know, `pandas` runs on top of `numpy`. – Samuel Nde Oct 20 '18 at 04:51
Right, my `numpy` version is `1.14.3`. I am going to try and update. – grantaguinaldo Oct 20 '18 at 05:01
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/182182/discussion-between-nde-samuel-mbah-and-grantaguinaldo). – Samuel Nde Oct 20 '18 at 05:04
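
The thread does not record what caused the `IndexError`. One possibility, offered only as a guess, is that the larger list contains entries that are not (OrderedDict, label) tuples, for example an empty string, since `''[0]` raises exactly `IndexError: string index out of range`. The sketch below shows one way to locate such entries; `find_malformed` and `sample` are hypothetical names, and the bad entry in `sample` is made up for illustration.

```python
from collections import OrderedDict


def find_malformed(items):
    """Return (index, item) pairs that are not (mapping, label) tuples."""
    bad = []
    for i, item in enumerate(items):
        ok = (
            isinstance(item, tuple)
            and len(item) == 2
            and isinstance(item[0], dict)  # OrderedDict is a dict subclass
        )
        if not ok:
            bad.append((i, item))
    return bad


sample = [
    (OrderedDict([('StreetName', 'Coffee')]), 'Ambiguous'),
    '',  # hypothetical bad entry: ''[0] raises "string index out of range"
    (OrderedDict([('StreetName', 'Quartz')]), 'Ambiguous'),
]

malformed = find_malformed(sample)
print(malformed)
```

If this returns anything for the real data, those entries can be inspected or filtered out before building the DataFrame.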

score 1 · Answer 2 · answered Oct 20 '18 at 03:53

Extracting the OrderedDict objects from your list and then passing them to pd.DataFrame should work:

values = []
for item in o_dict_list:
    # keep only the OrderedDict from each (OrderedDict, label) tuple
    values.append(item[0])

pd.DataFrame(values)

  StreetNamePreType  StreetName
0              ROAD      Coffee
1            AVENUE  Washington
2              ROAD      Quartz
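
The approach above discards the second item of each tuple ('Ambiguous'). If that value is worth keeping, it could be carried along as an extra column. This is only a sketch: the column name 'Label' is an assumption, not something from the question, and `rows` is an illustrative name.

```python
from collections import OrderedDict

import pandas as pd

o_dict_list = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

# Copy each OrderedDict into a new dict and append the tuple's second item
# under a hypothetical 'Label' key; the original objects are left untouched.
rows = [{**od, 'Label': label} for od, label in o_dict_list]

df = pd.DataFrame(rows)
print(df)
```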

score 1 · Answer 3 · answered Oct 20 '18 at 04:12

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

pd.DataFrame(d)
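
The snippet above comes without explanation. As far as I can tell, the point is that `pd.DataFrame` accepts a list of dicts directly, even when the dicts do not share the same keys: each distinct key becomes a column, and rows lacking a key get NaN there. A small sketch to check that reading, reusing the same data:

```python
import pandas as pd

d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

df = pd.DataFrame(d)

# Every key that appears in any dict becomes a column.
columns = sorted(df.columns)
print(columns)

# Only the first dict has 'year', so the other three rows are NaN there.
year_missing = df['year'].isna().tolist()
print(year_missing)
```

For the list in the question, this only helps once the OrderedDicts have been pulled out of their tuples, as the other answers do.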

Sagar

Super simple and worked well for me parsing data out of Salesforce using the simple_salesforce package. – Ross Oct 20 '20 at 17:17
