1

I have the following list:

o_dict_list = [(OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
           (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
           (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous')]

And like the title says, I am trying to take this list and create a pandas dataframe where the columns are: 'StreetNamePreType' and 'StreetName' and the rows contain the corresponding values for each key in the OrderedDict.

I searched Stack Overflow for guidance on how to create a dataframe (see here) and tried to replicate that answer, but I get an error when I run this code:

from collections import Counter, OrderedDict
import pandas as pd

col = Counter()
for k in o_dict_list:
    col.update(k)

df = pd.DataFrame([k.values() for k in o_dict_list], columns = col.keys())

When I run this code, the error I get is: TypeError: unhashable type: 'OrderedDict'

I looked up this error (here) and I understand that there is a problem with the data types, but unfortunately I don't know enough about the inner workings of Python/pandas to resolve this problem on my own.

I suspect that my list of OrderedDicts is not structured the same way as the one in the linked example, which is why my code does not work. More specifically, I believe I have a list of sets, and each element contains an OrderedDict. The example that I linked to seems to be a true list of OrderedDicts.

Again, I don't know enough about the inner workings of Python/Pandas to resolve this problem on my own and am looking for help.

grantaguinaldo

3 Answers

4

I would use a list comprehension to do this, as follows.

pd.DataFrame([o_dict_list[i][0] for i, j in enumerate(o_dict_list)])

See the output below.

  StreetNamePreType  StreetName
0              ROAD      Coffee
1            AVENUE  Washington
2              ROAD      Quartz
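
As a side note, the same result can be had without indexing by position. The sketch below assumes every element of `o_dict_list` is a 2-tuple of `(OrderedDict, label)`, as in the question, and unpacks each tuple directly; the names `records` and `_label` are only illustrative.

```python
from collections import OrderedDict
import pandas as pd

o_dict_list = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

# Each element is a pair (OrderedDict, label); keep only the OrderedDict.
records = [record for record, _label in o_dict_list]

# pandas builds one row per dict and one column per key.
df = pd.DataFrame(records)
```

This also hints at why the code in the question fails: `Counter.update` receives the whole tuple and tries to count (and therefore hash) the `OrderedDict` inside it.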
Samuel Nde
  • Interestingly, if `o_dict_list` is large, like 22K elements, I get `IndexError: string index out of range`. The solution works on small lists, but not on large ones. Any idea how to modify it so that it works on large lists? I was not aware that there were limitations on the size of `o_dict_list`. – grantaguinaldo Oct 20 '18 at 04:27
  • @grantaguinaldo That is quite strange. I just tried on a 30k ordered list and did not get an `IndexError`. I am using python 3. Are you also using python 3? – Samuel Nde Oct 20 '18 at 04:31
  • I am using `Python 3.6.5 :: Anaconda, Inc.` so yeah, I am using Python 3. – grantaguinaldo Oct 20 '18 at 04:39
  • @grantaguinaldo We have the same environment, but I cannot figure out why you are having that problem. Do you want to post a question about this? Meanwhile, I am reaching out to my mentor for help. Please let me know when you get an answer. I will keep you posted too. – Samuel Nde Oct 20 '18 at 04:43
  • I am also using a Windows 10 machine and `pandas==0.23.0`, but I'll go ahead and post a question about the `IndexError`. – grantaguinaldo Oct 20 '18 at 04:48
  • Same here, and my `numpy` version is `1.15.2`. You know, `pandas` runs on top of `numpy`. – Samuel Nde Oct 20 '18 at 04:51
  • Right, my `numpy` version is `1.14.3`. I am going to try and update. – grantaguinaldo Oct 20 '18 at 05:01
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/182182/discussion-between-nde-samuel-mbah-and-grantaguinaldo). – Samuel Nde Oct 20 '18 at 05:04
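
The thread does not record what caused the `IndexError`, so the following is only a guess: indexing an empty string with `[0]` raises exactly `IndexError: string index out of range`, so a large list may contain a few entries that are not `(OrderedDict, label)` tuples. A minimal sketch for locating and skipping such entries, where `is_record` and `sample` are hypothetical names made up for illustration:

```python
from collections import OrderedDict
import pandas as pd

def is_record(item):
    """Hypothetical helper: True for a non-empty tuple whose first element is a dict."""
    return isinstance(item, tuple) and len(item) > 0 and isinstance(item[0], dict)

# Made-up sample with one malformed entry in the middle.
sample = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    '',  # ''[0] raises IndexError: string index out of range
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

# Positions of the entries that do not have the expected shape.
bad_positions = [i for i, item in enumerate(sample) if not is_record(item)]

# Build the dataframe from the well-formed entries only.
df_clean = pd.DataFrame([item[0] for item in sample if is_record(item)])
```

Inspecting `bad_positions` on the real data would show whether malformed entries are present at all; if the list is empty, the cause lies elsewhere.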
1

Extracting the OrderedDict objects from your list and then passing them to pd.DataFrame should work:

# Collect the first element (the OrderedDict) of each tuple.
values = []
for item in o_dict_list:
    values.append(item[0])

pd.DataFrame(values)


  StreetNamePreType  StreetName
0              ROAD      Coffee
1            AVENUE  Washington
2              ROAD      Quartz
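
If the second element of each tuple (here `'Ambiguous'`) should be kept as well, one option is to split the pairs with `zip` and attach the labels as an extra column. This is a sketch under the assumption that every element is a 2-tuple; the column name `Label` is an assumption, since the question does not name that value.

```python
from collections import OrderedDict
import pandas as pd

o_dict_list = [
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Coffee')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'AVENUE'), ('StreetName', 'Washington')]), 'Ambiguous'),
    (OrderedDict([('StreetNamePreType', 'ROAD'), ('StreetName', 'Quartz')]), 'Ambiguous'),
]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
records, labels = zip(*o_dict_list)

df_labeled = pd.DataFrame(list(records))
df_labeled['Label'] = list(labels)  # 'Label' is an assumed column name
```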
nimrodz
1
# A list of plain dicts works the same way; keys missing from a dict become NaN in that row.
d = [{'points': 50, 'time': '5:00', 'year': 2010},
     {'points': 25, 'time': '6:00', 'month': 'february'},
     {'points': 90, 'time': '9:00', 'month': 'january'},
     {'points_h1': 20, 'month': 'june'}]

pd.DataFrame(d)
Sagar
  • Super simple and worked well for me parsing data out of Salesforce using the simple_salesforce package. – Ross Oct 20 '20 at 17:17