4

I am facing a basic problem: converting a list of dictionaries, obtained by parsing a column of JSON-formatted text, into a pandas DataFrame. Below is a brief snapshot of the data:

[{u'PAGE TYPE': u'used-serp.model.brand.city'},
 {u'BODY TYPE': u'MPV Cars',
  u'ENGINE CAPACITY': u'1461',
  u'FUEL TYPE': u' Diesel',
  u'MODEL NAME': u'Renault Lodgy',
  u'OEM NAME': u'Renault',
  u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
 {u'PAGE TYPE': u'used-serp.brand.city'},
 {u'BODY TYPE': u'SUV Cars',
  u'ENGINE CAPACITY': u'2477',
  u'FUEL TYPE': u' Diesel',
  u'MODEL NAME': u'Mitsubishi Pajero',
  u'OEM NAME': u'Mitsubishi',
  u'PAGE TYPE': u'New-ModelPage.OverviewTab'},
 {u'BODY TYPE': u'Hatchback Cars',
  u'ENGINE CAPACITY': u'1198',
  u'FUEL TYPE': u' Petrol , Diesel',
  u'MODEL NAME': u'Volkswagen Polo',
  u'OEM NAME': u'Volkswagen',
  u'PAGE TYPE': u'New-ModelPage.GalleryTab'},

The code I am using to parse the column is shown below:

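import json

stdf_noncookie = []      # rows whose 'attributes' text parsed as JSON
stdf_noncookiejson = []  # raw 'attributes' values that failed to parse

for index, row in df_noncookie.iterrows():
    try:
        loop_data = json.loads(row['attributes'])
        stdf_noncookie.append(loop_data)
    except ValueError:
        # json.loads raises a ValueError subclass on malformed JSON text
        loop_nondata = row['attributes']
        stdf_noncookiejson.append(loop_nondata)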

stdf_noncookie is the list of dictionaries I am trying to convert into a pandas DataFrame. 'attributes' is the column containing text in JSON format. I tried to learn from this link, but it did not solve my problem. Any suggestions or tips for converting a list of dictionaries to a pandas DataFrame would be helpful.

cs95
Arshad Islam

5 Answers

8

To convert your list of dicts to a pandas DataFrame, use the following:

stdf_noncookiejson = pd.DataFrame.from_records(data)

pandas.DataFrame.from_records

DataFrame.from_records (data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

You can set the index, name the columns, etc. as you read it in.
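As a minimal sketch of the point above, assuming records shaped like the ones in the question (the sample values below are copied from it, trimmed to a few keys), the columns argument selects and orders the columns, and keys missing from a record come out as NaN:

```python
import pandas as pd

# Two records shaped like the question's data; the first has only one key.
records = [
    {"PAGE TYPE": "used-serp.brand.city"},
    {
        "BODY TYPE": "SUV Cars",
        "ENGINE CAPACITY": "2477",
        "PAGE TYPE": "New-ModelPage.OverviewTab",
    },
]

# columns= picks which keys become columns and in what order.
df = pd.DataFrame.from_records(
    records, columns=["PAGE TYPE", "BODY TYPE", "ENGINE CAPACITY"]
)
print(df)
```

The first row has no 'BODY TYPE' or 'ENGINE CAPACITY', so those cells are filled with NaN rather than raising an error.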

If you're working with raw JSON text (a JSON string or a file path) rather than already-parsed dicts, you can also use the read_json method:

stdf_noncookiejson = pd.read_json(data)

pandas.read_json

pandas.read_json (path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False)
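A small sketch of read_json on JSON text, assuming the whole dataset is available as one JSON array string (the json_text value here is made up for illustration). Newer pandas versions warn when a literal JSON string is passed directly, so the sketch wraps it in io.StringIO, which also works on older versions:

```python
import io

import pandas as pd

# Hypothetical JSON text holding an array of records with differing keys.
json_text = (
    '[{"PAGE TYPE": "used-serp.brand.city"},'
    ' {"PAGE TYPE": "New-ModelPage.GalleryTab", "OEM NAME": "Volkswagen"}]'
)

# Wrap the string in a file-like object before handing it to read_json.
df = pd.read_json(io.StringIO(json_text))
print(df)
```

Note that this expects JSON text, not a Python list of dicts; for data that is already parsed, the DataFrame constructor or from_records is the better fit.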

Craicerjack
2

Reference this answer.

Assuming d is your List of Dictionaries, simply use:

df = pd.DataFrame(d)
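For illustration, here is a short sketch using a few records in the shape shown in the question (values trimmed). The constructor takes the union of all keys as columns; the to_numeric step is an optional extra, not part of the original answer:

```python
import pandas as pd

# d stands in for the list of dictionaries from the question.
d = [
    {"PAGE TYPE": "used-serp.model.brand.city"},
    {"MODEL NAME": "Renault Lodgy", "ENGINE CAPACITY": "1461",
     "PAGE TYPE": "New-ModelPage.OverviewTab"},
    {"MODEL NAME": "Volkswagen Polo", "ENGINE CAPACITY": "1198",
     "PAGE TYPE": "New-ModelPage.GalleryTab"},
]

df = pd.DataFrame(d)  # one row per dict, one column per distinct key

# Optional: the capacities arrive as strings, so convert them to numbers.
df["ENGINE CAPACITY"] = pd.to_numeric(df["ENGINE CAPACITY"])
print(df)
```

Records that lack a key simply get NaN in that column, so dicts of different shapes can be mixed freely.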
Community
2

You can simply use the pandas DataFrame constructor.

import pandas as pd

print(pd.DataFrame(data))
amin
0

I finally found a way to convert a list of dicts to a pandas DataFrame. Below is the code:

Method A
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = stdf_noncookie.apply(pd.Series)

Method B
stdf_noncookie = df_noncookie['attributes'].apply(json.loads)
stdf_noncookie = pd.DataFrame(stdf_noncookie.tolist())

In my tests, Method A was much quicker than Method B, and Method B did not work on some datasets. I will create another post asking about the difference between the two methods.
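Relative speed can depend on the data and the pandas version, so it may be worth measuring on your own input. Below is a rough timing sketch; the df_noncookie frame is a small made-up stand-in, and method_a / method_b are hypothetical helper names wrapping the two snippets above. One behavioural difference to keep in mind: tolist() discards the original index, so the sketch passes index= to Method B to keep the two results aligned.

```python
import json
import timeit

import pandas as pd

# Hypothetical stand-in for df_noncookie: 200 rows of JSON text.
rows = [
    json.dumps({"PAGE TYPE": "New-ModelPage.OverviewTab", "OEM NAME": "Renault"}),
    json.dumps({"PAGE TYPE": "used-serp.brand.city"}),
] * 100
df_noncookie = pd.DataFrame({"attributes": rows})

# Parse once so that only the dict-to-DataFrame step is timed.
parsed = df_noncookie["attributes"].apply(json.loads)


def method_a():
    # Expands each dict into a Series, row by row.
    return parsed.apply(pd.Series)


def method_b():
    # Hands the whole list of dicts to the constructor at once.
    return pd.DataFrame(parsed.tolist(), index=parsed.index)


for name, fn in [("A", method_a), ("B", method_b)]:
    seconds = timeit.timeit(fn, number=3)
    print(f"Method {name}: {seconds:.4f} s for 3 runs")

frame_a, frame_b = method_a(), method_b()
```

Both calls should yield frames with the same shape and columns, so whichever turns out faster on your data can be used interchangeably.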

Arshad Islam
0

I was able to do it with a list comprehension. My problem was that I had left my dicts JSON-encoded, so they looked like strings.

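import json
import time
import pandas as pd
d = r.zrangebyscore('live-ticks', '-inf', time.time())  # r: an existing Redis client
dform = [json.loads(i) for i in d]  # decode each JSON string into a dict
df = pd.DataFrame(dform)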
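To see the effect without a Redis server, here is a self-contained sketch with made-up tick payloads (the raw list and its field names are assumptions for illustration). Passing the still-encoded values straight to the constructor gives a single column of strings, while decoding first gives one column per key:

```python
import json

import pandas as pd

# Hypothetical payloads; a store may return JSON as bytes or str.
raw = [b'{"price": 1.5, "ts": 1}', '{"price": 1.6, "ts": 2}']

# Without decoding: each element is treated as one opaque value.
undecoded = pd.DataFrame(raw)

# With decoding: json.loads accepts both str and bytes input.
decoded = pd.DataFrame([json.loads(item) for item in raw])

print(undecoded.shape)
print(decoded)
```

If a frame comes out with a single unnamed column, still-encoded JSON is a likely cause.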
Warren