0

I have JSON data and I would like to covert it to CSV

I've tried following this How can I convert JSON to CSV? but it still doesn't work.

I know I need to break the nested objects in JSON to be put in CSV.

Another thing I am concern about is I want to have a loop where multiple items (regardless of the number) can pass through the code as my "ITEM" variable may be >1

this is my JSON sample

data_dict = 
[{'count': 19804,
  'next': {'limit': 1, 'offset': 1},
  'previous': None,
  'results': [{'company_name': 'Sunshine and Flowers',
               'delivery_address': '34 olive beach house, #01-22, 612345',
               'delivery_timeslot': {'bounds': '[)',
                                     'items': [{'id': 21668,
                                                'metadata': {},
                                                'name': 'Loose hair flowers',
                                                'quantity': 1,
                                                'removed': None},
                                               {'id': 21667,
                                                'metadata': {},
                                                'name': "Groom's Boutonniere",
                                                'quantity': 1,
                                                'removed': None},
                                               {'id': 21666,
                                                'metadata': {},
                                                'name': 'Bridal Bouquet',
                                                'quantity': 1,
                                                'removed': None}],
                                     'lower': '2019-12-06T10:00:00Z',
                                     'upper': '2019-12-06T13:00:00Z'}}]}]

this is what I've tried so far

import csv
import json    

dict_data = json.loads(dict_data)
f = csv.writer(open("test.csv", "wb+"))

for dict_data in dict_data:
    f.writerow([dict_data["count"],
                dict_data["next"]["limit"],
                dict_data["next"]["offset"],
                dict_data["results"]["company_name"],
                dict_data["results"]["delivery_address"],
                dict_data["results"]["delivery_timeslot"]["lower"]["'upper"]["'bounds"],
                dict_data["results"]["items"]["id"]["name"]["'quantity"]["metadata"]["removed"]])

the error message I am getting is this

'not {!r}'.format(s.__class__.__name__)) TypeError: the JSON object must be str, bytes or bytearray, not 'list'

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Yuna
  • 73
  • 6
  • 1
    You don't need to call `json.loads(...)`, your data is a list of dictionaries that can be iterated over – Iain Shelvington Sep 06 '19 at 03:40
  • @IainShelvington hi, i have removed `json.load` yet i still get this error `dict_data["results"]["company_name"], TypeError: list indices must be integers or slices, not str` – Yuna Sep 06 '19 at 03:46
  • Don't rename `dict_data` in your for loop, name the variable that you assign to each iteration something else - `for data in dict_data:` – Iain Shelvington Sep 06 '19 at 03:53

1 Answers1

0

The Data:

  • The data shown in the question is a list of dicts, not a properly formatted json file.
  • I created a file, test.json with the following form:
[{'count': 19804,...,'results': []},
 {'count': 19805,...,'results': []},
 {'count': 19806,...,'results': []}]`
  • The data is the same as that given in the example, repeated 3 times.

Read in the data with ast.literal_eval:

import ast

with open("test.json", "r") as f:
    data = ast.literal_eval(f.read())

Use pandas to unpack the nested dicts:

import pandas as pd
from pandas.io.json import json_normalize

df = json_normalize(data,
                    record_path=['results', 'delivery_timeslot', 'items'],
                    meta=[['next'],
                          ['count'],
                          ['results', 'company_name'],
                          ['results', 'delivery_timeslot', 'bounds'],
                          ['results', 'delivery_timeslot', 'lower'],
                          ['results', 'delivery_timeslot', 'upper']])

df[['next.limit', 'next.offset']] = df['next'].apply(pd.Series)
df.drop(columns=['next'], inplace=True)

DataFrame View:

    id                 name  quantity removed metadata.lame metadata.horse  count  results.company_name results.delivery_timeslot.bounds results.delivery_timeslot.lower results.delivery_timeslot.upper  next.limit  next.offset
 21668   Loose hair flowers         1    None         Shit!          Shit!  19804  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21667  Groom's Boutonniere         1    None          This           This  19804  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21666       Bridal Bouquet         1    None            is             is  19804  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21668   Loose hair flowers         1    None             a              a  19805  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21667  Groom's Boutonniere         1    None        grassy         grassy  19805  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21666       Bridal Bouquet         1    None        knoll.         knoll.  19805  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21668   Loose hair flowers         1    None           NaN            NaN  19806  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21667  Groom's Boutonniere         1    None           NaN            NaN  19806  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
 21666       Bridal Bouquet         1    None           NaN            NaN  19806  Sunshine and Flowers                               [)            2019-12-06T10:00:00Z            2019-12-06T13:00:00Z           1            1
  • Rename any columns as needed:
    • df.rename(columns={'results.delivery_timeslot.lower': 'delivery_timeslot.lower'}, inplace=True)

Save it to a csv:

df.to_csv('test.csv', index=False)
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158