0

I have successfully converted a simple JSON file to CSV, but I run into a problem when the file contains several arrays of JSON objects. I am using the csv module, not pandas, for the conversion. Below is the content that is processed successfully and the content that fails:

Success (when the file contains a single list/array of JSON objects):

[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]

Failure:

[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]
[{"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}]

The json.loads function throws the following exception:

Extra data ; line 1 column 6789 (char 1234)

How can I process such files?

EDIT: This file is flushed by Kinesis Firehose and pushed to S3. I use a Lambda function to download, load, and transform it, so it is not a .json file.

Shivkumar Mallesappa

3 Answers

4

Parse each line like so:

import json

with open('input.json') as f:
    for line in f:
        # Each line is its own JSON document (here, an array of objects)
        obj = json.loads(line)
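
Since the goal is CSV output with the csv module, the loop above can be extended roughly as follows. This is only a minimal sketch: the function name `json_lines_to_csv` is made up for illustration, and it assumes every non-empty line is a JSON array of flat objects whose keys match those of the first record.

```python
import csv
import io
import json


def json_lines_to_csv(lines, out):
    """Write one CSV row per JSON object found in `lines`.

    Assumes each non-empty line is a JSON array of flat objects.
    """
    writer = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        for record in json.loads(line):
            if writer is None:
                # Take the CSV header from the first record's keys
                writer = csv.DictWriter(out, fieldnames=list(record))
                writer.writeheader()
            writer.writerow(record)


# Small in-memory demonstration (hypothetical sample data)
sample = [
    '[{"value": 0.97, "key_1": "value1"}]\n',
    '[{"value": 0.5, "key_1": "other"}]\n',
]
buffer = io.StringIO()
json_lines_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

With a real file, pass the open input file as `lines` and an output file opened with `newline=''` as `out`. Note that `csv.DictWriter` raises `ValueError` by default if a later record contains a key that is not in the header, so records with varying keys would need the field names collected up front.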
Alex Hall
1

This fails because your file, taken as a whole, is not valid JSON: it contains several JSON documents, one per line. You have to read the file line by line and parse each line into an object individually.
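
If the documents turn out not to be reliably separated by newlines (an assumption worth checking against your actual file), one possible alternative is `json.JSONDecoder.raw_decode`, which parses a single document and reports where it ended. The helper name `iter_json_documents` below is hypothetical, and the sketch assumes the whole file content fits in memory as a string.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON document found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Returns the parsed document and the index just past its end
        document, pos = decoder.raw_decode(text, pos)
        yield document


# Hypothetical sample: two arrays back to back, a third after a newline
data = '[{"value": 0.97}][{"value": 0.5}]\n[{"value": 0.1}]'
documents = list(iter_json_documents(data))
```

Each yielded item here is one of the arrays, so the records inside still need to be iterated before being written out.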

Or, you can convert your file structure like this...

[
  {
    "value": 0.97,
    "key_1": "value1",
    "key_2": "value2",
    "key_3": "value3",
    "key_11": "2019-01-01T00:05:00Z"
  },
  {
    "value": 0.97,
    "key_1": "value1",
    "key_2": "value2",
    "key_3": "value3",
    "key_11": "2019-01-01T00:05:00Z"
  },
  {
    "value": 0.97,
    "key_1": "value1",
    "key_2": "value2",
    "key_3": "value3",
    "key_11": "2019-01-01T00:05:00Z"
  }
]

and it will be a valid JSON file.
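
If you cannot change how the file is produced, the same structure can be built after reading it. The following is a rough sketch, assuming each non-empty line holds one JSON array; `merge_json_lines` is a name invented for this example.

```python
import json


def merge_json_lines(lines):
    """Combine per-line JSON arrays into one flat list of records."""
    merged = []
    for line in lines:
        if line.strip():  # ignore blank lines
            merged.extend(json.loads(line))
    return merged


# Hypothetical sample shaped like the failing input, shortened
lines = [
    '[{"value": 0.97, "key_1": "value1"}]',
    '[{"value": 0.97, "key_1": "value1"}]',
]
# Serializing the merged list once gives a single valid JSON array
valid_json = json.dumps(merge_json_lines(lines), indent=2)
```

The resulting string can then be parsed with a single json.loads call, as in the case that already works.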

tanaydin
0

As tanaydin said, your failing input is not valid JSON. It should look something like this:

[
    {
        "value":0.97,
        "key_1":"value1",
        "key_2":"value2",
        "key_3":"value3",
        "key_11":"2019-01-01T00:05:00Z"
    },
    {"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"},
    {"value":0.97,"key_1":"value1","key_2":"value2","key_3":"value3","key_11":"2019-01-01T00:05:00Z"}
]

I assume you're creating the JSON output by iterating over a list of objects and calling json.dumps on each one. Instead, build a list of dictionaries first, then call json.dumps once on the whole list.

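import json

# list_of_objects is assumed to be your existing iterable of record objects
list_of_dicts_to_jsonify = []  # must be a list: a dict has no append()
object_attributes = ['value', 'key_1', 'key_2', 'key_3', 'key_11']
for item in list_of_objects:
    # Convert object to dictionary
    obj_dict = {}
    for k in object_attributes:
        # Fall back to None only when the attribute is missing,
        # so falsy values such as 0 are preserved
        obj_dict[k] = getattr(item, k, None)
    list_of_dicts_to_jsonify.append(obj_dict)

# Serialize the whole list once so the output is a single JSON array
json_output = json.dumps(list_of_dicts_to_jsonify)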
Rowshi