I have a JSON file that contains many documents, each holding the data for one purchase order. I get the file from a web service on a cloud purchase order system. I need to load each of these documents into a separate record in an Oracle database. I have done this for other JSON files using Oracle's external table feature, and it worked. However, those files had a CRLF between each JSON document. The file I get from the web service is a single document containing many POs, with no CRLF between the purchase orders.

I found the Q&A here: How to split json into multiple files per document. The code shown as the solution is:

import json
in_file_path='path/to/file.json' # Change me!
with open(in_file_path,'r') as in_json_file:
    # Read the file and convert it to a dictionary
    json_obj_list = json.load(in_json_file)
    for json_obj in json_obj_list:
        filename=json_obj['_id']+'.json'
        with open(filename, 'w') as out_json_file:
            # Save each obj to their respective filepath
            # with pretty formatting thanks to `indent=4`
            json.dump(json_obj, out_json_file, indent=4)

but when I try the solution, I get an error as below:

[oracle@localhost gk]$ python36 split.py
Traceback (most recent call last):
  File "split.py", line 11, in <module>
    filename=json_obj['_id']+'.json'
TypeError: string indices must be integers

My JSON file looks like:

{
    "data": [
        {
            "number": "PB510698",
            "uuid": "9cc06f21c1194038b137cec51b02606b"
        },

        etc ...

    ]
}

with multiple docs (sub docs?) that start with {"number":"PB510698","uuid"

Any ideas why the code from the other post is not working?

martineau
  • I'm not sure to understand what you are trying to do but what do you have exactly in your json_obj variable? The error says that it's a string not a json – Mouna Ben Chamekh Feb 15 '19 at 16:12
  • What is an example of the filename obtained from the `json_obj['_id']`? I am unsure what `'_id'` might be. – martineau Feb 15 '19 at 17:25
  • I am trying to split the large json file which has multiple purchase orders in it into separate files, each one with one purchase order. doesn't the json_obj get the enumeration from the statement? – george fortech Feb 15 '19 at 17:28
  • @martineau the code is failing on that statement, so dont know what is obtained from it. The code is from some other post, where comments said the code worked, but does not work for me – george fortech Feb 15 '19 at 17:33
  • I meant when you wrote it, what were you ***expecting*** (or wanting) the result to look like—is it based on some data from the file, such as `PB510698` or `9cc06f21c1194038b137cec51b02606b`? – martineau Feb 15 '19 at 17:34
  • I didnt write it- it is a solution presented in another posting here- i just copied it and tried to run on my file – george fortech Feb 15 '19 at 17:42
  • Sigh...OK, one last attempt: What do you ***want*** the filenames for each sub-document to be? – martineau Feb 15 '19 at 17:49
  • the value of the first tag would be great- the PB510698 thx – george fortech Feb 15 '19 at 18:06

1 Answer

I think this will do what you want. The json_obj_list returned from json.load() is actually a Python dictionary, not a list. Iterating over a dictionary yields its keys, so json_obj was the string 'data', and indexing a string with '_id' raises the TypeError you saw. You need to iterate over the values in json_obj_list['data'] instead. To keep the code sensible with respect to the existing variable name(s), I modified it to retrieve the JSON object list directly from the dictionary returned by json.load(), like this:

json_obj_list = json.load(in_json_file)['data']
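
To see why the original loop failed, here is a minimal sketch that reproduces the error on an in-memory string. The `sample` string is a made-up stand-in that only mirrors the structure shown in the question; it is not the real file.

```python
import json

# Hypothetical sample mirroring the structure shown in the question.
sample = '{"data": [{"number": "PB510698", "uuid": "9cc06f21c1194038b137cec51b02606b"}]}'
parsed = json.loads(sample)

# Iterating over a dict yields its keys, so each item is the string 'data'.
keys = [item for item in parsed]

# Indexing that string with another string raises TypeError.
try:
    keys[0]['_id']
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Iterating over parsed['data'] yields the purchase-order dictionaries instead.
numbers = [po['number'] for po in parsed['data']]

print(keys, error_name, numbers)
```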

Here's the complete code:

import json


in_file_path = 'testfile.json'

with open(in_file_path,'r') as in_json_file:

    # Read the file and get the list from the dictionary.
    json_obj_list = json.load(in_json_file)['data']

    for json_obj in json_obj_list:
        filename = json_obj['number']+'.json'  # Changed this, too, per comment by OP.
        print('creating file:', filename)
        with open(filename, 'w') as out_json_file:
            # Save each obj to their respective filepath
            # with pretty formatting thanks to `indent=4`
            json.dump(json_obj, out_json_file, indent=4)
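
If the goal is the layout described in the question (one JSON document per line, separated by CRLF, in a single file) rather than one file per purchase order, something along these lines might work. This is only a sketch: the function name `to_json_lines` and the output path are assumptions of mine, and I have not tested it against an Oracle external table.

```python
import json


def to_json_lines(in_file_path, out_file_path):
    """Rewrite the 'data' list as one compact JSON document per line."""
    with open(in_file_path, 'r') as in_json_file:
        json_obj_list = json.load(in_json_file)['data']

    # newline='' stops Python from translating the line endings,
    # so the explicit '\r\n' is written exactly as given.
    with open(out_file_path, 'w', newline='') as out_file:
        for json_obj in json_obj_list:
            # Without `indent`, json.dumps() emits the object on a single line.
            out_file.write(json.dumps(json_obj) + '\r\n')

    return len(json_obj_list)
```

Called as `to_json_lines('testfile.json', 'testfile.jsonl')` (the second name is hypothetical), it returns the number of documents written.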
martineau
  • It ran clean, listed the 3 Docs in the file, created files for two of them- the files look good – george fortech Feb 15 '19 at 18:25
  • [oracle@localhost gk]$ python36 newsplit.py creating file: PB510698.json creating file: PB510699.json creating file: PB510698.json [oracle@localhost gk]$ ls PB*.json PB510698.json PB510699.json – george fortech Feb 15 '19 at 18:26
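
The output in the last comment shows `creating file: PB510698.json` twice, so the third document overwrote the first because both share the same `number`. One possible way around that is sketched below. It assumes the `uuid` field differs between documents that share a `number`; the function name `split_orders` and the `out_dir` parameter are mine, not from the original code.

```python
import json
import os


def split_orders(in_file_path, out_dir):
    """Write each purchase order to its own file without overwriting duplicates."""
    with open(in_file_path, 'r') as in_json_file:
        json_obj_list = json.load(in_json_file)['data']

    seen = set()     # base names already used in this run
    written = []
    for json_obj in json_obj_list:
        name = json_obj['number']
        if name in seen:
            # Assumption: 'uuid' distinguishes documents sharing a 'number'.
            name = name + '_' + json_obj['uuid']
        seen.add(name)

        filename = os.path.join(out_dir, name + '.json')
        with open(filename, 'w') as out_json_file:
            json.dump(json_obj, out_json_file, indent=4)
        written.append(name + '.json')

    return written
```

With this version, a repeated `number` gets its `uuid` appended to the filename, so the number of files matches the number of documents in the input.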