I have the JSON file below, which I'm getting from an API.

{"Key-1":"Value-1",
"Key-2":[{"Value-2"::Child_Value-1","Value-3":"Child_Value-2"}]
}
{"Key-21":"Value-21",
"Key-22":[{"Value-22":"Child_Value-21","Value-23":"Child_Value-22"}]
}
{"Key-31":"Value-31",
"Key-32":[{"Value-32":"Child_Value-31","Value-33":"Child_Value-32"}]
}

I understand that, taken as a whole, this is not valid JSON. What I want is to extract each of the individual objects and store each one in a separate file.

For instance, file1.json should contain:

[{"Key-1":"Value-1",
    "Key-2":[{"Value-2":"Child_Value-1","Value-3":"Child_Value-2"}]
    }]

and file2.json should contain:

[{"Key-21":"Value-21",
    "Key-22":[{"Value-22":"Child_Value-21","Value-23":"Child_Value-22"}]
    }]

I'm trying to do this with Python and shell script, but it's not leading me anywhere. Is there a good library in Python or shell that would help? I'm constrained in the languages I can use (Python, shell script).

FirstName
  • As far as I'm aware there isn't a library for parsing broken JSON (missing quotes, not a single root array/object, ...). – jonrsharpe Jul 05 '16 at 11:14
  • If the JSON is well formed you will find the [JSON module](https://docs.python.org/2/library/json.html) for Python very useful. Plus, I would forget about shell-script... – kazbeel Jul 05 '16 at 11:16
  • You need some method for finding the boundaries between the individual bits of JSON. Is it always 3 lines per JSON? That would be ideal. – RemcoGerlich Jul 05 '16 at 11:17
  • @RemcoGerlich it's not always 3 lines per JSON object. But is there any library that picks up the data between two brackets { }? That would work for me, as that is what I need. – FirstName Jul 05 '16 at 11:20
  • Have you read any or all of http://stackoverflow.com/q/20400818/3001761, http://stackoverflow.com/q/27907633/3001761, http://stackoverflow.com/q/6886283/3001761, http://stackoverflow.com/q/8730119/3001761, ... – jonrsharpe Jul 05 '16 at 11:27
  • @FirstName: the problem is that you have further brackets inside the brackets, so simple things aren't going to work. I'm not aware of a library for this specific sort of thing. – RemcoGerlich Jul 05 '16 at 11:29
  • You are getting closer and closer to the actual input you have and output you want in your editing but you still have more to do. Look for '::' in your source for another area needing fixing. – jwpfox Jul 05 '16 at 11:46

3 Answers

1

Here's something that will be very slow and not equipped to deal with errors in the data, but it might work. It's a generator that finds the first '{', and then the next '}', and tries to parse the bit in between as JSON. If that fails, it looks for the next '}' and tries again. It yields the successfully parsed bits.

import json

def generate_json_dictionaries(s):
    # str.find() takes its start offset positionally, not as a keyword.
    opening = s.find('{')
    while opening != -1:
        possible_closing = opening
        while True:
            # Try each successive '}' as the candidate end of this object.
            possible_closing = s.find('}', possible_closing + 1)
            if possible_closing == -1:
                return  # Data incomplete
            try:
                j = json.loads(s[opening:possible_closing + 1])
                yield j
                break
            except ValueError:
                # Not a complete object yet; extend to the next '}'.
                pass
        opening = s.find('{', possible_closing + 1)  # Next start

Not tested.
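As a possible alternative to scanning for braces (this is not part of the answer above, just a sketch), the standard library's `json.JSONDecoder.raw_decode` can parse one JSON value from a given offset and report where it ended, which avoids re-parsing the same text repeatedly. The sketch assumes the input is a sequence of individually valid JSON values separated only by whitespace; a malformed object, such as one containing the `::` typo, would raise `ValueError`. The names `iter_concatenated_json`, `split_to_files` and the `prefix` parameter are made up for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found in a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode() does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def split_to_files(text, prefix="file"):
    """Write each value to <prefix>N.json, wrapped in a one-element list."""
    names = []
    for n, obj in enumerate(iter_concatenated_json(text), start=1):
        name = "{}{}.json".format(prefix, n)
        with open(name, "w") as dest:
            json.dump([obj], dest)
        names.append(name)
    return names
```

Calling `split_to_files(open('tmp.json').read())` would then produce `file1.json`, `file2.json`, and so on in the current directory, assuming `tmp.json` holds the concatenated objects.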

RemcoGerlich
0

This does exactly what your question asks for (although I suspect it is not actually what you want):

filecount = 0
newfilecontents = ''

# Relies on the exact line layout shown in the question, not on real JSON parsing.
with open('junksrc.txt', mode='r', encoding='utf-8') as src:
    for line in src:
        if '{"Key' in line:
            # First line of an object: start a new file body with '['.
            newfilecontents = '[' + line
        if '}]' in line:
            # Line holding the nested array: append it, then close the object and list.
            newfilecontents = newfilecontents + '    ' + line + '    }]\n'
            filecount += 1
            filename = 'junkdest' + str(filecount) + '.json'
            with open(filename, mode='w', encoding='utf-8') as dest:
                dest.write(newfilecontents)
jwpfox
0

If you get jq, you can preprocess your data into a form which is easily parsed by the standard library's JSON parser:

$ jq -s '.' tmp.json
[
  {
    "Key-1": "Value-1",
    "Key-2": [
      {
        "Value-2": "Child_Value-1",
        "Value-3": "Child_Value-2"
      }
    ]
  },
  {
    "Key-21": "Value-21",
    "Key-22": [
      {
        "Value-22": "Child_Value-21",
        "Value-23": "Child_Value-22"
      }
    ]
  },
  {
    "Key-31": "Value-31",
    "Key-32": [
      {
        "Value-32": "Child_Value-31",
        "Value-33": "Child_Value-32"
      }
    ]
  }
]

jq can recognize a stream of valid top-level objects, as you have here. The -s option tells jq to put them all in a single top-level array before further processing.
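Once the objects are in a single array, writing each one to its own file is a short loop in Python. The following is only a sketch of that step: it assumes the array has already been parsed (for example from the output of `jq -s`), and the function name `write_elements` and its `out_dir` and `prefix` parameters are hypothetical, chosen to match the `file1.json`, `file2.json` naming in the question.

```python
import json
import os


def write_elements(array, out_dir=".", prefix="file"):
    """Write each element of a parsed JSON array to its own file.

    Each file holds a one-element list, matching the layout asked for
    in the question. Returns the list of paths written.
    """
    paths = []
    for n, element in enumerate(array, start=1):
        path = os.path.join(out_dir, "{}{}.json".format(prefix, n))
        with open(path, "w") as dest:
            json.dump([element], dest, indent=4)
            dest.write("\n")
        paths.append(path)
    return paths
```

One way to use it would be to pipe the jq output into a script that calls `write_elements(json.load(sys.stdin))` after importing `sys`.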

chepner
  • That's helpful, thanks! Is there a way to give the array that you created with the jq command a name? Does jq have some additional functionality for such an operation? – FirstName Jul 05 '16 at 16:00
  • I'm not sure what you mean by "give [it] a name". One way of using it is to pipe it into your Python script, and use `json.load` to read from standard input: `jq -s . tmp.json | python -c 'import sys,json; x = json.load(sys.stdin); ...'` – chepner Jul 05 '16 at 16:04
  • What I meant by naming the array is the array that you created with the jq -s command, the "single top-level array". The example you provided created a top-level array without a name. I want to know if there is an option to create it with a name. – FirstName Jul 05 '16 at 16:17
  • Arrays don't have names in JSON, and the snippet in my previous comment assigns a *Python* name to the parsed string. Do you want a shell parameter? `x=$(jq -s '.' tmp.json)` – chepner Jul 05 '16 at 16:25