0

I have a JSON file in the following format:

Note that the characters after "1" and "2" (etc.) represent strings written without double quotes.

{
    "Apparel": {
        "XX": {
            "1": YY,
            "2": ZZ
                },
        "TT": {
            "1":TTT,
            "2":TTT,
            "3": TTT,
            "4": TTT
                    },
        "XXX": {
            "1":XXX,
            "2":XXX
                    },
        "RRR": {
            "1":RRR,
            "2":RRR
                    },

        "AAA": {
            "1":AAA,
            "2":AAA,
            "3":AAA
                    },
                }
....

And so on.

I know the file is not valid JSON (it is being kept this way for design reasons that I'm not aware of), and loading it with the standard json module in Python 3 raises a decode error. I've been told to use the file as it is, so any problems have to be handled in my code. I need to pick the value after "1" from every heading, then the value after "2" from every heading, and so on.

Currently I'm using this code to read the file:

import json

with open("brand_config.json") as json_file:
    json_data = json.load(json_file)
    test = (json_data["apparel"]["biba"])

print (test)

This code gives this error:

Traceback (most recent call last):
  File "reader.py", line 4, in <module>
    json_data = json.load(json_file)
  File "/usr/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 4 column 9 (char 36)

How do I read the required values without changing anything in the JSON file?

YaddyVirus
  • 301
  • 4
  • 21

3 Answers

1

I understand from the question that the values of your JSON are not surrounded by quotation marks.

I wrote the following script that parses that specific file from the question:

#!/usr/bin/env python3

from json import dumps


# Reads THAT SPECIFIC MALFORMATTED JSON, SHOULD NOT BE USED
def parse_json(filename):
    j = {}
    with open(filename, 'r') as json_file:
        # the read must happen inside the with block, while the file is open
        lines = [line.strip() for line in json_file.readlines()]

    level = 0
    keys = []
    for line in lines:
        # increase a level
        if '{' in line:
            level += 1
            # append proper key
            if ':' in line:
                keys.append(line.split(':')[0].replace('"', '').strip())
                if level == 2:
                    j[keys[0]] = {}
                elif level == 3:
                    j[keys[0]][keys[1]] = {}
        # decrease a level, remove key
        elif '}' in line:
            keys = keys[:-1]
            level -= 1
        # add value
        else:
            if level == 3 and line:
                k, v = line.split(':')
                k = k.replace('"', '').strip()
                # drop the trailing comma only if there is one
                # (a blind [:-1] also cuts the last character of the final value)
                v = v.strip().rstrip(',')
                j[keys[0]][keys[1]][k] = v
    return j


brand_config = parse_json('brand_config.json')
print(dumps(brand_config, indent=4, sort_keys=True))

Which creates a Python dictionary:

{
    "Apparel": {
        "AAA": {
            "1": "AAA",
            "2": "AAA",
            "3": "AAA"
        },
        "RRR": {
            "1": "RRR",
            "2": "RRR"
        },
        "TT": {
            "1": "TTT",
            "2": "TTT",
            "3": "TTT",
            "4": "TTT"
        },
        "XX": {
            "1": "YY",
            "2": "ZZ"
        },
        "XXX": {
            "1": "XXX",
            "2": "XXX"
        }
    }
}

Given what you provided in the question.

EDIT: explanation asked for in the comments

keys is a list used to store the keys that are currently being used in the json. For example, { "Apparel": {}} will mean keys=["Apparel"], and { "Apparel": {"AAA": XXX }} will mean keys=["Apparel", "AAA"].

The function processes the text file one line at a time:

  • Create an empty dictionary (j) before reading any lines.

  • Whenever a line contains {, level is increased by 1. If the line also contains :, split it and use the first part as a dictionary key after removing the quotation marks, then create a new dictionary associated with that key.

  • If a line contains no { but does contain :, split it and use the left part as the key and the right part as the value.

  • If a line contains }, decrease level by 1 and remove the last key.

The final line just pretty-prints the result.
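
Since the goal in the question is to collect the value after "1" from every heading, then the value after "2", and so on, here is a minimal sketch of that step. It assumes the parsed dictionary has the three-level shape shown in the question; values_by_index and sample are hypothetical names used only for illustration, and the printed key order relies on Python 3.7+ dictionaries keeping insertion order.

```python
from collections import defaultdict


def values_by_index(config):
    """Group values by their inner key ("1", "2", ...) across all headings."""
    grouped = defaultdict(list)
    for category in config.values():        # e.g. "Apparel"
        for heading in category.values():   # e.g. "XX", "TT"
            for index, value in heading.items():
                grouped[index].append(value)
    # Sort indices numerically so that "10" comes after "9", not after "1".
    return {index: grouped[index] for index in sorted(grouped, key=int)}


# Hypothetical sample in the same shape as the parsed file.
sample = {"Apparel": {"XX": {"1": "YY", "2": "ZZ"},
                      "TT": {"1": "TTT", "2": "TTT", "3": "TTT"}}}
print(values_by_index(sample))
# {'1': ['YY', 'TTT'], '2': ['ZZ', 'TTT'], '3': ['TTT']}
```

Headings that lack a given index are simply skipped for that index, so the lists can have different lengths.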

mikelsr
  • 457
  • 6
  • 13
  • So what's happening here is that this script is converting the spaces before and after the string to quotes? – YaddyVirus Mar 28 '18 at 03:46
  • Also, assuming I have more than 1 JSONs in the same format, how can I extract corresponding values from them and concatenate them to form a single string? – YaddyVirus Mar 28 '18 at 03:48
  • For the concatenating question I'll need more details, but it'd work the same way you do it with a standard Python dictionary (which it is). – mikelsr Mar 30 '18 at 12:22
  • What I meant was, that I have a total of 3 JSONs like this. I want to pick each value of `"1":` from each file and concatenate them into one string. Then each value of `"2":` and so on. How do I do that? – YaddyVirus Mar 31 '18 at 07:07
  • Also I noticed a strange bug: when iterating through a key with a lot of entries (some have as many as 30), the final output has its order messed up. For example, 1 will be followed by 28 and so on, in an apparently random order – YaddyVirus Mar 31 '18 at 07:10
  • Python dictionaries have no order, check [this question](https://stackoverflow.com/questions/9001509/how-can-i-sort-a-dictionary-by-key#9001529) for more info – mikelsr Mar 31 '18 at 10:09
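
The two follow-up questions in the comments (concatenating matching values from several files, and the shuffled order) can be sketched together. This is only one reading of what was asked: it assumes every file has already been parsed into a dictionary of the same shape, and concat_by_index, a and b are hypothetical names. Sorting with key=int is what puts "10" after "2" instead of after "1".

```python
def concat_by_index(configs, category, heading, sep=""):
    """For each index, join the values found in every config, in numeric index order."""
    # Collect every index that appears under this heading in any config.
    indices = set()
    for config in configs:
        indices.update(config.get(category, {}).get(heading, {}))

    result = {}
    for index in sorted(indices, key=int):
        # Keep the order of the configs; skip configs that lack this index.
        parts = [config[category][heading][index]
                 for config in configs
                 if index in config.get(category, {}).get(heading, {})]
        result[index] = sep.join(parts)
    return result


# Hypothetical parsed files.
a = {"Apparel": {"XX": {"1": "YY", "2": "ZZ", "10": "QQ"}}}
b = {"Apparel": {"XX": {"1": "AA", "2": "BB"}}}
print(concat_by_index([a, b], "Apparel", "XX", sep="-"))
# {'1': 'YY-AA', '2': 'ZZ-BB', '10': 'QQ'}
```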
0

There must be a problem in your JSON file. I don't know where you got this file from, or whether you generated it yourself.

Your commands are right; I tried the same thing with a valid file and it works:

>>> import json
>>> import os
>>> obj = {5:'jul'}
>>> d = json.dump(obj,open(os.getcwd()+'/JS.sjon','w+'))
>>> ld = json.load(open(os.getcwd()+'/JS.sjon','r'))
>>> ld
{'5': 'jul'}
>>> ld.get('5')
'jul'

So, review the source of your data.

Julio CamPlaz
  • 857
  • 8
  • 18
0

YAML is a superset of JSON that is less strict when it comes to quotes around values (see the PyYAML package, which provides the yaml module).

E.g.

import yaml

with open("brand_config.json") as json_file:
    json_data = yaml.safe_load(json_file)
    test = (json_data["apparel"]["biba"])

print (test)

I tried this approach with the "bad" JSON snippet from your question, and here is what I got:

{'Apparel': {'AAA': {'1': 'AAA', '2': 'AAA', '3': 'AAA'},
  'RRR': {'1': 'RRR', '2': 'RRR'},
  'TT': {'1': 'TTT', '2': 'TTT', '3': 'TTT', '4': 'TTT'},
  'XX': {'1': 'YY', '2': 'ZZ'},
  'XXX': {'1': 'XXX', '2': 'XXX'}}}
0x416e746f6e
  • 9,872
  • 5
  • 40
  • 68
  • I got this error while running your script - `yaml.scanner.ScannerError: while scanning for the next token found character '\t' that cannot start any token in "brand_config.json", line 2, column 1` – YaddyVirus Mar 31 '18 at 07:05
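
The ScannerError in the comment above is raised because the file is indented with tabs, which YAML does not accept as indentation. A possible workaround, sketched under the assumption that the tabs are only used for indentation, is to replace them with spaces in memory before parsing, so the file on disk stays untouched. tabs_to_spaces and load_lenient are hypothetical helper names, and the yaml import refers to the third-party PyYAML package.

```python
def tabs_to_spaces(text, width=4):
    """Replace every tab with spaces; YAML does not accept tabs as indentation."""
    return text.replace("\t", " " * width)


def load_lenient(path):
    """Read the file, replace tabs in memory, and parse the result with PyYAML."""
    # PyYAML is third-party; importing it here keeps tabs_to_spaces usable without it.
    import yaml

    with open(path) as config_file:
        return yaml.safe_load(tabs_to_spaces(config_file.read()))


print(repr(tabs_to_spaces('{\n\t"1": YY\n}')))
# '{\n    "1": YY\n}'
```

With this in place, load_lenient("brand_config.json") would take the place of the yaml.safe_load call in the answer.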