I want to get the value of "path" from the JSON file below. I read the file with json.load and then walk it level by level using for key, value in data.items(), which takes a lot of nested for loops (about six) to reach "path". Is there a simpler way to retrieve the value of "path"?

The complete JSON file can be found here; a snippet of it is below.

{
"products": {
    "com.ubuntu.juju:12.04:amd64": {
        "version": "2.0.1",
        "arch": "amd64",
        "versions": {
            "20161129": {
                "items": {
                    "2.0.1-precise-amd64": {
                        "release": "precise",
                        "version": "2.0.1",
                        "arch": "amd64",
                        "size": 23525972,
                        "path": "released/juju-2.0.1-precise-amd64.tgz",
                        "ftype": "tar.gz",
                        "sha256": "f548ac7b2a81d15f066674365657d3681e3d46bf797263c02e883335d24b5cda"
                    }
                }
            }
        }
    },
    "com.ubuntu.juju:14.04:amd64": {
        "version": "2.0.1",
        "arch": "amd64",
        "versions": {
            "20161129": {
                "items": {
                    "2.0.1-trusty-amd64": {
                        "release": "trusty",
                        "version": "2.0.1",
                        "arch": "amd64",
                        "size": 23526508,
                        "path": "released/juju-2.0.1-trusty-amd64.tgz",
                        "ftype": "tar.gz",
                        "sha256": "7b86875234477e7a59813bc2076a7c1b5f1d693b8e1f2691cca6643a2b0dc0a2"
                    }
                }
            }
        }
    },
iehrlich
Viswesn
  • Hmm, what is the expected result? Value of first *path* field, values for all path fields, a map where the path field is the value and the key is ... ? There may be more direct ways than the Json loader, but it really depends on what you want. – Serge Ballesta Nov 29 '16 at 17:00
  • I don't see how this has anything to do with parsing JSON -- I mean, if your `json.load()` succeeds, then what you have is a Python data structure, not JSON content at all; the fact that the data *used to be* JSON has nothing to do with the question: Any answer (even the JSONPath answer) would still work even if what you had originated as a Python data structure. – Charles Duffy Nov 29 '16 at 17:05
  • @SergeBallesta I need to get sha256 value for the file 'juju-2.0.1-trusty-amd64.tgz' which is defined in the path. The input file may differ at run time and we need to get corresponding sha256. – Viswesn Nov 29 '16 at 17:36
  • If you do not say what you want exactly, we won't be able to help you... The file in paste.ubuntu.com contains many `path` fields with different names. Do you want only one or all, and do you want any associated other field? – Serge Ballesta Nov 29 '16 at 21:44

5 Answers

You can use a recursive generator:

def get_paths(data):
    if 'path' in data:
        yield data['path']
    for k in data.keys():
        if isinstance(data[k], dict):
            for i in get_paths(data[k]):
                yield i


for path in get_paths(json_data): # loaded json data
    print(path)
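
The generator above descends only into dicts and yields just the path string. Since the comments say the real goal is the sha256 for a given file name, a variant that also walks lists and yields the whole enclosing dict may be useful. This is only a sketch: `find_items` and `sha256_for` are hypothetical names, and `sample` is a cut-down copy of the question's snippet rather than the full file.

```python
import posixpath


def find_items(node, key="path"):
    """Yield every dict, at any depth, that contains `key`."""
    if isinstance(node, dict):
        if key in node:
            yield node
        for value in node.values():
            yield from find_items(value, key)
    elif isinstance(node, list):
        for value in node:
            yield from find_items(value, key)


def sha256_for(data, filename):
    """Return the sha256 of the item whose path ends in `filename`, else None."""
    for item in find_items(data):
        path = item["path"]
        if isinstance(path, str) and posixpath.basename(path) == filename:
            return item.get("sha256")
    return None


# Sample data reduced from the question's snippet (assumption: the real
# file has the same shape, possibly with more products and versions).
sample = {
    "products": {
        "com.ubuntu.juju:12.04:amd64": {
            "versions": {"20161129": {"items": {"2.0.1-precise-amd64": {
                "path": "released/juju-2.0.1-precise-amd64.tgz",
                "sha256": "f548ac7b2a81d15f066674365657d3681e3d46bf797263c02e883335d24b5cda",
            }}}},
        },
        "com.ubuntu.juju:14.04:amd64": {
            "versions": {"20161129": {"items": {"2.0.1-trusty-amd64": {
                "path": "released/juju-2.0.1-trusty-amd64.tgz",
                "sha256": "7b86875234477e7a59813bc2076a7c1b5f1d693b8e1f2691cca6643a2b0dc0a2",
            }}}},
        },
    }
}

print(sha256_for(sample, "juju-2.0.1-trusty-amd64.tgz"))
```

Yielding the whole item rather than only the path keeps sibling fields such as sha256 and size available without a second traversal.
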
Yevhen Kuzmovych

Is the path key always at the same depth in the loaded JSON (which is a dict)? If so, what about doing

# Walk the fixed levels by key name instead of by position
products = loaded_json['products']
for product in products.values():
    for version in product['versions'].values():
        for item in version['items'].values():
            print(item['path'])

If not, the answer of Yevhen Kuzmovych is clearly better, cleaner and more general than mine.

keepAlive

If you only care about the path, I think using a JSON parser is overkill. You can use the built-in re module with the following pattern: (\"path\":\s*\")(.*\s*)(?=\",). I didn't test it on the whole file, but you should be able to figure out the best pattern fairly easily.
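
As a rough illustration of the regex approach, here is a minimal sketch. The pattern is an assumed simplification of the one above (it captures whatever sits between the quotes after "path":), `extract_paths` is a hypothetical helper name, and `sample_text` stands in for the raw file contents.

```python
import re

# Assumed simplification: capture the quoted string that follows "path":
# Does not handle escaped quotes inside the value.
PATH_RE = re.compile(r'"path"\s*:\s*"([^"]*)"')


def extract_paths(text):
    """Return every "path" value found in the raw JSON text, in order."""
    return PATH_RE.findall(text)


# Stand-in for the text read from the JSON file
sample_text = '''
    "size": 23525972,
    "path": "released/juju-2.0.1-precise-amd64.tgz",
    "ftype": "tar.gz",
    "size": 23526508,
    "path": "released/juju-2.0.1-trusty-amd64.tgz",
    "ftype": "tar.gz",
'''

print(extract_paths(sample_text))
```

Note that this matches any key named "path" at any depth and ignores the JSON structure entirely, so it only suits files whose layout is known.
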

evertqin

If you only need the file names present in the path field, you can easily get them by simply parsing the file:

import re

files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')
with open('file.json') as fd:
    for line in fd:
        if "path" in line:
            m = pathre.match(line)
            if m is not None:
                files.append(m.group(1))

If you need to process the path and sha256 fields together:

files = []
pathre = re.compile(r'\s*"path"\s*:\s*"(.*?)"')
share = re.compile(r'\s*"sha256"\s*:\s*"(.*?)"')
path = None
with open('file.json') as fd:
    for line in fd:
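        if "path" in line:
            m = pathre.match(line)
            # Guard against lines that mention "path" without being a path field
            if m is not None:
                path = m.group(1)
        elif "sha256" in line:
            m = share.match(line)
            if m is not None and path is not None:
                files.append((path, m.group(1)))
                path = None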
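
To get from the (path, sha256) pairs to the checksum of one particular file, as asked in the comments, the pairs can be indexed by file name. A small sketch follows; `build_sha_index` is a hypothetical helper, and the `files` list here is hard-coded from the question's snippet in place of the result of the loop above.

```python
import posixpath


def build_sha_index(pairs):
    """Map each file name (last component of the path) to its sha256."""
    return {posixpath.basename(path): sha for path, sha in pairs}


# Stand-in for the list collected by the loop above
files = [
    ("released/juju-2.0.1-precise-amd64.tgz",
     "f548ac7b2a81d15f066674365657d3681e3d46bf797263c02e883335d24b5cda"),
    ("released/juju-2.0.1-trusty-amd64.tgz",
     "7b86875234477e7a59813bc2076a7c1b5f1d693b8e1f2691cca6643a2b0dc0a2"),
]

index = build_sha_index(files)
print(index.get("juju-2.0.1-trusty-amd64.tgz"))
```

If two entries share a file name, the later one wins in this sketch.
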
Serge Ballesta

You can use a query language like JSONPath. A Python implementation is available here: https://pypi.python.org/pypi/jsonpath-rw

Assuming you have your JSON content already loaded, you can do something like the following:

from jsonpath_rw import jsonpath, parse

# Load your JSON content first from a file or from a string
# json_data = ...

jsonpath_expr = parse('products..path')
for match in jsonpath_expr.find(json_data):
    print(match.value)

For further discussion, see: Is there a query language for JSON?

narko