2

I'm receiving a response from an API in JSON:

    "files":[
    {
      "name":"main",
      "node_type":"directory",
      "files":[
        {
          "name":"source1",
          "node_type":"directory",
          "files":[
            {
              "name":"letters",
              "node_type":"directory",
              "files":[
                {
                  "name":"messages.po",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:41",
                  "last_updated":"2014-08-14 08:51:42",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        },
        {
          "name":"source2",
          "node_type":"directory",
          "files":[

          ]
        }
      ]
    },
    {
      "name":"New Directory",
      "node_type":"directory",
      "files":[
        {
          "name":"prefs.js",
          "node_type":"file",
          "created":"2014-08-14 08:11:53",
          "last_updated":"2014-08-14 08:11:53",
          "last_accessed":"0000-00-00 00:00:00"
        }
      ]
    },
    {
      "name":"111",
      "node_type":"directory",
      "files":[
        {
          "name":"222",
          "node_type":"directory",
          "files":[
            {
              "name":"333",
              "node_type":"directory",
              "files":[
                {
                  "name":"cli.mo",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:30",
                  "last_updated":"2014-08-14 08:51:30",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        }
      ]
    }
  ],

The project structure is:

├──111──222──333───cli.mo
├──main──source1──letters───messages.po
         └──source2
├──New Directory──prefs.js

How can I parse this JSON so that I get back something like this:

/111/222/333/cli.mo
/main/source1/letters/messages.po
/main/source2/
/New Directory/prefs.js

I tried to write down some code in Python, but I'm a beginner and my attempts failed.

Comix

3 Answers

3

If you're looking to actually receive the strings back, I suggest using generators:

def parse(data, parent=''):
    if data is None or not len(data):
        yield parent
    else:
        for node in data:
            for result in parse(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result

You can also use a variant on the yield parent statement to have /main/source2 be returned with a trailing slash (/main/source2/), though I find it too verbose:

        yield parent + ('/' if data is not None and not len(data) else '')

Pass your JSON-parsed list to the parse function above, and you'll receive back an iterator that will provide you with the strings it finds in the data:

import json

# shamelessly ignoring PEP8 for the sake of space
data = '''
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42",
"name": "messages.po", "created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory",
"name": "source1"}, {"files": [], "node_type": "directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files":
[{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created":
"2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [{"files": [{"node_type": "file",
"last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}],
"node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
'''

for item in parse(json.loads(data)):
    print(item)

Running the above will give you

/main/source1/letters/messages.po
/main/source2
/New Directory/prefs.js
/111/222/333/cli.mo

as output. There's a very interesting read about generators here at SO: What does the "yield" keyword do in Python? - I suggest going through all of the answers.
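
On Python 3.3 or newer, the nested for/yield loop can be replaced by `yield from`. The following is only a sketch under that assumption: `iter_paths` and the shortened `sample` data are hypothetical names made up for illustration, and this variant always adds a trailing slash to empty directories.

```python
import json

def iter_paths(nodes, parent=''):
    """Yield one path per file or empty directory (needs Python 3.3+)."""
    for node in nodes:
        path = parent + '/' + node['name']
        children = node.get('files')
        if children is None:
            # no "files" key: a plain file
            yield path
        elif not children:
            # empty directory: mark it with a trailing slash
            yield path + '/'
        else:
            # delegate to the recursive generator for the sub-directory
            yield from iter_paths(children, path)

# shortened, hypothetical sample in the same shape as the API response
sample = json.loads('''[
  {"name": "main", "node_type": "directory", "files": [
    {"name": "source1", "node_type": "directory", "files": [
      {"name": "messages.po", "node_type": "file"}]},
    {"name": "source2", "node_type": "directory", "files": []}]}
]''')

paths = list(iter_paths(sample))
print(paths)
```

Wrapping the generator in `list()` is only needed if you want all paths at once; iterating over it directly works just as well.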

planestepper
  • I like this solution. Clean, pure recursion. One downside is that it emits results in traversal order; wrapping in an object allows more general pre- and post-processing. Still, it's short and sweet. – Jonathan Eunice Aug 15 '14 at 13:02
  • I believe you are missing an `or` in line 2. `if data is None not len(data):` isn't valid. – Jonathan Eunice Aug 15 '14 at 13:02
  • @JonathanEunice it's short, yes, and that was my intention - but it isn't explicit - but as _Beautiful is better than ugly_ comes before _Explicit is better than implicit_, I guess it's fine – planestepper Aug 15 '14 at 14:04
1

What you need is a recursive descent parser. The json module can do a lot of the heavy lifting of parsing JSON syntax, but you still need to traverse the resulting data structure and interpret it. Recursion is called for because you don't know how many layers or levels of directory structures you will encounter.

jdata = """
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42", "name": "messages.po",
"created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory", "name": "source1"}, {"files": [], "node_type":
"directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created": "2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [
{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}], "node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
"""

import json
import os
import sys

if sys.version_info[0] > 2:
    unicode = str

class Filepaths(object):

    def __init__(self, data):
        """
        Discover file paths in the given data. If the data is JSON string,
        decode it. If already decoded into Python structures, use it directly.
        """
        self.paths = []
        if isinstance(data, (str, unicode)):
            data = json.loads(data)
        self.traverse(data)
        self.paths.reverse()  # reverse in place so paths stays a reusable list

    def traverse(self, n, prefix="/"):
        """
        Traverse the data tree. On terminal nodes, add files and directories
        found to self.paths
        """
        if isinstance(n, list):
            for item in n:
                self.traverse(item, prefix)
        elif isinstance(n, dict):
            nodetype = n['node_type']
            nodename = n['name']
            if nodetype == 'directory':
                files = n['files']
                if files:
                    for f in files:
                        self.traverse(f, os.path.join(prefix, nodename))
                else:
                    self.paths.append(os.path.join(prefix, nodename) + '/')
            elif nodetype == 'file':
                self.paths.append(os.path.join(prefix, nodename))
            else:
                raise ValueError("didn't understand node named {0!r}, type {1!r}".format(nodename, nodetype))
        else:
            raise ValueError("didn't understand node {0!r}".format(n))

p = Filepaths(jdata)
for path in p.paths:
    print(path)

This results in:

/111/222/333/cli.mo
/New Directory/prefs.js
/main/source2/
/main/source1/letters/messages.po

Note that I used a class rather than just a recursive function to get around Python's onerous rules for global variables. Sure, I could have declared a global variable paths and noted it as global in the function, but that is messy. Objects are the standard Python way to "package together" routines and the data that they need to access. Recursive traversal often works better as an object in Python.
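
For comparison, the same traversal can also avoid globals by passing an accumulator list through the recursion. This is a minimal sketch, not a replacement for the class above: `collect_paths`, `found` and the small `tree` are hypothetical names, and it assumes every directory node has a "files" list as in the question's data.

```python
def collect_paths(nodes, prefix='', found=None):
    """Return paths of files and empty directories, in traversal order."""
    if found is None:
        # create the accumulator on the outermost call only
        found = []
    for node in nodes:
        path = prefix + '/' + node['name']
        if node['node_type'] == 'file':
            found.append(path)
        elif node['files']:
            # non-empty directory: recurse, sharing the same list
            collect_paths(node['files'], path, found)
        else:
            # empty directory: record it with a trailing slash
            found.append(path + '/')
    return found

# shortened, hypothetical sample shaped like the API response
tree = [
    {'name': 'main', 'node_type': 'directory', 'files': [
        {'name': 'source2', 'node_type': 'directory', 'files': []},
    ]},
    {'name': 'New Directory', 'node_type': 'directory', 'files': [
        {'name': 'prefs.js', 'node_type': 'file'},
    ]},
]

result = collect_paths(tree)
print(result)
```

The class still has the edge when you need pre- or post-processing around the traversal, such as the reversal done in `__init__`.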

Jonathan Eunice
0

I think the best way of going about this is the same way ls -R in Unix and os.walk() in Python do: recursively. For example, to list all the files including directories, you could do something like this:

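def walk(tree, path):
  dirs = []
  for f in tree:
    print(path + '/' + f['name'])
    if f['node_type'] == 'directory':
      # keep each directory's own name next to its children
      dirs.append((f['name'], f['files']))

  # descend only after the whole level has been printed
  for name, subtree in dirs:
    walk(subtree, path + '/' + name)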
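
If you would rather get the paths back than print them, the same level-by-level idea can return a list. A rough sketch, assuming the node layout from the question; `list_paths` and the small `tree` below are hypothetical names for illustration.

```python
def list_paths(tree, path=''):
    """Return every entry (files and directories), one level at a time."""
    found = []
    dirs = []
    for f in tree:
        full = path + '/' + f['name']
        found.append(full)
        if f['node_type'] == 'directory':
            # remember the directory's full path together with its children
            dirs.append((full, f['files']))
    for full, subtree in dirs:
        found.extend(list_paths(subtree, full))
    return found

# shortened, hypothetical sample shaped like the API response
tree = [
    {'name': 'main', 'node_type': 'directory', 'files': [
        {'name': 'source2', 'node_type': 'directory', 'files': []},
    ]},
    {'name': 'New Directory', 'node_type': 'directory', 'files': [
        {'name': 'prefs.js', 'node_type': 'file'},
    ]},
]

listing = list_paths(tree)
print(listing)
```

Call it with just the parsed list; the second argument is only used by the recursive calls.
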
whereswalden