1

I have nested JSON that I would like to unpack into a pandas DataFrame. I can do it with the following code. Is there a way to modify the code so that it no longer needs the global variable?

import pandas as pd

d = {
    "name": "Vertebrates",
    "children": [
        {
            "name": "Mammals",
            "children": [
                {"name": "human"},
                {"name": "chimpanzee"},
            ],
        },
        {
            "name": "Birds",
            "children": [
                {"name": "chicken"},
                {"name": "turkey"},
            ],
        },
    ],
}

path = []

def unpack(d):
    global path
    if len(d) == 1:
        yield(d['name'], path)
    else:
        path.append(d['name'])
        for item in d['children']:
            yield from unpack(item)
        path = path[:-1]

pd.DataFrame.from_dict({key:value for key, value in unpack(d)},orient='index')

EDIT:

I actually started with path as a keyword argument. The issue was that I was getting this:

('human', ['Vertebrates', 'Mammals'])
('chimpanzee', ['Vertebrates', 'Mammals'])
('chicken', ['Vertebrates', 'Mammals', 'Birds'])
('turkey', ['Vertebrates', 'Mammals', 'Birds'])

For chicken and turkey, path still contains 'Mammals', because the line "path = path[:-1]" had no effect in that version of the code. So I decided to use a global variable to make sure the last item is removed whenever a branch of the recursion finishes.

SOLVED: blhsing's answer solves the problem by removing the append call altogether. bigwillydos's answer also does the trick.

I didn't realize that path.append changes the one list shared by every recursive call, while path = path[:-1] only rebinds the local name inside the current call and leaves that shared list untouched. That's why I was getting an accumulated path for the later names.
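
A minimal sketch of the difference between changing a list in place and rebinding a local name, which is what tripped me up here. The function names mutate and rebind are made up for illustration and are not part of the code above:

```python
def mutate(path):
    # In-place change: the caller's list object is modified.
    path.append("x")

def rebind(path):
    # Slicing builds a new list and binds it to the local name only;
    # the caller's list is left exactly as it was.
    path = path[:-1]

shared = ["a", "b"]

rebind(shared)
after_rebind = list(shared)   # snapshot after the rebinding call

mutate(shared)
after_mutate = list(shared)   # snapshot after the in-place call

print(after_rebind)
print(after_mutate)
```

The first snapshot is unchanged and the second has the extra item, which matches what happened to path in the keyword-argument version.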

3 Answers

3

Make path an optional argument. It defaults to the empty list in the initial call, but you pass it explicitly in the recursive calls.

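def unpack(d, path=None):
    if path is None:
        path = []
    if len(d) == 1:
        # Yield a copy so later changes to path don't alter earlier results.
        yield (d['name'], list(path))
    else:
        path.append(d['name'])
        for item in d['children']:
            yield from unpack(item, path)
        # Undo the append in place; path = path[:-1] would only rebind the local name.
        path.pop()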

Don't make the mistake of putting a mutable default value, such as an empty list, in the parameter list; don't write:

def unpack(d, path = []):

See "Least Astonishment" and the Mutable Default Argument for an explanation.
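
As a rough illustration of why the list default is a problem, here is a small sketch. The functions collect_bad and collect_good are hypothetical names used only for this demonstration:

```python
def collect_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and then reused by every call that omits bucket.
    bucket.append(item)
    return bucket

def collect_good(item, bucket=None):
    # None is a safe sentinel: a fresh list is made on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_bad = list(collect_bad(1))
second_bad = list(collect_bad(2))    # still holds the 1 from the earlier call

first_good = list(collect_good(1))
second_good = list(collect_good(2))  # starts from an empty list again

print(first_bad, second_bad)
print(first_good, second_good)
```

With path = [] as the default, a second top-level call to unpack would start from whatever the first call left in the list.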

Barmar
1

You can make path the second parameter instead with a default value of an empty tuple. You also don't need to append an item before a call only to remove the item after the call. The call stack of a recursive call will do that for you:

def unpack(d, path=()):
    if len(d) == 1:
        yield(d['name'], path)
    else:
        for item in d['children']:
            yield from unpack(item, path + (d['name'],))
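
A quick standalone check of the tuple-based version on the data from the question, written without pandas so it runs on its own. One assumption on my part: the leaf test here is 'children' not in d rather than len(d) == 1, so that leaves carrying extra keys would still be treated as leaves.

```python
def unpack(d, path=()):
    # Assumption: a leaf is any node without a 'children' key.
    if 'children' not in d:
        yield d['name'], path
    else:
        for item in d['children']:
            # path + (...) builds a new tuple for the child call,
            # so nothing has to be undone when the call returns.
            yield from unpack(item, path + (d['name'],))

tree = {
    "name": "Vertebrates",
    "children": [
        {"name": "Mammals",
         "children": [{"name": "human"}, {"name": "chimpanzee"}]},
        {"name": "Birds",
         "children": [{"name": "chicken"}, {"name": "turkey"}]},
    ],
}

rows = list(unpack(tree))
for name, path in rows:
    print(name, path)
```

Because tuples are immutable, each yielded path is independent of the others, even when the generator is consumed lazily.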
blhsing
0

Make path a static variable of the unpack function, stored here as a function attribute set by a decorator:

import pandas as pd

def static_vars(**kwargs):
    def decorate(func):
        for k in kwargs:
            setattr(func, k, kwargs[k])
        return func
    return decorate

@static_vars(path=[])
def unpack(d):
    if len(d) == 1:
        yield(d['name'], unpack.path)
    else:
        unpack.path.append(d['name'])
        for item in d['children']:
            yield from unpack(item)
        unpack.path = unpack.path[:-1]

def main():
    d = {
        "name":"Vertebrates",
        "children":[
        {
            "name":"Mammals",
            "children":[
            {
                "name":"human"
            },
            {
                "name":"chimpanzee"
            }
            ]
        },
        {
            "name":"Birds",
            "children":[
            {
                "name":"chicken"
            },
            {
                "name":"turkey"
            }
            ]
        }
        ]
    }

    df = pd.DataFrame.from_dict({key:value for key, value in unpack(d)},orient='index')

    print(df)

if __name__ == '__main__':
    main()
bigwillydos