
I need to take a deeply nested JSON file (specifically, the Elasticsearch mapping for an index) and produce a flat list of dotted key paths, one per leaf field.
Example Elasticsearch Mapping:

{
    "mappings": {
        "properties": {
            "class": {
                "properties": {
                    "name": {
                        "properties": {
                            "firstname": {
                                "type": "text"
                            },
                            "lastname": {
                                "type": "text"
                            }
                        }
                    },
                    "age": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

Example Desired Result:

["mappings.properties.class.properties.name.properties.firstname",
 "mappings.properties.class.properties.name.properties.lastname",
 "mappings.properties.class.properties.age"]

pandas.json_normalize() doesn't quite do what I want, and neither does glom().

Jennifer Crosby

This package does what you're asking, but the format is different, so you'd have to get the keys and then replace "_" with ".": https://pypi.org/project/flatten-json/ It also doesn't include leaves, so after you replace the "_" in the keys, you'll want to add ".{value}" for the value. – smcrowley Jul 08 '22 at 20:46

1 Answer


You should be able to write a fairly short recursive generator to do this. I'm assuming you want the full key path down to the first dict that contains a 'type' key:

d = {
    "mappings": {
        "properties": {
            "class": {
                "properties": {
                    "name": {
                        "properties": {
                            "firstname": {
                                "type": "text"
                            },
                            "lastname": {
                                "type": "text"
                            }
                        }
                    },
                    "age": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

def all_keys(d, path=None):
    if path is None:
        path = []
    if not isinstance(d, dict) or 'type' in d:
        yield '.'.join(path)
        return
    for k, v in d.items():
        yield from all_keys(v, path + [k])

list(all_keys(d))

Which gives:

['mappings.properties.class.properties.name.properties.firstname',
 'mappings.properties.class.properties.name.properties.lastname',
 'mappings.properties.class.properties.age']
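
One edge case worth noting: the check 'type' in d also stops at a "properties" dict that happens to contain a field literally named "type", so all_keys would yield only the path to that "properties" dict. A minimal sketch of a variant follows, assuming a leaf is a dict whose "type" value is a string. The function name leaf_field_paths and the sample mapping are hypothetical, made up for illustration, and object or nested fields that carry both "type" and "properties" would still be treated as leaves here.

```python
def leaf_field_paths(node, path=()):
    """Yield dotted paths to every leaf of a mapping-like dict.

    Assumption: a leaf is either a non-dict value or a dict whose
    "type" value is a string, e.g. {"type": "text"}. A field that is
    itself named "type" maps to a dict, so it is still descended into.
    """
    if not isinstance(node, dict) or isinstance(node.get("type"), str):
        yield ".".join(path)
        return
    for key, value in node.items():
        yield from leaf_field_paths(value, path + (key,))


# Hypothetical mapping with a field literally called "type".
mapping = {
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},
            "age": {"type": "text"},
        }
    }
}

paths = list(leaf_field_paths(mapping))
print(paths)
```

On the mapping from the question this should give the same three paths as all_keys, since none of its fields is named "type".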
Mark