
We have a YAML file which looks somewhat like the following:

all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

How would I go about getting the host key bar. and the value for its key nfs?

Python Code:

import yaml

with open("/Users/brendan_vandercar/sites.yaml", 'r') as stream:
    data_loaded = yaml.safe_load(stream)

# index the loaded dicts one level per key, down to the host entry 'bar.'
name = data_loaded['all']['children']['allnetxsites']['children']['netxsites']['hosts']['bar.']['nfs']
print(name)

What I would like to get from this script is a list output like the following:

Domain: bart.local.domain
NFS: lars.local.domain
Related: https://stackoverflow.com/questions/7320319/xpath-like-query-for-nested-python-dictionaries – Esteis Oct 20 '22 at 13:24

2 Answers


Your title makes it look like you are a bit confused about what is going on, or at least about terminology: although "YAML data structure" might be construed as shorthand for "Python data structure loaded from a YAML document", you do not further parse that data structure. Any parsing is done as part of loading the YAML document, and parsing is completely finished even before yaml.load() returns. As a result of that loading you have a data structure in Python, and you "just" need to look up a key in a nested Python data structure by recursively walking that data structure.
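Because loading is already complete, reaching a key whose path you know is plain dictionary indexing. Here is a minimal sketch, assuming the structure from the question; the dict literal stands in for what yaml.safe_load() returns for that document, so it runs without a YAML file. The "Host:" label is only an illustrative choice:

```python
# Stand-in for yaml.safe_load() applied to the YAML in the question:
# mappings become dicts and plain scalars become strings.
data = {
    "all": {
        "children": {
            "allnetxsites": {
                "children": {
                    "netxsites": {
                        "hosts": {
                            "bar.": {
                                "ansible_ssh_host": "bart.j",
                                "domain": "bart.local.domain",
                                "nfs": "lars.local.domain",
                            }
                        }
                    }
                }
            }
        }
    }
}

# Index one level per key to reach the hosts mapping.
hosts = data["all"]["children"]["allnetxsites"]["children"]["netxsites"]["hosts"]

# items() yields both the host key (e.g. "bar.") and that host's settings.
lines = []
for host, settings in hosts.items():
    lines.append(f"Host: {host}")
    lines.append(f"Domain: {settings['domain']}")
    lines.append(f"NFS: {settings['nfs']}")

print("\n".join(lines))
```

This only works when the path is fixed in advance; if the nesting can vary, you need to walk the structure recursively.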


Your YAML example is somewhat uninteresting, as it only uses a tiny subset of YAML: it consists solely of mappings, mapping keys that are scalars, and (plain) scalars that are strings.

To walk over that data structure, a simplified version of the recursive function @aaaaaa presented will do:

import yaml

yaml_str = """\
all:
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain
"""

data = yaml.safe_load(yaml_str)

def find(key, dictionary):
    # this YAML loads as nothing but nested dicts with string values,
    # so only dicts need to be recursed into
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            yield from find(key, v)

for x in find("nfs", data):
    print(x)

which prints the expected:

lars.local.domain

I have simplified the function find by leaving out the list handling, as that part of the snippet's version is incorrect.
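If you also want the key of the enclosing mapping (the host name bar. from the question), the same recursion can pass the parent key along. This is a sketch: find_with_parent is a hypothetical name, and the dict literal stands in for the loaded YAML. A match at the top level would report None as its parent:

```python
def find_with_parent(key, dictionary, parent=None):
    """Yield (parent_key, value) for every occurrence of key in nested dicts."""
    for k, v in dictionary.items():
        if k == key:
            # parent is the key under which this mapping was found
            yield parent, v
        elif isinstance(v, dict):
            yield from find_with_parent(key, v, parent=k)


data = {"all": {"children": {"allnetxsites": {"children": {"netxsites": {
    "hosts": {"bar.": {"ansible_ssh_host": "bart.j",
                       "domain": "bart.local.domain",
                       "nfs": "lars.local.domain"}}}}}}}}

for host, nfs in find_with_parent("nfs", data):
    print(host, nfs)  # bar. lars.local.domain
```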

Although the kinds of scalars used do not affect the recursive lookup, you probably want a more generic solution that can handle YAML with (nested) sequences, tagged nodes and complex mapping keys as well.

Assuming your input file is the slightly more complex input.yaml:

all:
  {a: x}: !xyz
  - [k, l, 0943]
  children:
    allnetxsites:
      children:
        netxsites:
          hosts:
            bar.:
              ansible_ssh_host: bart.j
              domain: bart.local.domain
              nfs: lars.local.domain

You can use ruamel.yaml (disclaimer: I am the author of that package) to do:

from pathlib import Path
import ruamel.yaml

in_file = Path('input.yaml')

yaml = ruamel.yaml.YAML()
data = yaml.load(in_file)

def lookup(sk, d, path=None):
    # recursively yield a (path to the value, value) tuple for every occurrence of key sk
    if path is None:
        path = []
    if isinstance(d, dict):
        for k, v in d.items():
            if k == sk:
                yield (path + [k], v)
            yield from lookup(sk, v, path + [k])
    elif isinstance(d, list):
        for idx, item in enumerate(d):
            # use the index, not the item itself, as the path element for list entries
            yield from lookup(sk, item, path + [idx])

for path, value in lookup("nfs", data):
    print(path, '->', value)

which gives:

['all', 'children', 'allnetxsites', 'children', 'netxsites', 'hosts', 'bar.', 'nfs'] -> lars.local.domain

As PyYAML only parses a subset of YAML 1.1 and loads even less of that, it cannot handle the valid YAML in input.yaml.

The above-mentioned snippet, the one @aaaaa is using, will break on the loaded YAML because of the (directly) nested sequences/lists.
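The failure can be reproduced without any YAML. The following is a sketch that assumes the snippet's function with .iteritems() replaced by .items() so it runs on Python 3. The names snippet_find and "tagged" are hypothetical, and the data mimics a list whose only item is itself a list:

```python
def snippet_find(key, dictionary):
    # the snippet's function, with .items() instead of Python 2's .iteritems()
    for k, v in dictionary.items():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in snippet_find(key, v):
                yield result
        elif isinstance(v, list):
            for d in v:
                # d is assumed to be a dict, but a nested list has no .items()
                for result in snippet_find(key, d):
                    yield result


data = {"all": {"tagged": [["k", "l", 943]], "nfs": "lars.local.domain"}}

error = None
try:
    list(snippet_find("nfs", data))
except AttributeError as exc:
    error = exc

print("breaks:", error)
```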


Maybe this snippet will help:

def find(key, dictionary):
    for k, v in dictionary.iteritems():
        if k == key:
            yield v
        elif isinstance(v, dict):
            for result in find(key, v):
                yield result
        elif isinstance(v, list):
            for d in v:
                for result in find(key, d):
                    yield result

Then your lookup becomes the following (find is a generator, so wrap it in list() or loop over it):

list(find('nfs', data_loaded))
    `.iteritems()` is only available in Python 2 and that is end-of-life next year, use `.items()` instead. Much more problematic is the second class handling of lists, this `find` breaks on directly nested lists, it looks like list was added as an afterthought and incorrectly at that. – Anthon Apr 10 '19 at 09:04
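Following that comment, here is a sketch of a Python 3 version that treats lists the same way as dicts. It checks the type of the current node rather than of its values, which is the same approach lookup() takes in the other answer. The data is a hypothetical stand-in for loaded YAML:

```python
def find(key, node):
    """Yield every value stored under key in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # recurse into every value; scalars simply yield nothing
            yield from find(key, v)
    elif isinstance(node, list):
        for item in node:
            # items may themselves be lists, dicts or scalars
            yield from find(key, item)
    # scalars (str, int, ...) cannot contain the key, so nothing is yielded


data = {"all": {"tagged": [["k", "l", 943]],
                "hosts": {"bar.": {"nfs": "lars.local.domain"}}}}
print(list(find("nfs", data)))  # ['lars.local.domain']
```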