6

In searching around the web for usages of custom constructors I see things like this:

def some_constructor(loader, node):
    value = loader.construct_mapping(node, deep=True)
    return SomeClass(value)

What does the deep=True do? I don't see it in the pyyaml documentation.

It looks like I need it; I have a yaml file generated by a pyyaml representer and it includes node anchors and aliases (like &id003 and *id003); without deep=True I get a shallow map back for those objects containing anchors/aliases.

Jason S

2 Answers

4

You don't see deep=True in the documentation because, as an end user of the PyYAML package, you don't normally need it.

If you trace the methods in constructor.py that take deep=, you arrive at construct_mapping() and construct_sequence() in class BaseConstructor, both of which call BaseConstructor.construct_object(). The relevant code in that method is:

    if tag_suffix is None:
        data = constructor(self, node)
    else:
        data = constructor(self, tag_suffix, node)
    if isinstance(data, types.GeneratorType):
        generator = data
        data = next(generator)
        if self.deep_construct:
            for dummy in generator:
                pass
        else:
            self.state_generators.append(generator)

and in particular the for loop in there, which only gets executed if deep=True was passed in.

Roughly speaking, if the data that comes back from a constructor is a generator, construct_object() walks over it (in the for loop) until the generator is exhausted. With that mechanism, a constructor can contain a yield that hands back a base object, the details of which are filled in after the yield. Because there is only one yield in such constructors, e.g. for mappings (constructed as Python dicts):

def construct_yaml_map(self, node):
    data = {}
    yield data
    value = self.construct_mapping(node)
    data.update(value)

I call this a two-step process (one step up to the yield, the next to the end of the method).

In such two-step constructors the data to be yielded is constructed empty, yielded and then filled out. And that has to be so because of what you already noticed: recursion. If there is a self reference to data somewhere underneath, data cannot be constructed after all its children are constructed, because it would have to wait for itself to be constructed.
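To make the two-step idea concrete, here is a minimal sketch of a custom constructor written that way. The Node class, the !node tag, and the TreeLoader name are made up for illustration; only add_constructor, construct_mapping, and yaml.load are PyYAML calls.

```python
import yaml


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.name = None
        self.parent = None
        self.children = []


def construct_tree_node(loader, node):
    obj = Node()
    yield obj  # step 1: hand out the empty object so aliases can point at it
    # step 2: build the children, then fill in the object that was yielded
    state = loader.construct_mapping(node, deep=True)
    obj.__dict__.update(state)


class TreeLoader(yaml.SafeLoader):
    """Private loader subclass so the global SafeLoader is left untouched."""


TreeLoader.add_constructor("!node", construct_tree_node)

# The child refers back to the root through the *root alias.
doc = """
&root !node
name: root
children:
  - !node
    name: child
    parent: *root
"""

root = yaml.load(doc, Loader=TreeLoader)
print(root.children[0].parent is root)
```

If construct_tree_node returned the object after calling construct_mapping instead of yielding it first, the alias would be reached while the root was still under construction, and PyYAML would report an unconstructable recursive node.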

The deep parameter indirectly controls whether objects that are potentially generators are recursively being built or appended to the list self.state_generators to be resolved later on.
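One way to observe this is to record what a constructor sees at the moment construct_mapping returns. This is a small sketch; the !shallow and !deep tags, the DemoLoader class, and the seen dict are assumptions made up for the demonstration.

```python
import yaml

seen = {}  # snapshot of what each constructor saw when it was called


def shallow_constructor(loader, node):
    value = loader.construct_mapping(node)  # deep defaults to False
    seen["shallow"] = repr(value)
    return value


def deep_constructor(loader, node):
    value = loader.construct_mapping(node, deep=True)
    seen["deep"] = repr(value)
    return value


class DemoLoader(yaml.SafeLoader):
    """Private loader subclass so the global SafeLoader is left untouched."""


DemoLoader.add_constructor("!shallow", shallow_constructor)
DemoLoader.add_constructor("!deep", deep_constructor)

doc = """
first: !shallow
  inner: {b: 1}
second: !deep
  inner: {b: 1}
"""

result = yaml.load(doc, Loader=DemoLoader)
print(seen["shallow"])  # nested dict was still an empty placeholder
print(seen["deep"])     # nested dict was already filled in
print(result)           # both are complete once loading has finished
```

Without deep=True the nested dict is still empty inside the constructor, which matches the "shallow map" described in the question. It only matters when the constructor reads or copies the nested values right away, as SomeClass(value) may do.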

Constructing a YAML document then boils down to constructing the top-level objects and looping over the potentially recursive objects in self.state_generators until no generators are left (a process that might take more than one pass).
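The multi-pass loop can be imitated without PyYAML. The following is a toy model, not PyYAML's code: pending stands in for self.state_generators, and make_dict, build, and build_document are invented names.

```python
pending = []  # stands in for self.state_generators


def make_dict(spec):
    """Two-step constructor: yield an empty dict, then fill it in."""
    data = {}
    yield data
    for key, value in spec.items():
        data[key] = build(value)


def build(spec):
    """Construct a value; dicts are only started here and finished later."""
    if not isinstance(spec, dict):
        return spec
    generator = make_dict(spec)
    data = next(generator)     # run up to the yield: an empty dict
    pending.append(generator)  # remember to finish it in a later pass
    return data


def build_document(spec):
    """Build the top-level object, then drain pending until it stays empty."""
    global pending
    data = build(spec)
    passes = 0
    while pending:
        batch, pending = pending, []
        for generator in batch:
            for _ in generator:  # run from the yield to the end
                pass
        passes += 1
    return data, passes


data, passes = build_document({"a": {"b": {"c": 1}}})
print(data, passes)
```

Each pass finishes the generators queued so far, and finishing them may queue new ones for the next nesting level, which is why more than one pass can be needed.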

Anthon
  • 1
    so if I have recursive data structures, I can't use it (makes sense); why would I have to use it, though? In other words, is there something I can look for in my data that would explain when I need to use `deep=True` ? – Jason S May 05 '17 at 20:10
  • 2
    If you dump a data structure that has the same object/mapping/sequence attached at multiple positions you'll get anchors and aliases in your YAML and then you need `deep=True` to be able to load those. If you dump data that at some point has an object that "underneath" itself has a self reference, you will get anchors and aliases as well, but you'll need `deep=True` **and** the two-step process provided by `yield` to be able to load that YAML. So I always make constructors for non-scalars (the potential (self)-recursive ones) with `yield` and `deep=True` although not needed by some YAML docs. – Anthon May 05 '17 at 20:20
0

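The deep argument controls when nested mappings and sequences are constructed during this process.

When deep=True, the construct_mapping method constructs every nested mapping and sequence completely before it returns. With the default deep=False, nested collections come back as empty placeholders that PyYAML fills in later, after the constructor has already returned.

For example, given this mapping node:

a:
  b: 1
  c: 2
d:
  b: 3

With deep=True, construct_mapping returns:

{'a': {'b': 1, 'c': 2}, 'd': {'b': 3}}

With deep=False, the value seen inside the constructor is:

{'a': {}, 'd': {}}

The nested dicts are filled in by the time loading finishes, so the difference only matters if the constructor reads or copies the nested values immediately.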
Ben Hazout