
I want to have a base config file which is used by other config files to share common config.

E.g., if I have one file base.yml with

foo: 1

bar:
  - 2
  - 3

And then a second file some_file.yml with

foo: 2

baz: "baz"

What I want to end up with is a merged config file with

foo: 2

bar:
  - 2
  - 3

baz: "baz"
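
Read as plain Python dictionaries, the desired result is a shallow merge in which keys from some_file.yml win over keys from base.yml. A minimal sketch of that semantics, with the two files written out as dict literals (no YAML parsing involved; the variable names are only illustrative):

```python
# Contents of base.yml and some_file.yml, written out as plain dicts
base = {"foo": 1, "bar": [2, 3]}
override = {"foo": 2, "baz": "baz"}

# Shallow merge: keys from `override` replace keys from `base`
merged = {**base, **override}
print(merged)
```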

It's easy enough to write a custom loader that handles an !include tag.

class ConfigLoader(yaml.SafeLoader):

    def __init__(self, stream):
        super().__init__(stream)
        self._base = Path(stream.name).parent

    def include(self, node):
        file_name = self.construct_scalar(node)
        file_path = self._base.joinpath(file_name)

        with file_path.open("rt") as fh:
            return yaml.load(fh, IncludeLoader)
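
The snippet above does not show how the tag is registered, nor what IncludeLoader is. A minimal runnable sketch, assuming that IncludeLoader was meant to be ConfigLoader, that PyYAML is the library in use, and that the loader is always given an open file (so that stream.name exists), might look like this:

```python
import tempfile
from pathlib import Path

import yaml


class ConfigLoader(yaml.SafeLoader):
    """SafeLoader that resolves !include relative to the including file."""

    def __init__(self, stream):
        super().__init__(stream)
        # Assumption: `stream` is a file object with a .name attribute
        self._base = Path(stream.name).parent

    def include(self, node):
        file_path = self._base / self.construct_scalar(node)
        with file_path.open("rb") as fh:
            return yaml.load(fh, ConfigLoader)


# Register the method as the constructor for the !include tag
ConfigLoader.add_constructor("!include", ConfigLoader.include)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "base.yml").write_text("foo: 1\nbar:\n  - 2\n  - 3\n")
    Path(tmp, "some_file.yml").write_text(
        'inherit: !include base.yml\nfoo: 2\nbaz: "baz"\n'
    )
    with Path(tmp, "some_file.yml").open("rb") as fh:
        config = yaml.load(fh, ConfigLoader)

print(config)
```

In this sketch the included mapping ends up under the inherit key rather than being merged into the top level.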

Then I can parse an !include tag. So if my file is

inherit:
   !include base.yml

foo: 2

baz: "baz"

But now the base config is a nested mapping, i.e. if I load the file I'll end up with

config = {'a': [42], 'c': [3.6, [1, 2, 3]], 'include': [{'a': 1, 'b': [1.43, 543.55]}]}

But if I don't make the tag part of a mapping, e.g.

!include base.yml

foo: 2

baz: "baz"

I get an error: yaml.scanner.ScannerError: mapping values are not allowed here.

But I know that the YAML parser can parse tags without needing a mapping, because I can do things like

!!python/object:foo.Bar
x: 1.0   
y: 3.14

So how do I write a loader and/or structure my YAML file so that I can include another file in my configuration?

Anthon
Gree Tree Python
  • The [recommended file extension](http://yaml.org/faq.html) for *YAML* files has been `.yaml` since 2006, when are you going to catch up? The [YML](https://fdik.org/yml/) format is XML based, and at least as old as YAML. – Anthon Feb 11 '22 at 06:28
  • Why do you invent a YAML grammar? I thought there could be a third YAML file defining which files should be merged and into what file. – Lei Yang Feb 11 '22 at 06:30
  • Can you update your question with the full minimal working program generating the `config = ` line? Or at least include the definition of `IncludeLoader`. – Anthon Feb 11 '22 at 07:31

2 Answers


In YAML you cannot mix scalars, mapping keys and sequence elements. This is invalid YAML:

- abc
d: e

and so is this

some_file_name
a: b

Quoting that scalar or giving it a tag does not, of course, change the fact that it is invalid YAML.
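
Both invalid documents can be checked quickly. The sketch below uses PyYAML's safe_load (an assumption; the rest of this answer uses ruamel.yaml) and only records that loading fails, without relying on a particular error message:

```python
import yaml

invalid_documents = [
    "- abc\nd: e\n",           # sequence element followed by a mapping key
    "some_file_name\na: b\n",  # plain scalar followed by a mapping key
]

errors = []
for text in invalid_documents:
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Scanner and parser errors both derive from yaml.YAMLError
        errors.append(type(exc).__name__)

print(errors)
```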

As you already found out, you can trick the loader into returning a dict instead of the string (just as the parser already has built-in constructors for non-primitive types like datetime.date).

That this:

!!python/object:foo.Bar
x: 1.0
y: 3.14

works is because the whole mapping is tagged, whereas you just tag a scalar value.
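
The difference is visible in the node graph, before any object is constructed. A minimal sketch using PyYAML's compose() (an assumption; ruamel.yaml offers a similar compose step), which only parses and never instantiates the tagged types:

```python
import yaml

# A tag followed by a block mapping applies to the whole mapping
tagged_mapping = yaml.compose("!!python/object:foo.Bar\nx: 1.0\ny: 3.14\n")

# A tag followed by a plain scalar applies to that scalar only
tagged_scalar = yaml.compose("!include base.yml\n")

print(type(tagged_mapping).__name__, tagged_mapping.tag)
print(type(tagged_scalar).__name__, tagged_scalar.tag)
```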

This would also be invalid syntax:

!include base.yaml
foo: 2
baz: baz

but you could do:

!include
filename: base.yaml
foo: 2
baz: baz

and process the 'filename' key in a special way, or make the !include tag an empty key:

!include : base.yaml  # : is a valid tag character, so you need the space
foo: 2
baz: baz
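
The first of these two variants (a tagged mapping with a special 'filename' key) could be handled roughly as follows. This is a sketch with PyYAML rather than ruamel.yaml; make_loader, IncludeLoader and the base_dir parameter are hypothetical names introduced for illustration:

```python
import tempfile
from pathlib import Path

import yaml


def make_loader(base_dir):
    """Build a SafeLoader subclass whose !include resolves against base_dir."""

    class IncludeLoader(yaml.SafeLoader):
        pass

    def include(loader, node):
        # The tag sits on the whole mapping, so build the mapping first
        own = loader.construct_mapping(node, deep=True)
        # 'filename' is treated specially and is not kept in the result
        file_path = Path(base_dir) / own.pop("filename")
        with file_path.open("rb") as fh:
            base = yaml.load(fh, IncludeLoader)
        # Keys of the including file override those of the included file
        return {**base, **own}

    IncludeLoader.add_constructor("!include", include)
    return IncludeLoader


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "base.yaml").write_text("foo: 1\nbar:\n  - 2\n  - 3\n")
    document = "!include\nfilename: base.yaml\nfoo: 2\nbaz: baz\n"
    config = yaml.load(document, make_loader(tmp))

print(config)
```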

I would however look at using merge keys, as merging is essentially what you are trying to do. The following program, which loads YAML containing a merge key, works:

import sys
import ruamel.yaml
from pathlib import Path

yaml_str = """
<<: {x: 42, y: 196, foo: 3}
foo: 2
baz: baz
"""
yaml = ruamel.yaml.YAML(typ='safe')
yaml.default_flow_style = False
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

which gives:

baz: baz
foo: 2
x: 42
y: 196

So you should be able to do:

<<: !load base.yaml
foo: 2
baz: baz

and anyone with knowledge of merge keys would know what happens if base.yaml does include the key foo with value 3, and would also understand:

<<: [!load base.yaml, !load config.yaml]
foo: 2
baz: baz
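
The precedence rules of merge keys can be tried out without any custom tag. In the sketch below the !load tags are replaced by inline mappings and PyYAML's safe_load is used instead of ruamel.yaml (both are assumptions made to keep the example self-contained): explicit keys win over merged ones, and earlier entries in the merge list win over later ones.

```python
import yaml

document = """\
<<: [{foo: 3, x: 1}, {x: 2, y: 5}]
foo: 2
baz: baz
"""

# foo comes from the document itself, x from the first merged mapping,
# y from the second merged mapping
data = yaml.safe_load(document)
print(data)
```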

(As I tend to associate "including" with textual including as in the C preprocessor, I think `!load` might be a more appropriate tag, but that is probably a matter of taste).

To get the merge keys to work, it is probably easiest to just subclass the Constructor, as merging is done before tag resolving:

import sys
import warnings
import ruamel.yaml
from ruamel.yaml.nodes import MappingNode, SequenceNode, ScalarNode
from ruamel.yaml.constructor import (
    ConstructorError,
    DuplicateKeyError,
    DuplicateKeyFutureWarning,
)
from ruamel.yaml.compat import _F
from pathlib import Path



class MyConstructor(ruamel.yaml.constructor.SafeConstructor):
    def flatten_mapping(self, node):
        # type: (Any) -> Any
        """
        This implements the merge key feature http://yaml.org/type/merge.html
        by inserting keys from the merge dict/list of dicts if not yet
        available in this node
        """
        merge = []  # type: List[Any]
        index = 0
        while index < len(node.value):
            key_node, value_node = node.value[index]
            if key_node.tag == 'tag:yaml.org,2002:merge':
                if merge:  # double << key
                    if self.allow_duplicate_keys:
                        del node.value[index]
                        index += 1
                        continue
                    args = [
                        'while constructing a mapping',
                        node.start_mark,
                        'found duplicate key "{}"'.format(key_node.value),
                        key_node.start_mark,
                        """
                        To suppress this check see:
                           http://yaml.readthedocs.io/en/latest/api.html#duplicate-keys
                        """,
                        """\
                        Duplicate keys will become an error in future releases, and are errors
                        by default when using the new API.
                        """,
                    ]
                    if self.allow_duplicate_keys is None:
                        warnings.warn(DuplicateKeyFutureWarning(*args))
                    else:
                        raise DuplicateKeyError(*args)
                del node.value[index]
                if isinstance(value_node, ScalarNode) and value_node.tag == '!load':
                    file_path = None
                    try:
                        if self.loader.reader.stream is not None:
                            file_path = Path(self.loader.reader.stream.name).parent / value_node.value
                    except AttributeError:
                        pass
                    if file_path is None:
                        file_path = Path(value_node.value)
                    # there is a bug in ruamel.yaml<=0.17.20 that prevents
                    # the use of a Path as argument to compose()
                    with file_path.open('rb') as fp:
                        merge.extend(ruamel.yaml.YAML().compose(fp).value)
                elif isinstance(value_node, MappingNode):
                    self.flatten_mapping(value_node)
                    merge.extend(value_node.value)
                elif isinstance(value_node, SequenceNode):
                    submerge = []
                    for subnode in value_node.value:
                        if not isinstance(subnode, MappingNode):
                            raise ConstructorError(
                                'while constructing a mapping',
                                node.start_mark,
                                _F(
                                    'expected a mapping for merging, but found {subnode_id!s}',
                                    subnode_id=subnode.id,
                                ),
                                subnode.start_mark,
                            )
                        self.flatten_mapping(subnode)
                        submerge.append(subnode.value)
                    submerge.reverse()
                    for value in submerge:
                        merge.extend(value)
                else:
                    raise ConstructorError(
                        'while constructing a mapping',
                        node.start_mark,
                        _F(
                            'expected a mapping or list of mappings for merging, '
                            'but found {value_node_id!s}',
                            value_node_id=value_node.id,
                        ),
                        value_node.start_mark,
                    )
            elif key_node.tag == 'tag:yaml.org,2002:value':
                key_node.tag = 'tag:yaml.org,2002:str'
                index += 1
            else:
                index += 1
        if bool(merge):
            node.merge = merge  # separate merge keys to be able to update without duplicate
            node.value = merge + node.value


yaml = ruamel.yaml.YAML(typ='safe', pure=True)
yaml.default_flow_style = False
yaml.Constructor = MyConstructor



yaml_str = """\
<<: !load base.yaml
foo: 2
baz: baz
"""

data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
print('---')

file_name = Path('test.yaml')
file_name.write_text("""\
<<: !load base.yaml
bar: 2
baz: baz
""")

data = yaml.load(file_name)
yaml.dump(data, sys.stdout)

This prints:

bar:
- 2
- 3
baz: baz
foo: 2
---
bar: 2
baz: baz
foo: 1

Notes:

  • Don't open YAML files as text. They are written as binary (UTF-8), and you should load them as such (open(filename, 'rb')).
  • If you had included a full working program in your question (or at least included the text of IncludeLoader), it would have been possible to provide a full working example with the merge keys (or to find out for you that it doesn't work for some reason).
  • As it is, it is unclear whether your yaml.load() is an instance method call (import ruamel.yaml; yaml = ruamel.yaml.YAML()) or a function call (from ruamel import yaml). You should not use the latter, as it is deprecated.
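
The first note can be illustrated with a short sketch. It uses PyYAML's safe_load as a stand-in (an assumption; with ruamel.yaml the file object would be passed to YAML().load() in the same way), and the file content is made up for the example:

```python
import tempfile
from pathlib import Path

import yaml

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "base.yaml")
    # Write the bytes explicitly so the platform's default encoding plays no role
    path.write_bytes("name: café\n".encode("utf-8"))

    # Binary mode: the YAML library does the decoding, not open()
    with path.open("rb") as fp:
        data = yaml.safe_load(fp)

print(data)
```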
Anthon

I recommend these 2 steps:

  1. write or plug-in a custom load-constructor for including other files
  2. structure the YAML for merging other keys (within the same file) or including other files

This is similar to what Anthon, author and maintainer of ruamel.yaml, answered, but with PyYAML.

How to merge in YAML keys (from within same file)

The YAML merge syntax <<: is for merging in keys:

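---
# the base, anchored so it can be referenced below
base: &OTHER { foo: 1, bar: [2, 3] }

derived:
  # merge the base in from above
  <<: *OTHER
  # keys given here override the merged ones
  foo: 2
  baz: "baz"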
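
A minimal sketch of how a merge key with an anchor and an alias loads in PyYAML. The key names base and derived are only illustrative, and the document is embedded as a string to keep the example self-contained:

```python
import yaml

document = """\
base: &OTHER {foo: 1, bar: [2, 3]}
derived:
  <<: *OTHER
  foo: 2
  baz: baz
"""

data = yaml.safe_load(document)
# 'derived' holds the keys of 'base', with foo overridden and baz added
print(data["derived"])
```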

How to include YAML files

YAML markup syntax has no include-directive or similar. But each YAML-parser implementation can offer this feature.

For example, see another answer by Josh, which does so:

PyYAML allows you to attach custom constructors (such as !include) to the YAML loader.

This constructor can be plugged in with pyyaml-include:

import yaml
from yamlinclude import YamlIncludeConstructor

YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.FullLoader, base_dir='/your/conf/dir')  # or specify another dir relatively or absolutely
# default is: include YAML files from current working directory

with open('base.yaml') as f:
    data = yaml.load(f, Loader=yaml.FullLoader)

print(data)

To include (not merge in) a second file some_file.yaml (located in the same directory) given as:

foo: 1

bar:
  - 2
  - 3

within the base.yaml add:

inherit: !include some_file.yaml    # includes the file under the key inherit (relative path!)

foo: 1

bar:
  - 2
  - 3

Recommended filename extension

From Wikipedia, YAML:

The official recommended filename extension for YAML files has been .yaml since 2006.

(sourced from the official YAML-FAQ: "YAML Ain't Markup Language". September 24, 2006. Archived from the original on 2006-09-24.)

hc_dev
  • It is good to know that someone read my [contribution](https://en.wikipedia.org/w/index.php?title=YAML&diff=881879577&oldid=879937809) to Wikipedia. – Anthon Feb 11 '22 at 10:06