I am using YAML files to let users configure a serial workflow for a Python program that I am developing:

step1:
    method1:
        param_x: 44
    method2:
        param_y: 14
        param_t: string   
    method1:
        param_x: 22
step2:
    method2:
        param_z: 7
    method1:
        param_x: 44
step3:
    method3:
        param_a: string

This is then parsed in Python and stored as a dictionary. Now, I know duplicate keys are not allowed in YAML or in Python dictionaries (why, btw?), but YAML seems perfect for my case given its clarity and minimalism.

I tried to follow the approach suggested in this question (Getting duplicate keys in YAML using Python). However, in my case keys are sometimes duplicated and sometimes not, and with the proposed construct_yaml_map this creates either a dict or a list, which is not what I want. Depending on the node depth, I would like to send the keys and values on the second level (method1, method2, ...) to a list within a Python dictionary, to avoid the duplication issue.

mluerig
    If you want to have duplicate keys even though YAML forbids them, you are not using YAML. The reason they are forbidden is that they form a *mapping* in YAML, and a mapping is mathematically a function taking a key and returning the associated value. Something that maps a key to multiple values is not a mapping. You basically want to use the *syntax* of YAML while substituting its *semantics* with your own. While you can do that with APIs that expose the syntactic level of the language (as described in the linked question), don't assume this is still *„using YAML“*. – flyx Nov 25 '19 at 12:53
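
A quick illustration of the mapping semantics in Python itself: a dict holds exactly one value per key, so a repeated key overwrites the earlier entry, and keeping every value means storing a list under that key. The variable names below are purely illustrative.

```python
# A dict literal with a repeated key keeps only the last value;
# Python raises no error for this.
config = {"method1": {"param_x": 44}, "method1": {"param_x": 22}}
print(config)

# Keeping both entries requires a list under the single key.
config_list = {"method1": [{"param_x": 44}, {"param_x": 22}]}
print(len(config_list["method1"]))
```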

1 Answer

While parsing, ruamel.yaml has no concept of depth beyond being at the root level of a document (among other things, in order to allow root-level literal scalars to be unindented). Adding such a notion of depth is going to be difficult, since you have to deal with aliases and possibly recursive occurrences of data. I am also not sure what this would mean in general (although it is clear enough for your example).

The method creating a mapping in the default, round-trip, loader of ruamel.yaml is rather long. But if you are going to jumble mapping values together, you should not expect to be able to dump them back, let alone preserve comments, aliases, etc. The following assumes you'll be using the simpler safe loader and may have aliases and/or merge keys.

import sys
import ruamel.yaml

yaml_str = """\
step1:
    method1:
        param_x: 44
    method2:
        param_y: 14
        param_t: string   
    method1:
        param_x: 22
step2:
    method2:
        param_z: 7
    method1:
        param_x: 44
step3:
    method3:
        param_a: string
"""

from collections.abc import Hashable

# ConstructorError lives in ruamel.yaml.constructor, not in ruamel.yaml.nodes
from ruamel.yaml.constructor import ConstructorError
from ruamel.yaml.nodes import MappingNode


class MyConstructor(ruamel.yaml.constructor.SafeConstructor):
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(
                None, None, 'expected a mapping node, but found %s' % node.id, node.start_mark
            )
        total_mapping = self.yaml_base_dict_type()
        if getattr(node, 'merge', None) is not None:
            todo = [(node.merge, False), (node.value, False)]
        else:
            todo = [(node.value, True)]
        for values, check in todo:
            mapping = self.yaml_base_dict_type()  # type: Dict[Any, Any]
            for key_node, value_node in values:
                # keys can be list -> deep
                key = self.construct_object(key_node, deep=True)
                # lists are not hashable, but tuples are
                if not isinstance(key, Hashable):
                    if isinstance(key, list):
                        key = tuple(key)
                # the Python 2 branch is dropped: only the hashability check remains
                if not isinstance(key, Hashable):
                    raise ConstructorError(
                        'while constructing a mapping',
                        node.start_mark,
                        'found unhashable key',
                        key_node.start_mark,
                    )
                value = self.construct_object(value_node, deep=deep)
                if key in mapping:
                    # duplicate key: collect all of its values in a list
                    if not isinstance(mapping[key], list):
                        mapping[key] = [mapping[key]]
                    mapping[key].append(value)
                else:
                    mapping[key] = value
            total_mapping.update(mapping)
        return total_mapping


yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MyConstructor
data = yaml.load(yaml_str)
for k1 in data:
    # might need to guard this with a try-except for non-dictionary first-level values
    for k2 in data[k1]:
        if not isinstance(data[k1][k2], list):   # make every second level value a list
            data[k1][k2] = [data[k1][k2]]
print(data['step1'])

which gives:

{'method1': [{'param_x': 44}, {'param_x': 22}], 'method2': [{'param_y': 14, 'param_t': 'string'}]}
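
The core of the constructor is the branch that turns a repeated key into a list, followed by the loop that wraps the remaining single values. As a minimal sketch of just that logic, without ruamel.yaml, the following works on plain (key, value) pairs. The helper names `collect_duplicates` and `listify_values` are hypothetical and not part of any library.

```python
def collect_duplicates(pairs):
    """Build a dict from (key, value) pairs; repeated keys accumulate into a list."""
    mapping = {}
    for key, value in pairs:
        if key in mapping:
            # second or later occurrence: promote to a list, then append
            if not isinstance(mapping[key], list):
                mapping[key] = [mapping[key]]
            mapping[key].append(value)
        else:
            mapping[key] = value
    return mapping


def listify_values(mapping):
    """Wrap every value that is not already a list in a one-element list."""
    return {k: v if isinstance(v, list) else [v] for k, v in mapping.items()}


# the second-level entries of step1, in document order
step1_pairs = [
    ("method1", {"param_x": 44}),
    ("method2", {"param_y": 14, "param_t": "string"}),
    ("method1", {"param_x": 22}),
]
step1 = listify_values(collect_duplicates(step1_pairs))
print(step1)
```

Two limitations follow from this design. The grouped result no longer records that method2 ran between the two method1 calls. And a value that is itself a YAML sequence cannot be told apart from an accumulated list of duplicates.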
Anthon