
I want to load a YAML file into Python as an OrderedDict. I am using yamlordereddictloader to preserve ordering.

However, I notice that the aliased object is placed "too soon" in the OrderedDict in the output.

How can I preserve the order of this mapping when read into Python, ideally as an OrderedDict? Is it possible to achieve this result without writing some custom parsing?

Notes:

  • I'm not particularly concerned with the method used, as long as the end result is the same.
  • Using sequences instead of mappings is problematic because they can produce nested output, and I can't simply flatten everything (some nesting is appropriate).
  • When I try to just use !!omap, I cannot seem to merge the aliased mapping (d1.dt) into the d2 mapping.
  • I'm on Python 3.6; if I don't use this loader or !!omap, order is not preserved (apparently contrary to the top 'Update' here: https://stackoverflow.com/a/21912744/2343633)

import yaml
import yamlordereddictloader

yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3

d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""

out = yaml.load(yaml_file, Loader=yamlordereddictloader.Loader)
keys = [x for x in out['d2']]
print(keys) # ['nm2', 'nm3', 'nm4']
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"
rbatt

1 Answer


Is it possible to achieve this result without writing some custom parsing?

Yes. You need to override the flatten_mapping method inherited from SafeConstructor. Here's a basic working example:

import yaml
import yamlordereddictloader
from yaml.constructor import ConstructorError
from yaml.nodes import MappingNode, SequenceNode

class MyLoader(yamlordereddictloader.Loader):
  # Adapted from PyYAML's flatten_mapping and reworked to keep order:
  # https://github.com/yaml/pyyaml/blob/5.3.1/lib/yaml/constructor.py#L207
  def flatten_mapping(self, node):
    merged = []

    def merge_from(value_node):
      # Splice the pairs of a merged mapping in at the position of the "<<" key.
      if not isinstance(value_node, MappingNode):
        raise ConstructorError("while constructing a mapping",
            node.start_mark, "expected a mapping for merging, but found %s" %
            value_node.id, value_node.start_mark)
      # Resolve nested merges first, then append the pairs in document order.
      self.flatten_mapping(value_node)
      merged.extend(value_node.value)

    for key_node, value_node in node.value:
      if key_node.tag == 'tag:yaml.org,2002:merge':
        if isinstance(value_node, SequenceNode):
          # "<<: [*a, *b]" merges several mappings, in the order listed.
          for subnode in value_node.value:
            merge_from(subnode)
        else:
          merge_from(value_node)
      else:
        if key_node.tag == 'tag:yaml.org,2002:value':
          key_node.tag = 'tag:yaml.org,2002:str'
        merged.append((key_node, value_node))
    node.value = merged

yaml_file = """
d1:
  id:
    nm1: val1
  dt: &dt
    nm2: val2
    nm3: val3

d2: # expect nm4, nm2, nm3
  nm4: val4
  <<: *dt
"""

out = yaml.load(yaml_file, Loader=MyLoader)
keys = [x for x in out['d2']]
print(keys)
assert keys==['nm4', 'nm2', 'nm3'], "order from YAML file is not preserved, aliased keys placed too early"

This is not the most efficient approach, since it copies every key-value pair of every mapping once during loading, but it works. Improving the performance is left as an exercise for the reader :).
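To see why overriding flatten_mapping changes the order, here is a rough model of the two strategies on plain lists of (key, value) pairs instead of PyYAML nodes. The names MERGE, flatten_prepend and flatten_in_place are made up for this illustration and are not part of PyYAML. The model assumes that "<<" always points at a single mapping, and it ignores sequences of mappings and tag handling. The stock implementation collects the merged pairs and puts them in front of the mapping's own pairs, while the override splices them in where "<<" appears. The early return hints at one possible performance improvement: skip the copy when a mapping contains no merge key.

```python
from collections import OrderedDict

MERGE = "<<"  # stand-in for the tag:yaml.org,2002:merge key (illustrative only)


def flatten_prepend(pairs):
    """Model of the stock strategy: merged pairs are placed before the rest."""
    merged, own = [], []
    for key, value in pairs:
        if key == MERGE:
            merged.extend(flatten_prepend(value))
        else:
            own.append((key, value))
    return merged + own


def flatten_in_place(pairs):
    """Model of the override: merged pairs are spliced in where '<<' appears."""
    if all(key != MERGE for key, _ in pairs):
        return pairs  # fast path: nothing to merge, so nothing to copy
    result = []
    for key, value in pairs:
        if key == MERGE:
            result.extend(flatten_in_place(value))
        else:
            result.append((key, value))
    return result


# The d2 mapping from the question, written as a list of pairs.
dt = [("nm2", "val2"), ("nm3", "val3")]
d2 = [("nm4", "val4"), (MERGE, dt)]

stock_keys = list(OrderedDict(flatten_prepend(d2)))
ordered_keys = list(OrderedDict(flatten_in_place(d2)))
print(stock_keys)    # ['nm2', 'nm3', 'nm4']
print(ordered_keys)  # ['nm4', 'nm2', 'nm3']
```

One thing to keep in mind with the in-place strategy: if a merged key duplicates a key that appears earlier in the same mapping, the merged value is assigned later and so overwrites the explicit one, whereas with the stock ordering the explicit key wins.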

flyx