I understand that duplicate keys aren't strictly valid YAML, but I'm wondering how I could modify my PyYAML parser to group the values associated with duplicate keys into a list.

So, I have the following:

import yaml
from io import StringIO

data = """
NAME: Best Test
TEST: True

More: Yes

NO_LEADING_ZEROS: 0713

ENTITY:
    Name: Great
    Value: 10

ENTITY:
    Name: Even better
    Value: 11

"""

with StringIO(data) as f:
    parsed = yaml.load(f.read(), Loader=yaml.BaseLoader)
print(parsed)

assert(isinstance(parsed.get("ENTITY"), list))

I get the following:

{'NAME': 'Best Test', 
'TEST': 'True', 
'ENTITY': {'Name': 'Even better', 'Value': '11'}, 
'More': 'Yes', 
'NO_LEADING_ZEROS': '0713'}

Assuming I cannot change the "yaml" but only my parser, what would I change to get the following dictionary instead?

{'NAME': 'Best Test', 
'TEST': 'True', 
'ENTITY': [
  {'Name': 'Great', 'Value': '10'},
  {'Name': 'Even better', 'Value': '11'}], 
'More': 'Yes', 
'NO_LEADING_ZEROS': '0713'}

I've seen this answer, which looks similar: "Getting duplicate keys in YAML using Python".

That solution did not work for my use case, though: it turned every field into a list and returned this:

{'NAME': ['Best Test'], 
 'TEST': [True], 
  'ENTITY': [defaultdict(<class 'list'>, 
     {'Name': ['Great'], 'Value': [10]}), 
             defaultdict(<class 'list'>, 
     {'Name': ['Even better'], 'Value': [11]})], 
'More': [True], 
'NO_LEADING_ZEROS': [459]})
Mittenchops

1 Answer

The question you linked gives a solution that needs only minor modification to do what you want:

import yaml

def parse_preserving_duplicates(src):
    # We deliberately define a fresh class inside the function,
    # because add_constructor is a class method and we don't want to
    # mutate pyyaml classes.
    class PreserveDuplicatesLoader(yaml.loader.Loader):
        pass

    def map_constructor(loader, node, deep=False):
        """Build the mapping, collecting the values of duplicate keys in a list."""
        mapping = {}
        for key_node, value_node in node.value:
            key = loader.construct_object(key_node, deep=deep)
            value = loader.construct_object(value_node, deep=deep)

            if key not in mapping:
                mapping[key] = []

            mapping[key].append(value)

        # Unwrap keys that occurred only once so they keep their plain value.
        for k, v in mapping.items():
            if len(v) == 1:
                mapping[k] = v[0]
        return mapping

    PreserveDuplicatesLoader.add_constructor(yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, map_constructor)
    return yaml.load(src, PreserveDuplicatesLoader)

I removed the defaultdict: it is useful while loading, but you might not want every mapping in the result to be a defaultdict. After constructing the mapping, I added code that removes the surrounding list from any key that has only one value. This gives you lists only for keys that actually occur multiple times.
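
One caveat: the function above subclasses yaml.loader.Loader, so scalars are resolved to native types (0713 would be read as an octal integer, True as a boolean), whereas the question uses yaml.BaseLoader and expects strings. The following is a minimal sketch of the same idea on top of yaml.BaseLoader; the names DuplicateKeyBaseLoader and construct_mapping_grouping_duplicates are made up for illustration and are not part of PyYAML.

```python
import yaml


class DuplicateKeyBaseLoader(yaml.BaseLoader):
    """Hypothetical subclass, so that yaml.BaseLoader itself is not mutated."""


def construct_mapping_grouping_duplicates(loader, node):
    # Collect every value seen for each key, in document order.
    grouped = {}
    for key_node, value_node in node.value:
        key = loader.construct_object(key_node)
        value = loader.construct_object(value_node)
        grouped.setdefault(key, []).append(value)
    # Keys seen once keep their plain value; repeated keys keep the list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


# add_constructor copies the constructor table for the subclass, so this
# registration only affects DuplicateKeyBaseLoader.
DuplicateKeyBaseLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    construct_mapping_grouping_duplicates,
)

data = """
NAME: Best Test
TEST: True

More: Yes

NO_LEADING_ZEROS: 0713

ENTITY:
    Name: Great
    Value: 10

ENTITY:
    Name: Even better
    Value: 11
"""

parsed = yaml.load(data, Loader=DuplicateKeyBaseLoader)
assert isinstance(parsed["ENTITY"], list)
print(parsed)
```

Because BaseLoader applies no implicit type resolution, every scalar stays a string, which matches the dictionary asked for in the question. Note that with either variant a key that occurs once but whose value is itself a YAML sequence also ends up as a list, so the result alone does not tell you whether a key was duplicated.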

flyx