
I'm trying to set up a system where I have a couple (possibly more) YAML files that will be used for configuration. I want to be able to reference things in one file from another.

I know that YAML doesn't allow this.

My plan, I think, is to combine the two YAML files and then treat them as a single file. I'm pretty sure that I could either cat the two files together into a temp file and read that as YAML, or read the files as text, concatenate them, and THEN parse the string.

However, I feel that there should be a better way to do this. Is there?

Brian Postow
  • I think this has already been answered: https://stackoverflow.com/questions/47424865/merge-two-yaml-files-in-python – Tom Oct 26 '21 at 20:39
  • @Tom, my question is from 5.5 years ago. The one you point to is 4 years ago... just so you know... – Brian Postow Oct 27 '21 at 21:21

2 Answers


The only way of referencing in YAML is to use & (anchors) and * (aliases). For these to work they have to be in the same YAML document. The following will not work (this is based on the merge key feature, but normal object referencing has the same limitation):

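from ruamel.yaml import YAML

yaml_str = """\
a: &BASE { x: 1, y: 2}
---
b:
  << : *BASE
  z: 3
"""

yaml = YAML(typ='safe')
# load_all() yields one document at a time; the error is raised on the second one
for data in yaml.load_all(yaml_str):
    print(data)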

throws a ComposerError because the alias *BASE is undefined in the second document. Remove the --- document separator and everything is fine.

So in principle concatenating two documents could work. A document containing an alias cannot be loaded separately; it has to be concatenated with the one that contains its anchor.
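To reproduce both cases, here is a minimal sketch that uses PyYAML's safe_load_all() and safe_load() instead of ruamel.yaml (an assumption: PyYAML is installed; TWO_DOCS, ONE_DOC and load_two_docs are names made up for illustration). The alias in the second document fails because an anchor only exists inside the document that defines it, while the same text without the --- separator loads and the merge key pulls in the anchored mapping:

```python
import yaml  # PyYAML

TWO_DOCS = """\
a: &BASE { x: 1, y: 2}
---
b:
  << : *BASE
  z: 3
"""

# Same content as a single document: the '---' separator is removed.
ONE_DOC = TWO_DOCS.replace("---\n", "")


def load_two_docs():
    """Try to load both documents; return the error message on failure."""
    try:
        return list(yaml.safe_load_all(TWO_DOCS))
    except yaml.composer.ComposerError as err:
        return str(err)


print(load_two_docs())          # message mentions the undefined alias 'BASE'
print(yaml.safe_load(ONE_DOC))  # {'a': {'x': 1, 'y': 2}, 'b': {'x': 1, 'y': 2, 'z': 3}}
```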

An additional caveat is that all documents have to have the same type at the top level: either all mappings or all sequences. If you combine a sequence:

- &BASE a
- b

with a mapping:

c: 1
d: *BASE

the result will not be loadable.
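A quick way to check this, again sketched with PyYAML rather than ruamel.yaml (assumed installed; try_load is a hypothetical helper): the concatenated text is rejected by the parser, because top-level mapping keys cannot follow a top-level sequence. The mapping file on its own also fails, since *BASE is undefined there:

```python
import yaml  # PyYAML

SEQUENCE_FILE = "- &BASE a\n- b\n"
MAPPING_FILE = "c: 1\nd: *BASE\n"


def try_load(text):
    """Return the loaded data, or None if the text cannot be loaded as YAML."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError:
        return None


print(try_load(SEQUENCE_FILE))                 # ['a', 'b']
print(try_load(SEQUENCE_FILE + MAPPING_FILE))  # None: mixed top-level types
print(try_load(MAPPING_FILE))                  # None: alias without its anchor
```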


As indicated, even if the top-level type is the same for all files, you cannot load the YAML files separately and combine them in memory. For example, take the example from the merge key documentation and split it into 1.yaml:

- &CENTER { x: 1, y: 2 }
- &LEFT { x: 0, y: 2 }
- &BIG { r: 10 }
- &SMALL { r: 1 }

2.yaml:

# Explicit keys
-
  x: 1
  y: 2
  r: 10
  label: center/big

3.yaml:

# Merge one map
-
  << : *CENTER
  r: 10
  label: center/big

4.yaml:

# Merge multiple maps
-
  << : [ *CENTER, *BIG ]
  label: center/big

5.yaml:

# Override
-
  << : [ *BIG, *LEFT, *SMALL ]
  x: 1
  label: center/big

You cannot use load() on the individual YAML files and combine them:

import glob
import sys
from ruamel.yaml import YAML

yaml = YAML()
data = []
for file_name in sorted(glob.glob('*.yaml')):
    with open(file_name) as fp:
        # fails on 3.yaml: *CENTER is not defined in that file
        data.append(yaml.load(fp))
yaml.dump(data, sys.stdout)

(The above would work if 3.yaml, 4.yaml, and 5.yaml didn't contain aliases.)

If you don't want to concatenate the files outside of your program, you can use this class:

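class CombinedOpenForReading(object):
    """File-like object that reads the given files one after another,
    as if they were a single stream (only read() is implemented)."""

    def __init__(self, file_names):
        self._to_do = file_names[:]  # copy: files still to be opened, in order
        self._fp = None              # file currently being read

    def __enter__(self):
        return self

    def __exit__(self, exception_type, exception_value, exception_traceback):
        if self._fp:
            self._fp.close()

    def read(self, size=None):
        res = ''
        while True:
            if self._fp is None:
                if not self._to_do:
                    return res  # all files exhausted
                else:
                    self._fp = open(self._to_do.pop(0))
            if size is None:
                data = self._fp.read()  # read the whole current file
            else:
                data = self._fp.read(size)
            if size is None or not data:
                # current file is done: close it and move on to the next one
                self._fp.close()
                self._fp = None
            res += data
            if size is None:
                continue
            size -= len(data)
            if size == 0:
                break
        return res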

and use it like this:

import glob
import sys
from ruamel.yaml import YAML

yaml = YAML()  # round-trip mode: keeps comments, anchors and aliases
with CombinedOpenForReading(sorted(glob.glob('*.yaml'))) as fp:
    data = yaml.load(fp)
assert data[6]['r'] == 10
yaml.dump(data, sys.stdout)

to get:

- &CENTER {x: 1, y: 2}
- &LEFT {x: 0, y: 2}
- &BIG {r: 10}
- &SMALL {r: 1}
# Explicit keys
- x: 1
  y: 2
  r: 10
  label: center/big
# Merge one map
- <<: *CENTER
  r: 10
  label: center/big
# Merge multiple maps
- <<: [*CENTER, *BIG]
  label: center/big
# Override
- <<: [*BIG, *LEFT, *SMALL]
  x: 1
  label: center/big

(You have to pass in the files in the right order, hence the sort. Also make sure that every file ends with a newline; otherwise the last line of one file runs into the first line of the next, and you might get unexpected errors.)
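If you would rather concatenate the text yourself, as the question suggests, here is a stdlib-only sketch that also takes care of the missing-newline problem. The helper name concat_yaml_texts is hypothetical; the returned string can then be handed to a YAML loader, and the order of file_names (e.g. from sorted(glob.glob('*.yaml'))) still has to put anchors before their aliases:

```python
import os
import tempfile


def concat_yaml_texts(file_names):
    """Hypothetical helper: return the contents of file_names joined in order.

    A newline is appended to any file that doesn't already end with one, so
    the last line of one file can never run into the first line of the next.
    """
    parts = []
    for file_name in file_names:
        with open(file_name, encoding="utf-8") as fp:
            text = fp.read()
        if text and not text.endswith("\n"):
            text += "\n"
        parts.append(text)
    return "".join(parts)


# Usage: 1.yaml is deliberately written without a trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "1.yaml")
    second = os.path.join(tmp, "2.yaml")
    with open(first, "w", encoding="utf-8") as fp:
        fp.write("- &CENTER { x: 1, y: 2 }")
    with open(second, "w", encoding="utf-8") as fp:
        fp.write("- << : *CENTER\n  r: 10\n")
    combined = concat_yaml_texts([first, second])

print(combined)
```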

Anthon
  • By your caveat, do you mean that &BASE and *BASE have to be in the same kind of thing (sequence vs mapping)? I didn't realize that that was a problem, but it shouldn't be a problem. I'm not planning on using multiple documents in one file. I'm planning on using multiple files in one document. – Brian Postow Feb 19 '16 at 17:39
  • Ah, you mean that in order for the files to be mergable, they must have the same topmost type. Yes, that shouldn't be a problem. – Brian Postow Feb 19 '16 at 18:18
  • @BrianPostow Yes, the topmost type has to be the same – Anthon Feb 20 '16 at 07:02

I think that this is simpler than @Anthon's. It may not be as complete, but I think it's all that I need...

import yaml  # PyYAML


def merge(fList):
    '''
    Takes a list of YAML files and loads them as a single YAML document.
    Restrictions:
    1) None of the files may have a YAML document marker (---).
    2) All of the files must have the same top-level type (dictionary or list).
    3) If any aliases cross between files, then the file in which an anchor is defined (&)
       must be earlier in the list than any file that uses it (*).
    '''

    if not fList:
        # If fList is the empty list, return an empty list. This is arbitrary; if it
        # turns out that an empty dictionary is better, we can do something about that.
        return []

    sList = []
    for f in fList:
        with open(f, 'r') as stream:
            sList.append(stream.read())
    # Separate the files with newlines so one file's last line can't run into the next.
    fString = '\n'.join(sList)

    # safe_load handles anchors, aliases and << merge keys; in PyYAML 6 and later,
    # plain yaml.load() requires an explicit Loader argument.
    y = yaml.safe_load(fString)

    return y

Comments welcome.

Brian Postow
  • Merging three files completely replaces some parts instead of merging them – holms Apr 27 '17 at 17:41
  • How so? If they are dictionaries, and the keys aren't disjoint, then sure, but that's probably a YAML error anyway... – Brian Postow Apr 28 '17 at 14:01
  • I have no idea, honestly. I ended up using the merge-yaml npm CLI tool, and had a problem there too, so I had to move one file to the second position :D I wonder how this is possible. There's even yamlreader available, which checks YAML files for syntax errors and formats them; it still didn't help – holms Apr 30 '17 at 01:25
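One likely explanation for the behaviour described in these comments, sketched with PyYAML (the library the merge() answer uses; the file contents below are made up): when two files share a top-level key, concatenation produces a mapping with a duplicate key. The YAML spec requires mapping keys to be unique, but PyYAML does not enforce this; it silently keeps the last value, so the whole nested mapping from the earlier file is replaced rather than merged. (ruamel.yaml, by contrast, raises an error on duplicate keys by default.)

```python
import yaml  # PyYAML

FILE_A = """\
database:
  host: localhost
  port: 5432
"""

FILE_B = """\
database:
  port: 6543
"""

# 'database' now appears twice in one mapping; the second occurrence wins.
merged = yaml.safe_load(FILE_A + FILE_B)
print(merged)  # {'database': {'port': 6543}}
```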