12

Is it possible, using some smart piping and coding, to merge YAML files recursively? In PHP, I make an array of them (each module can add or update config nodes of/in the system).

The goal is an export shell script that will merge all separate module folders' config files into big merged files. It's faster and more efficient, and the customer does not need the modularity at the time we deploy new versions (via FTP, for example).

It should behave like the PHP function: array_merge_recursive
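For reference, here is a rough Python model of what PHP's `array_merge_recursive` does to YAML-like data (mappings as `dict`, sequences as `list`). It is a simplified sketch, not a port of PHP's implementation: mixed mapping/scalar collisions are simply collected into a list. Note that when two string keys collide on scalar values, PHP collects both values into an array instead of letting the later one win, so for the `date_regular` example in this question the "module overrides system" behaviour is closer to PHP's `array_replace_recursive`.

```python
def merge_recursive(a, b):
    """Simplified model of PHP's array_merge_recursive for YAML-like data.

    Mappings are dicts (PHP string keys), sequences are lists (PHP numeric keys).
    Top-level arguments must both be dicts or both be lists.
    """
    if isinstance(a, list) and isinstance(b, list):
        # Numeric keys never overwrite: later items are appended.
        return a + b
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            # Both sides are mappings: merge them recursively.
            result[key] = merge_recursive(result[key], value)
        else:
            # Any other collision: collect both values into one list.
            old = result[key] if isinstance(result[key], list) else [result[key]]
            new = value if isinstance(value, list) else [value]
            result[key] = old + new
    return result


base = {"date": {"format": {"date_regular": "%d-%m-%Y"}}}
module = {"date": {"format": {"date_regular": "regular dates are boring",
                              "date_special": "!!!%d-%m-%Y!!!"}}}
print(merge_recursive(base, module))
```

Here `date_regular` ends up as a two-element list rather than the module's string, which is usually not what a config override wants.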

The filesystem structure is like this:

mod/a/config/sys.yml
mod/a/config/another.yml
mod/b/config/sys.yml
mod/b/config/another.yml
mod/c/config/totally-new.yml
sys/config/sys.yml

Config looks like:

date:
   format:
      date_regular: "%d-%m-%Y"

And a module may, say, do this:

date:
   format:
      date_regular: regular dates are boring
      date_special: "!!!%d-%m-%Y!!!"

So far, I have:

#!/bin/bash
#........
cp -R "$dir_project/" "$dir_to/"
# Overlay each module onto sys/: same-named config files get overwritten
for i in "$dir_project"/mod/*/
do
    cp -R "${i}." "$dir_to/sys/"
done

This, of course, overwrites any existing config file with the same name on every loop iteration, instead of merging it (the rest of the system files are uniquely named).

Basically, I need a YAML parser for the command line and an array_merge_recursive-like alternative, then a YAML writer to output the merged result. I fear I will have to start learning Python, because bash won't cut it on this one.
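The first half of that pipeline, deciding which files get merged together, needs no YAML parser at all. A minimal sketch in Python (standard library only), assuming the directory layout shown in this question, that `sys/config` is the base layer, and that modules are applied in alphabetical order (the question does not say which module should win when two modules set the same key):

```python
from collections import defaultdict
from pathlib import Path


def group_config_files(project: Path) -> dict[str, list[Path]]:
    """Map each config file name to the files that must be merged into it.

    Files are listed in merge order: sys/config first, then each module in
    alphabetical order, so later entries are meant to override earlier ones.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted((project / "sys" / "config").glob("*.yml")):
        groups[path.name].append(path)
    for path in sorted(project.glob("mod/*/config/*.yml")):
        groups[path.name].append(path)
    return dict(groups)
```

Each list can then be loaded with a YAML library (for example PyYAML's `yaml.safe_load`), folded with a deep merge, and written out with `yaml.safe_dump` into `sys/config/<name>`.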

Ondra Žižka
twicejr

6 Answers

17

You can use Perl, for example. The following one-liner:

perl -MYAML::Merge::Simple=merge_files -MYAML -E 'say Dump merge_files(@ARGV)' f1.yaml f2.yaml

for the following input files:

f1.yaml

date:
  epoch: 2342342343
  format:
    date_regular: "%d-%m-%Y"

f2.yaml

date:
  format:
    date_regular: regular dates are boring
    date_special: "!!!%d-%m-%Y!!!"

prints the merged result...

---
date:
  epoch: 2342342343
  format:
    date_regular: regular dates are boring
    date_special: '!!!%d-%m-%Y!!!'

Because @Caleb pointed out that the module is now a developer-only release, here is a replacement. It is a bit longer and uses two (commonly available) modules:

perl -MYAML=LoadFile,Dump -MHash::Merge::Simple=merge -E 'say Dump(merge(map{LoadFile($_)}@ARGV))' f1.yaml f2.yaml

produces the same as above.

clt60
  • And how would you install the dependency in a one-liner, includable on a Linux system that has Perl installed but not this specific dependency? Take a Debian system, for instance. Welcome to Perl? – vaab Jan 01 '16 at 02:32
  • Exactly as you solve dependency issues in any other language: install the library. There are already some questions here about how to install CPAN modules. Or use your OS's prepackaged packages; I know nothing about Debian. Or use another library that is already installed on your system (in any language). – clt60 Jan 01 '16 at 10:18
  • This does not seem to be available even as a module on CPAN, so the advice on installing it was a bit dismissive. The only reference I've found to it is [this Github repository](https://github.com/andrefs/YAML-Merge-Simple). – Caleb Nov 16 '16 at 14:33
  • Apparently this [is on CPAN](http://search.cpan.org/~andrefs/YAML-Merge-Simple-0.01_01/) but it is tagged as a developer-only release, so your normal cpan install tools might not find it unless you specify it. For those on Arch, I just added a package for it to the AUR. – Caleb Nov 17 '16 at 08:01
  • I think it was a normal (non-developer) release at some point, because I installed it and I only use cpanm. Thank you for pointing this out. Added an alternative solution. – clt60 Nov 17 '16 at 09:19
  • Thanks for the update. There are a couple of things that could use tweaking, though. The `YAML::Merge::Simple` solution outputs a bogus opening YAML marker (`--- |\n---\n`). The `Hash::Merge::Simple` solution, on the other hand, doesn't output start/end markers at all and crashes if any of the YAML input files have them. – Caleb Nov 17 '16 at 10:08
  • @Caleb :) Any script can be tweaked to be more complex. Maybe (not tested) it will fail for files that contain only comment lines, or that contain multiple markers. I wrote an answer for the OP's question: a simple answer for a simple question. – clt60 Nov 20 '16 at 09:50
6

I recommend yq -m. yq is a Swiss Army knife for YAML, very similar to jq (for JSON).

Stefan Frye
  • As of now, it seems the `-m` option has been discontinued. – Matt Jan 27 '21 at 22:52
  • Then maybe try this: https://mikefarah.gitbook.io/yq/v/v4.x/upgrading-from-v3#merging-documents – Stefan Frye Feb 01 '21 at 15:43
  • `yq ea '. as $item ireduce ({}; . * $item )' file1.yaml file2.yaml > file-merged.yaml` That worked with `yq` version 4.20.2 – quasar Feb 25 '22 at 14:27
  • You aren't wrong but I think the [documented solution for merging complex arrays of objects](https://mikefarah.gitbook.io/yq/operators/multiply-merge#merge-arrays-of-objects-together-matching-on-a-key) demonstrates maybe this isn't as easy as expected? – andyfeller Nov 02 '22 at 01:46
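To make that point concrete: a generic deep merge has no way of knowing that two list items describe the same thing. Below is a minimal Python sketch of one possible policy, pairing list items by a key field. The default key `name` is a hypothetical choice for illustration, not something yq or the question prescribes.

```python
def merge_lists_by_key(base, override, key="name"):
    """Merge two lists of mappings, pairing up items that share `key`.

    Matching items from `override` update the base item (shallow update);
    unmatched items are appended in order. `key` is an assumed field name.
    """
    merged = [dict(item) for item in base]
    index = {item[key]: i for i, item in enumerate(merged) if key in item}
    for item in override:
        if key in item and item[key] in index:
            merged[index[item[key]]].update(item)
        else:
            merged.append(dict(item))
            if key in item:
                index[item[key]] = len(merged) - 1
    return merged


base = [{"name": "web", "port": 80}, {"name": "db", "port": 5432}]
override = [{"name": "web", "port": 8080}, {"name": "cache", "port": 6379}]
print(merge_lists_by_key(base, override))
```

Whether matched items should be updated shallowly, deep-merged, or replaced outright is yet another choice the tool cannot make for you.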
4

No.

Bash has no support for nested data structures (its maps are integer->string or string->string only), and thus cannot represent arbitrary YAML documents in-memory.

Use a more powerful language for this task.

Charles Duffy
  • That is too bad. I'd better not try to represent the structures with local variables using loops; that sounds crazy. What language do you recommend, Python? I'd like to learn a versatile language that can be used on most platforms. – twicejr Sep 02 '14 at 19:31
  • Indeed. Incidentally, there are good tools for transforming JSON and XML to and from line-oriented formats that bash can work from easily and correctly, but (1) none for YAML that I'm familiar with, and (2) even then, that leaves the work of actually implementing the merge algorithm without having real data structures. – Charles Duffy Sep 02 '14 at 19:33
  • ...that said, learning Python is well worth doing, and implementing a recursive tree-merge algorithm there is a rather painless wheel to reinvent; I've done it more times than I can remember. If you're comfortable with C, you might consider Go as well. – Charles Duffy Sep 02 '14 at 19:35
  • Then I know what to do :) Thanks. – twicejr Sep 02 '14 at 19:36
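As a sketch of how small that wheel is: a recursive merge in Python where mappings are merged key by key and any other value from the later document replaces the earlier one. This is the rightmost-input-wins rule the `Hash::Merge::Simple` one-liner relies on. Replacing lists wholesale is a design choice here, not the only sensible one. The inputs are the dicts a YAML loader such as PyYAML's `yaml.safe_load` would return for `f1.yaml` and `f2.yaml` from the Perl answer.

```python
from functools import reduce


def deep_merge(base, override):
    """Return a new mapping: nested dicts are merged recursively, and any other
    value from `override` (lists included) replaces the one in `base`."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result


def merge_all(*documents):
    """Fold any number of documents left to right; later ones win."""
    return reduce(deep_merge, documents, {})


f1 = {"date": {"epoch": 2342342343, "format": {"date_regular": "%d-%m-%Y"}}}
f2 = {"date": {"format": {"date_regular": "regular dates are boring",
                          "date_special": "!!!%d-%m-%Y!!!"}}}
print(merge_all(f1, f2))
```

The result keeps `epoch` from the first file and takes both `format` keys from the second, matching the Perl output.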
2

Late to the party, but I also wrote a tool for this:

https://github.com/benprofessionaledition/yamlmerge

It's almost identical to Ondra's JVM tool (they're even both called "yaml merge"), the key difference being that it's written in Go so it compiles to a ~3MB binary with no external dependencies. We use it in Gitlab-CI containers.

stuart
1

Bash is a bit of a stretch for this (it could be done but it would be error prone). If all you want to do is call a few things from a bash shell (as opposed to actually scripting the merge using bash functions) then you have a few options.

I noticed there is a Java-based yaml-merge tool, but that didn't suit my fancy very much, so I kept looking. In the end I cobbled together something using two tools: yaml2json and jq.

Warning: Since JSON's capabilities are only a subset of YAML's, this is not a lossless process for complex YAML structures. It will work for a lot of simple key/value/sequence scenarios but will muck things up if your input YAML is too fancy. Test it on your data types to see if it does what you expect.

  1. Use yaml2json to convert your inputs to JSON:

    yaml2json input1.yml > input1.json
    yaml2json input2.yml > input2.json
    
  2. Use jq to iterate over the objects and merge them recursively (see this question and answers for details). List files in reverse order of importance as values in later ones will clobber earlier ones:

    jq -s 'reduce .[] as $item({}; . * $item)' input1.json input2.json > merged.json
    
  3. Take it back to YAML:

    json2yaml merged.json > merged.yml
    

If you want to script this, the usual bash mechanisms are, of course, your friend. And if you happen to be using GNU Make like I was, something like this will do the trick:

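# Process substitution <(...) needs bash; Make's default /bin/sh may not support it
SHELL := /bin/bash
.SECONDEXPANSION:
merged.yml: input1.yml input2.yml
	json2yaml <(jq -s 'reduce .[] as $$item({}; . * $$item)' $(foreach YAML,$^,<(yaml2json $(YAML)))) > $@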
Community
Caleb
1

There is a tool that merges YAML files - merge-yaml. It supports full YAML syntax, and is capable of expanding environment variables references.

I forked it and released it as an executable .jar:
https://github.com/OndraZizka/yaml-merge

Usage:

./bin/yaml-merge.sh ./*.yml > result.yml

It is written in Java so you need Java (I think 8 and newer) installed.
(Btw, if someone wants to contribute, that would be great.)


In general, merging YAML is not a trivial thing, in the sense that the tool doesn't always know what you really want to do. You can merge structures in multiple ways. Think of this example:

foo:
   bar: bar2
   baz: 
      - baz1
---
foo:
   bar: bar1
   baz: 
      - baz2
   goo: gaz1

Few questions / unknowns arise:

  • Should the 2nd foo tree replace the first one?
  • Should the 2nd bar replace the first one, or merge to an array?
  • Should the 2nd baz array replace the 1st, or be merged?
    • If merged, then how? Should there be duplicates, or should the tool keep the values unique? Should the order be managed in some way?

Etc. One may object that there can be some default, but often, the real world requirements need different operations.
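To make the point concrete, here is a hypothetical Python sketch in which the caller picks a list policy (`replace`, `append` or `unique`; the names are invented for illustration) while scalars from the later document always win. Run on the two `foo` documents above, the policies give different `baz` values, and none of them is obviously the right default.

```python
POLICIES = ("replace", "append", "unique")


def merge(base, override, lists="replace"):
    """Deep-merge two YAML-like values with a selectable list policy.

    lists: "replace" (later list wins), "append" (concatenate) or
    "unique" (concatenate, skipping items already in base, keeping order).
    """
    if lists not in POLICIES:
        raise ValueError(f"unknown list policy: {lists!r}")
    if isinstance(base, dict) and isinstance(override, dict):
        result = dict(base)
        for key, value in override.items():
            result[key] = merge(result[key], value, lists) if key in result else value
        return result
    if isinstance(base, list) and isinstance(override, list):
        if lists == "append":
            return base + override
        if lists == "unique":
            return base + [item for item in override if item not in base]
    # Scalars, mismatched types, or lists under "replace": later value wins.
    return override


doc1 = {"foo": {"bar": "bar2", "baz": ["baz1"]}}
doc2 = {"foo": {"bar": "bar1", "baz": ["baz2"], "goo": "gaz1"}}
for policy in POLICIES:
    print(policy, merge(doc1, doc2, policy))
```

Even this answers only some of the questions above: it still hard-codes that the 2nd `bar` replaces the 1st, and one policy applies to every list in the document.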

Other tools and libraries for data structures handle this by defining a schema with metadata; for instance, JAXB and Jackson use Java annotations.
For this general tool, that is not an option, so the user would have to control this through a) the input data, or b) parameters. a) is impractical and sometimes impossible, b) is tedious and needs a fancy syntax like jq has.

That said, Caleb's answer might be what you need. However, that solution reduces your data to what JSON can represent, so you will lose comments, the various ways of representing long strings, usage of JSON within YAML, etc., which is not very user friendly.

Ondra Žižka