I have a YAML file that might be changed on two different branches within a git repository. For simplicity let's start with this content as a base and create two branches:

a: 1

If I add key b with value 1 on one branch and key c with value 2 on the other, then try to merge the two, I get a merge conflict. It can be resolved by passing the strategy option -X theirs or -X ours, but in both cases the resulting merge commit discards the changes from one branch.

If I understood it correctly, this is intended behavior because git does not analyze the content of the file and works only with diffs on each branch. However, I would like to end up with this:

a: 1
b: 1
c: 2

Is there any simple way to do that?

Kostrahb
  • Does this answer your question? [git merge, keep both](https://stackoverflow.com/questions/13263902/git-merge-keep-both) – flyx Jun 05 '20 at 09:40
  • No, it does not. There could also be changes that delete or modify keys, which, now that I think about it, causes many more difficulties than I originally thought. – Kostrahb Jun 05 '20 at 10:03
  • closely related to https://stackoverflow.com/questions/13727300/git-merge-conflict-with-yaml-files – Kay V Feb 25 '22 at 11:58

1 Answer

Git does its merge work using simple line-based text matching. This does not suffice in general for merging YAML data. Do not attempt to merge arbitrary YAML with Git's merge algorithms. Do your own merge some other way, using the base version of the file as the starting point and both branch-tip versions of the file as the changes to be merged.
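
As a rough illustration of such a do-it-yourself merge, here is a minimal Python sketch of a three-way merge over already-parsed mappings. It is only a sketch under assumptions: the three versions have been loaded into plain dicts by some YAML parser, there are no anchors or aliases, and lists and scalars are treated as atomic values. The names `merge3` and `MergeConflict` are made up for this example.

```python
from typing import Any

_MISSING = object()  # sentinel: the key is absent in this version


class MergeConflict(Exception):
    """Both sides changed the same value in different ways."""


def merge3(base: Any, ours: Any, theirs: Any, path: str = "") -> Any:
    """Three-way merge; dicts are merged key by key, everything else is atomic."""
    if ours == theirs:
        return ours      # both sides agree (including "both deleted")
    if ours == base:
        return theirs    # only "theirs" touched this value
    if theirs == base:
        return ours      # only "ours" touched this value
    if isinstance(ours, dict) and isinstance(theirs, dict):
        base_map = base if isinstance(base, dict) else {}
        merged = {}
        # Base keys first, then keys added by "ours", then keys added by "theirs".
        for key in dict.fromkeys([*base_map, *ours, *theirs]):
            value = merge3(
                base_map.get(key, _MISSING),
                ours.get(key, _MISSING),
                theirs.get(key, _MISSING),
                f"{path}/{key}",
            )
            if value is not _MISSING:  # drop keys that the merge deleted
                merged[key] = value
        return merged
    raise MergeConflict(f"conflicting changes at {path or '/'}")


if __name__ == "__main__":
    base = {"a": 1}
    ours = {"a": 1, "b": 1}
    theirs = {"a": 1, "c": 2}
    print(merge3(base, ours, theirs))
```

Run on the example from the question, this yields a mapping with all three keys. A deletion on one side combined with a different edit on the other side raises `MergeConflict` rather than silently picking a winner.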

For this specific example, a union merge will work, as flyx noted. It's the general case that is much more difficult and not supported. Note, by the way, that after Git declares a merge conflict on the file, you can extract all three inputs and run git merge-file --union to produce the union merge just this one time, without a special entry in .gitattributes.
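
The extract-and-union step can be sketched in Python roughly as follows. This is an illustration, not a definitive recipe: it assumes the merge has already stopped with a conflict, that the file exists in all three index stages (so it was not added independently on both sides), and that `repo` is the top level of the working tree. The function name `union_merge_conflicted` is made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def union_merge_conflicted(path: str, repo: str = ".") -> str:
    """Return the union merge of a conflicted file, built from its index stages."""
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        # Index stage 1 = merge base, stage 2 = "ours", stage 3 = "theirs".
        # git merge-file expects the order <current> <base> <other>.
        for stage, name in ((2, "ours"), (1, "base"), (3, "theirs")):
            blob = subprocess.run(
                ["git", "show", f":{stage}:{path}"],
                cwd=repo, check=True, capture_output=True,
            ).stdout
            target = Path(tmp) / name
            target.write_bytes(blob)
            files.append(str(target))
        # -p prints the merged result instead of overwriting <current>.
        merged = subprocess.run(
            ["git", "merge-file", "-p", "--union", *files],
            cwd=repo, check=True, capture_output=True,
        ).stdout
    return merged.decode()
```

The returned text contains the lines from both sides with no conflict markers. Writing it back to the file and running `git add` on it would then mark the conflict as resolved. Because the union is purely line-based, the result is not guaranteed to be valid YAML in the general case.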

torek
  • I see, I'll take that into account. But out of curiosity, do you happen to know of any scientific literature on merging changes to files with a tree-like structure, such as YAML, JSON or XML? – Kostrahb Jun 05 '20 at 10:29
  • There are or were some research efforts on producing sensible diffs of tree-structured data, which is the first part of doing such a merge. The main issue here is that we'd like to detect when something has moved up or down some number of levels in the tree. Assuming we find such a match, we'd record that on the base-to-HEAD side, do the same analysis on the base-to-theirs side, and try to combine these level moves *along with* any in-single-level add or remove operations. – torek Jun 05 '20 at 10:37
  • Interesting. Does constructing Merkle trees from both versions and comparing the hashes of their nodes not work, or is it too slow? – Kostrahb Jun 05 '20 at 10:51
  • Note that YAML is generally a graph, not a tree. You might or might not care about this depending on whether you use its anchor/alias feature. – flyx Jun 05 '20 at 11:43
  • Generally we'd like to match a sub-node *even if there are changes within it*, so just hashing them doesn't produce the desired result. That's a good way to quickly identify identical sub-trees, though (which is very useful for paring away work). @flyx: good point - though for merging purposes it's probably sensible just to treat them as uninterpreted text (if you can get that from your yaml parser!). – torek Jun 05 '20 at 12:26
  • @torek While you can get them as events in all parsers that provide access to the event tree (true for e.g. libyaml and pyyaml), you can get some nasty edge cases like e.g. `{ a: null, b: [1, 2]}` and `{a: &a [1, 2], b: *a}` where the only change is that the value of `a` changes from `null` to a reference to the sequence held by `b` but YAML requires the first occurrence of the sequence is spelled out, so the sequence node's representation is moved to `a`. Better to stop with an error at encountering an alias since if you treat it as text, you'd see a change in `b`'s value where there is none. – flyx Jun 05 '20 at 13:05
  • @flyx: ah, I only learned about yaml anchors last month, and had not gone back and checked the specification and had not thought about this. You're quite right, with the pointer aspect of a reference we can form graphs and even cyclic items. That just makes the yaml merge problem that much harder, of course. – torek Jun 05 '20 at 14:38
  • You might be interested in Operational Transforms (OT) or Convergent Replicated Data Types (CRDTs). Check out `yjs` for instance. In essence, approaches that can merge tree-structured data without conflicts require some additional metadata describing the sequence of operations that occurred. Simply comparing two snapshots of the tree does not provide the information needed to make automatic merge decisions in all circumstances. – Brandon Jan 28 '23 at 01:07
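
The subtree-hashing idea raised in the comments above can be sketched in Python as follows. It is a hedged illustration that assumes the data is already parsed into nested dicts and lists with no anchors or aliases; the names `merkle` and `unchanged_paths` are made up for this example. It shows both halves of the discussion: identical subtrees are found cheaply, but a single edit deep inside a node changes the hash of every ancestor, so edited nodes are not matched by hashing alone.

```python
import hashlib
import json
from typing import Any, Dict, Set


def merkle(node: Any, path: str, table: Dict[str, str]) -> str:
    """Hash `node` bottom-up and record the digest of every subtree under its path."""
    if isinstance(node, dict):
        # Sort by key so that key order does not affect the digest.
        children = sorted(
            (str(key), merkle(value, f"{path}/{key}", table))
            for key, value in node.items()
        )
        payload = json.dumps(["map", children])
    elif isinstance(node, list):
        items = [merkle(value, f"{path}/{i}", table) for i, value in enumerate(node)]
        payload = json.dumps(["seq", items])
    else:
        payload = json.dumps(["scalar", repr(node)])
    digest = hashlib.sha256(payload.encode()).hexdigest()
    table[path] = digest
    return digest


def unchanged_paths(old: Any, new: Any) -> Set[str]:
    """Paths whose whole subtree is identical in both versions."""
    old_table: Dict[str, str] = {}
    new_table: Dict[str, str] = {}
    merkle(old, "", old_table)
    merkle(new, "", new_table)
    return {p for p, digest in old_table.items() if new_table.get(p) == digest}


if __name__ == "__main__":
    old = {"a": {"x": [1, 2], "y": 1}, "b": 2}
    new = {"a": {"x": [1, 2], "y": 5}, "b": 2}
    print(sorted(unchanged_paths(old, new)))
```

In this example the paths under `/a/x` and `/b` are reported as unchanged, while `/a` and the root are not, because the edit to `/a/y` propagates upward. The sketch compares equal paths only, so it does not detect subtrees that moved to a different level.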