13

As part of a larger project, I want the ability to take two bodies of text and hand them to a merge algorithm which returns either an auto-merged result (in cases where the changes are not conflicting) or throws an error and (potentially) produces a single text document with the conflicting changes highlighted.

Basically, I just want a programmatic way to do what every source control system on the planet does internally, but I'm having a hard time finding it. Visual GUI tools for this sort of thing dominate my search results, but none of them seem to expose the core merging algorithm in an easily accessible way. Does everyone rely on some common, well-understood algorithm or library whose name I just don't know, which is why I'm having a hard time searching for it? Is this just some minor tweak on diff, so that I should be looking for diff libraries instead of merge libraries?

Python libraries would be most helpful, but I can live with the overhead of interfacing with some other library (or command line solution) if I have to; this operation should be relatively infrequent.

drewww
  • Automatic merges aren't safe, because they have no understanding of the programmers' (plural) intent; there's no guarantee that a "merged" file works, let alone works as intended by somebody. The source control systems weasel out of this by implicitly assuming the user will somehow retest (although whether that happens is another question). How would you use the results of an automatic merge? – Ira Baxter Nov 01 '10 at 00:57
  • In this case, I'm not merging code, I'm syncing text files that may have been modified while out of touch with the server. When the offline client reconnects, I need to compare their local version with the server version. True un-mergeable conflicts will be rare because of the design of the application, but they will happen occasionally and I just need to know when they occur. I'm not expecting auto-merge to be perfect, just to notify me when it fails and let me degrade gracefully without losing either server or client content in the process. – drewww Nov 01 '10 at 14:22
  • Questions like these are why I love StackOverflow. The quality of the community here is crazy awesome! – ehfeng Oct 25 '11 at 02:09

2 Answers

12

You're probably looking for a merge algorithm such as 3-way merging, which is implemented in many open source projects, e.g. in the Bazaar VCS (see its merge3.py source).
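
To give a feel for the idea, here is a minimal line-based sketch of a 3-way merge built on the standard library's difflib.SequenceMatcher. It is not the Bazaar implementation; the function names (sync_regions, merge3) and the conflict markers are my own choices for illustration, and it assumes each version is already split into a list of lines. It finds the regions where base, a, and b all agree, then resolves each gap in between: if only one side changed, that side wins; if both changed differently, the gap is emitted as a conflict.

```python
from difflib import SequenceMatcher


def sync_regions(base, a, b):
    """Return (base_start, a_start, b_start, length) runs where all three agree."""
    a_blocks = SequenceMatcher(None, base, a, autojunk=False).get_matching_blocks()
    b_blocks = SequenceMatcher(None, base, b, autojunk=False).get_matching_blocks()
    regions = []
    ia = ib = 0
    while ia < len(a_blocks) and ib < len(b_blocks):
        a_base, a_pos, a_len = a_blocks[ia]
        b_base, b_pos, b_len = b_blocks[ib]
        # Overlap of the two matching blocks, measured in base coordinates.
        start = max(a_base, b_base)
        end = min(a_base + a_len, b_base + b_len)
        if start < end:
            regions.append(
                (start, a_pos + start - a_base, b_pos + start - b_base, end - start)
            )
        # Advance whichever block ends first in base.
        if a_base + a_len < b_base + b_len:
            ia += 1
        else:
            ib += 1
    # Zero-length sentinel so the trailing gap is handled like any other.
    regions.append((len(base), len(a), len(b), 0))
    return regions


def merge3(base, a, b):
    """Merge line lists a and b against base; return (merged_lines, conflict_count)."""
    merged, conflicts = [], 0
    base_i = a_i = b_i = 0
    for base_s, a_s, b_s, length in sync_regions(base, a, b):
        base_gap, a_gap, b_gap = base[base_i:base_s], a[a_i:a_s], b[b_i:b_s]
        if a_gap == b_gap or b_gap == base_gap:
            merged += a_gap      # same change on both sides, or only a changed
        elif a_gap == base_gap:
            merged += b_gap      # only b changed
        else:
            conflicts += 1       # both sides changed this gap differently
            merged += ["<<<<<<< a"] + a_gap + ["======="] + b_gap + [">>>>>>> b"]
        merged += base[base_s:base_s + length]
        base_i, a_i, b_i = base_s + length, a_s + length, b_s + length
    return merged, conflicts
```

Calling merge3(base_lines, client_lines, server_lines) returns the merged lines plus a conflict count, so the caller can raise an error or fall back to keeping both versions whenever the count is nonzero.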

Smi
AndiDog
  • Ah, yeah, I think that's exactly the magic phrase I needed! I'll have to dig through these different versions to see what's easily extractable/abstractable from its context, but a first pass through looks really promising. Thanks! – drewww Nov 01 '10 at 15:39
  • For anyone looking for a packaged solution: https://pypi.org/project/three-merge/ – Mugen Jan 31 '23 at 14:26
1

Did you check out difflib?
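
As a rough illustration of what difflib gives you, the sketch below uses SequenceMatcher.get_opcodes() to list the changes between two versions of a file, each given as a list of lines. The helper name describe_changes and the sample data are made up for this example. Note that this only describes how one version differs from another; it does not merge anything by itself.

```python
from difflib import SequenceMatcher


def describe_changes(old, new):
    """Return (tag, old_lines, new_lines) for every non-equal opcode."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_lines = ["alpha", "beta", "gamma"]
new_lines = ["alpha", "BETA", "gamma", "delta"]
changes = describe_changes(old_lines, new_lines)
# changes == [("replace", ["beta"], ["BETA"]), ("insert", [], ["delta"])]
```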

pyfunc
  • This looks powerful, but it seems to me like it can't actually do the merging part of the process, just the diff part. I might be able to build a merge system on top of SequenceMatcher, but that seems like a big step. – drewww Nov 01 '10 at 15:36
  • If you're up for it, you can snag the merge code out of meld. – Clay Bridges Apr 12 '12 at 22:09