
I'm trying to do something that I think should be pretty simple, but it has turned into quite a rabbit hole, and I think there must be a better way.

Imagine you have two consecutive patch files (representing sequential changes) to a source file, but you don't have the source file itself.

How would you combine both patches into a single patch that represents the combined set of changes? Applying the combined patch to the source file should give the same result as applying the two patches in sequence. All context should be preserved.

Is there a well-known algorithm for this?

For example, take these two patches:

@@ -1,1 +1,2 @@
+ add this first line
this line is just context
@@ -1,2 +1,2 @@
- add this first line
+ change the first line
this line is just context
@@ -7,2 +7,2 @@
context
- change this line
+ to this one
more context

The result would be:

@@ -1,1 +1,2 @@
+ change the first line
this line is just context
@@ -7,2 +7,2 @@
context
- change this line
+ to this one
more context

The only tool/library I've found for this use case is this one, but in testing it has more than a few bugs, and the code is too dense for me to work out the underlying algorithm: https://github.com/twaugh/patchutils

  • As a procedure to combine patches, you could generate a file whose line numbers and context lines match the ones expected by your patches, apply the patches to that file, and take the final diff. – LeGEC Feb 21 '23 at 07:04
  • For this to work seamlessly, the patches must be exactly compatible: on all the lines affected by patches `1 .. n`, patch `n+1` should have context that matches those lines exactly (line number and line content). – LeGEC Feb 21 '23 at 07:07
  • To create the initial file, scan all the patches to see which lines the starting file must have. E.g. in your simple example, you have to read both patches to discover that the starting file needs at least 9 lines (or is it 10?) for patch 2 to apply. – LeGEC Feb 21 '23 at 07:09
  • @LeGEC thanks for the suggestion! I have tried something like that, and the issue is that when generating the final diff, the number of blank lines causes problems, at least in my tests. So I am looking for a way to combine the patches directly, without re-diffing. – Sam Stern Feb 21 '23 at 15:06
  • Why not use what you already have from git: apply the 2 patches one after another and then make a new patch from the 2? – A-_-S Mar 02 '23 at 10:53
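LeGEC's procedure can be sketched in Python. This is a minimal sketch, not patchutils' algorithm, assuming single-file unified diffs whose hunks each contain at least one context or removed line, and file lines that never contain a NUL character. All names (`parse_patch`, `old_side`, `apply`, `combine`, `FILLER`) are hypothetical. Instead of blank filler lines, every unknown line of the synthetic original is a unique placeholder, so `difflib` cannot pair up unrelated filler lines (one plausible source of the blank-line trouble mentioned above). A placeholder that survives patch 1 still carries its original index, so when patch 2 expects text at that position the same text can be written back into the original without explicit line-number arithmetic.

```python
import difflib
import re

# Only the old-side start is used; the rest of the header is matched for validation.
HUNK = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@")
FILLER = "\0placeholder {}"  # assumption: real lines never contain "\0"


def parse_patch(text):
    """Return [(old_start, body_lines)] for a single-file unified diff."""
    hunks = []
    for line in text.splitlines():
        m = HUNK.match(line)
        if m:
            hunks.append((int(m.group(1)), []))
        elif hunks and line[:1] in (" ", "-", "+"):
            hunks[-1][1].append(line)  # '---'/'+++' come before the first hunk
    return hunks


def old_side(hunks):
    """Yield (0-based index, text) for every context or removed line."""
    for start, body in hunks:
        pos = start - 1
        for line in body:
            if line[:1] != "+":
                yield pos, line[1:]
                pos += 1


def apply(lines, hunks):
    """Apply hunks to a list of lines whose contents were already checked."""
    out, pos = [], 0
    for start, body in hunks:
        out.extend(lines[pos:start - 1])  # untouched lines before the hunk
        pos = start - 1
        for line in body:
            if line[:1] != "-":
                out.append(line[1:])  # context or added line
            if line[:1] != "+":
                pos += 1  # context or removed line consumes an input line
    return out + lines[pos:]


def combine(p1_text, p2_text, context=3):
    p1, p2 = parse_patch(p1_text), parse_patch(p2_text)
    end1 = max((i + 1 for i, _ in old_side(p1)), default=0)
    end2 = max((i + 1 for i, _ in old_side(p2)), default=0)
    growth1 = sum((ln[:1] == "+") - (ln[:1] == "-") for _, body in p1 for ln in body)
    # Synthetic original, long enough for both patches; unknown lines are fillers.
    original = [FILLER.format(k) for k in range(max(end1, end2 - growth1))]
    for i, text in old_side(p1):
        original[i] = text
    middle = apply(original, p1)
    for i, text in old_side(p2):
        if middle[i].startswith("\0"):
            k = int(middle[i].split()[-1])  # filler still names its original index
            original[k] = middle[i] = text
        elif middle[i] != text:
            raise ValueError(f"patch 2 does not apply at line {i + 1}")
    final = apply(middle, p2)
    result = list(difflib.unified_diff(original, final, "old", "new",
                                       n=context, lineterm=""))
    if any("\0" in line for line in result):
        raise ValueError("the patches carry less context than requested")
    return result


# The corrected patches from the answer.
P1 = "--- old\n+++ new\n@@ -1,1 +1,2 @@\n+ add this first line\n this line is just context\n"
P2 = ("--- old\n+++ new\n@@ -1,2 +1,2 @@\n- add this first line\n"
      "+ change the first line\n this line is just context\n"
      "@@ -7,3 +7,3 @@\n context\n- change this line\n+ to this one\n more context\n")
print("\n".join(combine(P1, P2, context=1)))
```

With `context=1` this reproduces the two hunks of the `combinediff -U1` output shown in the answer (file headers aside). With the default `context=3` it raises, because the patches do not supply that much context around the first hunk. Note that this still re-diffs, and on files with repeated lines `difflib` may choose a different (still valid) alignment than `combinediff`.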

1 Answer

First I fixed the syntax errors in your patch files:

  • Patch files must have a file header, so I added the --- and +++ lines.
  • Context lines start with a single space, so I added one to each of them.
  • The numbers in the @@ hunk headers must match the lines that follow, so in the second hunk of p2 I changed the 2s to 3s to include the "more context" line.
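The counting rule behind the third fix can be checked mechanically. A minimal Python sketch (the helpers `parse_hunk_header` and `counts_match` are hypothetical, not part of patchutils): context lines count on both sides of the header, `-` lines only on the old side, `+` lines only on the new side, and an omitted count means 1.

```python
import re

# Unified-diff hunk header: "@@ -old_start[,old_count] +new_start[,new_count] @@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_HEADER.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = m.groups()
    # An omitted count comes back as None and means 1; an explicit "0" stays 0.
    return (int(old_start), int(old_count or 1),
            int(new_start), int(new_count or 1))


def counts_match(header, body):
    """Check the header's counts against the hunk's body lines."""
    _, old_count, _, new_count = parse_hunk_header(header)
    old = sum(1 for line in body if line[:1] in (" ", "-"))
    new = sum(1 for line in body if line[:1] in (" ", "+"))
    return (old, new) == (old_count, new_count)


body = [" context", "- change this line", "+ to this one", " more context"]
print(counts_match("@@ -7,2 +7,2 @@", body))  # False: 3 lines on each side
print(counts_match("@@ -7,3 +7,3 @@", body))  # True
```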

p1 is now:

--- old
+++ new
@@ -1,1 +1,2 @@
+ add this first line
 this line is just context

p2 is now:

--- old
+++ new
@@ -1,2 +1,2 @@
- add this first line
+ change the first line
 this line is just context
@@ -7,3 +7,3 @@
 context
- change this line
+ to this one
 more context

Running combinediff p1 p2 resulted in:

combinediff: hunk-splitting is required in this case, but is not yet implemented
combinediff: use the -U option to work around this

Running combinediff -U1 p1 p2 resulted in:

diff -U1 new new
--- new
+++ new
@@ -1 +1,2 @@
+ change the first line
 this line is just context
@@ -6,3 +7,3 @@
 context
- change this line
+ to this one
 more context

The result differs from your expectation in two places:

  • The generated patch has @@ -1 instead of @@ -1,1. That's an allowed abbreviation.
  • The generated patch has @@ -6,3 instead of @@ -7,3, which correctly accounts for the 1 line that the first patch has added.
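The second difference is the core of combining patches: p2's line numbers refer to the file after p1 has been applied, so they have to be mapped back to the original file. A minimal sketch of that mapping, assuming p1's hunks are well-formed and in file order (`to_original` is a hypothetical name, not a `combinediff` function). Lines that p1 added have no original line number; a p2 hunk that touches such a line has to be merged with the overlapping p1 hunk rather than renumbered, as the first hunk of the example shows.

```python
def to_original(n, p1_hunks):
    """Map line n (1-based) of the file produced by p1 back to the original.

    p1_hunks: p1's hunks in file order as (old_start, new_start, body), where
    body keeps the ' ', '-', '+' prefixes. Returns None when line n was added
    by p1 and therefore has no counterpart in the original file.
    """
    offset = 0  # how far new-side line numbers run ahead of old-side ones
    for old_start, new_start, body in p1_hunks:
        if n < new_start:
            break  # n lies before this hunk; earlier hunks set the offset
        old, new = old_start, new_start
        for line in body:
            tag = line[:1]
            if tag in (" ", "+") and new == n:
                return old if tag == " " else None
            if tag in (" ", "-"):
                old += 1
            if tag in (" ", "+"):
                new += 1
        offset = new - old  # n lies after this hunk
    return n - offset


# p1 from the question: one line added before the original line 1.
P1 = [(1, 1, ["+ add this first line", " this line is just context"])]
print(to_original(7, P1))  # 6, hence "@@ -6,3" in the combined patch
print(to_original(1, P1))  # None: this line exists only after p1
print(to_original(2, P1))  # 1
```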

Except for the unimplemented feature, that looks exactly as expected to me.

  • Yes, `combinediff` mostly works, although I have already found 2 bugs in it, and the C code is pretty hard to follow, so I am looking for a pointer to the general algorithmic approach. – Sam Stern Mar 03 '23 at 20:07