Detect a commit diff/patch/hunk in another branch similar to the way cherry-pick does

Question

Using git, I would like to check which, if any, of the diffs from a specific commit have been applied (and where) to a specific branch.

cherry-pick does this (except for the "where" part): when you cherry-pick a commit, it applies only the diffs that have not already been applied to that branch. For example, suppose I commit two file changes to a topic branch and then manually apply one of those same file changes to master. When I cherry-pick the two-file commit from topic onto master, only the un-applied diff is applied. To demonstrate this, run the following commands from an empty folder and look at the final diff output:

git init
echo file 1 > file1.txt
git add .
git commit -m "Initial commit"
git branch topic
echo file 1.1 > file1.txt
git add .
git commit -m "Change file1.txt on master"
git checkout topic
echo file 1.1 > file1.txt
echo file 2 > file2.txt
git add .
git commit -m "Change file1.txt and add file2.txt on topic"
git tag to-cherrypick
git checkout master
git cherry-pick --no-commit to-cherrypick
git diff --cached

The diff output is:

diff --git a/file2.txt b/file2.txt
new file mode 100644
index 0000000..6bb4b1d
--- /dev/null
+++ b/file2.txt
@@ -0,0 +1 @@
+file 2

This shows that even though the cherry-picked commit had two file changes, Git detected that the first one was already applied.

So my question is: Does git have an exposed plumbing (or somewhere between plumbing and porcelain) command that will look at the hunks in a commit and check for each hunk in another branch, showing the commit where each hunk is applied?

And if that command doesn't exist, what are some pathways to finding that? I'm already looking into writing a script/program that breaks the commit into its hunks, runs patch-id or something similar on each, then loops through the other branch starting at the common ancestor, does the same for each commit, and compares hunks. I would like to avoid writing that if it is already exposed -- I know it exists because cherry-pick does that (except for showing the "where").

UPDATE: git cherry will do some of what I am looking for. It will tell me which commits in one branch have one or more diff hunks that haven't been applied to another. It does not tell where those diff hunks were applied or if a commit has hunks that were applied and others that weren't.
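
For reference, here is a minimal sketch of the `git cherry` check described in the update, next to the `git rev-list` form mentioned in the comments. The function names `unapplied_commits` and `unapplied_revs` are made up for illustration; only `git cherry` and `git rev-list` are real Git commands.

```shell
#!/bin/sh
# unapplied_commits and unapplied_revs are hypothetical wrapper names,
# not Git commands.

# List the commits in <head> that are not in <upstream>.
# A leading "+" means no commit with the same patch ID was found in
# <upstream>; a leading "-" means an equivalent commit was found.
unapplied_commits() {
    git cherry -v "$1" "$2"
}

# The same comparison phrased as a revision walk: commits only on the
# right-hand side of <upstream>...<head>, minus patch-equivalent ones.
unapplied_revs() {
    git rev-list --right-only --cherry-pick "$1...$2"
}
```

In the demo repository above, `unapplied_commits master topic` should print a single line starting with `+`: the topic commit as a whole has a different patch ID than the commit on master, even though one of its two hunks is already there. That is the commit-level granularity the update complains about.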

First, side note: `git cherry-pick` doesn't do anything special here; it's `git rebase` that uses `git cherry` / `git rev-list --right-only --cherry-pick` that does the special thing. Second: the command that computes "sameness" is `git patch-id`. This reads stdin and prints a hash. If you take a diff hunk and run it through `git patch-id` you'll get a hash you can compare to any other diff hunk's hash to see if those diff hunks are "the same". See [the `git patch-id` documentation](https://git-scm.com/docs/git-patch-id). — torek, Dec 18 '19 at 22:16
(When you use `git cherry-pick` to apply a commit that is already applied, you end up with a sort of glorified no-op, in most cases, but this confuses rebase a bit, which is why rebase just skips them up front.) — torek, Dec 18 '19 at 22:17
@torek I was using cherry-pick as an example but I appreciate the info on how cherry-pick and rebase work. That gives me more to look into. In regards to your 2nd comment, my example shows that cherry-pick is smart enough to go deeper than just a commit and looks at the individual hunks and ignore hunks that have already been applied while still applying those that haven’t. — MikeJansen, Dec 19 '19 at 02:07
Right: cherry-pick uses the merge *machinery*, just in a somewhat weird way. It sets the merge base commit to the parent of the commit being picked. The `--ours` commit (index slot 2) is `HEAD` as usual, and the `--theirs` commit (index slot 3) is the commit being cherry-picked, with index slot 1 (merge base) being the (single) parent (hence `-m` necessary if the cherry-picked commit has 2 or more parents). The eventual *commit* is not a *merge commit* but the action uses the merge *code*. — torek, Dec 19 '19 at 08:07
@torek So do you know of any logic from the merge machinery that is exposed that will tell if diff hunks in a commit have been applied to a specified branch? — MikeJansen, Dec 19 '19 at 12:08
@torek -- see my update at the end of the description. git cherry does some of what I'm looking for. — MikeJansen, Dec 19 '19 at 14:34
There isn't any. The best you can get with relatively easy scripting is to separate out each diff hunk (the perl code that implements `add -p` and `reset -p` does this), run it through `git patch-id` (multiple internal places in Git do this, including `git rerere`), and save the patch IDs somewhere so that you can compare them. — torek, Dec 19 '19 at 17:46
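
The approach in the last comment can be sketched in shell. This is only a sketch under assumptions, not an existing Git feature: `hunk_ids` and `find_applied_hunks` are hypothetical helper names, merge commits are skipped, and a hunk only counts as "the same" if `git patch-id` gives the same hash for it (so the surrounding context lines have to match as well).

```shell
#!/bin/sh
# hunk_ids and find_applied_hunks are hypothetical helpers, not Git commands.

# Print "<patch-id> <path>" for every hunk of one (non-merge) commit.
hunk_ids() {
    commit=$1
    dir=$(mktemp -d) || return 1
    # Split the commit's diff into one small patch file per hunk, repeating
    # the per-file header (diff --git, index, ---, +++) before each hunk.
    git diff-tree -p --no-commit-id --root "$commit" | awk -v dir="$dir" '
        /^diff --git / { if (out != "") close(out)
                         out = ""; header = $0 "\n"; in_header = 1; next }
        /^@@ /         { in_header = 0
                         if (out != "") close(out)
                         out = sprintf("%s/hunk-%04d.patch", dir, ++n)
                         printf "%s", header > out }
        in_header      { header = header $0 "\n"; next }
        out != ""      { print > out }
    '
    for f in "$dir"/hunk-*.patch; do
        [ -e "$f" ] || continue
        # git patch-id prints "<patch-id> <commit-id>"; keep the first field.
        id=$(git patch-id < "$f" | cut -d" " -f1)
        path=$(sed -n "s/^+++ //p" "$f" | head -n 1)
        echo "$id $path"
    done
    rm -rf "$dir"
}

# For each hunk of commit $1, report the commits in $1..$2 (reachable from
# branch $2 but not from $1) that contain a hunk with the same patch ID.
find_applied_hunks() {
    wanted=$(hunk_ids "$1")
    for c in $(git rev-list --no-merges "$1..$2"); do
        hunk_ids "$c" | while read -r id path; do
            if printf '%s\n' "$wanted" | grep -q "^$id "; then
                echo "$id $path applied-in $c"
            fi
        done
    done
}
```

With the demo repository from the question, `find_applied_hunks to-cherrypick master` should report the `file1.txt` hunk together with the hash of the "Change file1.txt on master" commit, and say nothing about `file2.txt`.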

score 0 · Answer 1 · answered Feb 20 '22 at 10:49

As commented:

The best you can get with relatively easy scripting is to separate out each diff hunk (the perl code that implements add -p and reset -p does this), run it through git patch-id (multiple internal places in Git do this, including git rerere), and save the patch IDs somewhere so that you can compare them.

First: git add -p is no longer implemented in Perl.
The interactive add code has been rewritten in C, as part of the longer-running effort to move Git's scripted commands to C.

That rewrite started landing in Git 2.25 (Q1 2020) (commit f6aa7ec).

Second, if you are using git patch-id, make sure to use Git 2.36 (Q2 2022) or later.

Unlike "git apply", "git patch-id" did not handle patches with hunks that have only 1 line in either preimage or postimage; this was corrected in Git 2.36 (Q2 2022).

See commit 757e75c, commit 56fa5ac (01 Feb 2022) by Jerry Zhang (jerry-skydio).
(Merged by Junio C Hamano -- gitster -- in commit d077db1, 17 Feb 2022)

patch-id: fix scan_hunk_header on diffs with 1 line of before/after

Signed-off-by: Jerry Zhang

Normally diffs will contain a hunk header of the format "@@ -2,2 +2,15 @@ code".
However when there is only 1 line of change, the unified diff format allows for the second comma separated value to be omitted in either before or after line counts.

This can produce hunk headers that look like "@@ -2 +2,18 @@ code" or "@@ -2,2 +2 @@ code".
As a result, scan_hunk_header mistakenly returns the line number as line count, which then results in unpredictable parsing errors with the rest of the patch, including giving multiple lines of output for a single commit.

Fix by explicitly setting line count to 1 when there is no comma, and add a test.

apply.c contains this same logic except it is correct.
A worthwhile future project might be to unify these two diff parsers so they both benefit from fixes.
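
To make the fix concrete, here is a small sketch in POSIX shell of the rule the commit message describes: when a range in the hunk header has no comma, the line count is 1. This is not Git's C code; `parse_hunk_counts` is a hypothetical helper written only for illustration.

```shell
#!/bin/sh
# parse_hunk_counts is a hypothetical helper, not part of Git.
# It prints "<before> <after>": the preimage and postimage line counts of a
# unified-diff hunk header, treating an omitted count as 1.
parse_hunk_counts() {
    header=$1
    # Isolate the "-a[,b] +c[,d]" part between the "@@" markers.
    ranges=${header#@@ }
    ranges=${ranges%% @@*}
    old=${ranges%% *}    # e.g. "-2" or "-2,2"
    new=${ranges##* }    # e.g. "+2,18" or "+2"
    case $old in
        *,*) before=${old#*,} ;;
        *)   before=1 ;;   # no comma: the range covers exactly one line
    esac
    case $new in
        *,*) after=${new#*,} ;;
        *)   after=1 ;;
    esac
    echo "$before $after"
}
```

For example, `parse_hunk_counts '@@ -2 +2,18 @@ code'` prints `1 18`, whereas the buggy behaviour described above would have taken the line number 2 as the preimage count.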
