Make sure to use a recent version of Git (2.39 or later).
The git patch-id output mentioned in the OP bsb's answer is not always unique.
That is because, before Git 2.29 (Q4 2020), the patch-id computation did not ignore the "incomplete last line" marker the way it ignores whitespace.
See commit 82a6201 (19 Aug 2020) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 5122614, 24 Aug 2020)
patch-id: ignore newline at end of file in diff_flush_patch_id()
Reported-by: Tilman Vogel
Initial-test-by: Tilman Vogel
Signed-off-by: René Scharfe
Whitespace is ignored when calculating patch IDs.
This is done by removing all whitespace from diff lines before hashing them, including a newline at the end of a file.
If that newline is missing, however, diff reports that fact in a separate line containing "\ No newline at end of file\n", and this marker is hashed like a context line.
This goes against our goal of making patch IDs independent of whitespace.
Use the same heuristic that 2485eab55cc (git-patch-id: do not trip over "no newline" markers, 2011-02-17) added to git patch-id instead and skip diff lines that start with a backslash and a space and are longer than twelve characters.
A "patch ID" is nothing but a SHA-1 of the diff associated with a patch, with whitespace and line numbers ignored
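To make the effect of that marker concrete, here is a minimal Python sketch. It is not Git's actual algorithm: the function names and the simplified hashing are assumptions for illustration. Only the "backslash, space, longer than twelve characters" rule comes from the commit message above.

```python
import hashlib

# Threshold quoted in the commit message ("longer than twelve characters").
NO_NEWLINE_MIN_LEN = 12


def is_no_newline_marker(line):
    """Heuristic from the commit message: backslash + space, longer than 12 chars."""
    return line.startswith("\\ ") and len(line) > NO_NEWLINE_MIN_LEN


def toy_patch_id(diff_text, skip_marker=True):
    """Toy patch id (hypothetical helper): SHA-1 over diff lines with all whitespace removed."""
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if skip_marker and is_no_newline_marker(line):
            continue  # behaviour described for Git 2.29+: the marker is not hashed
        stripped = "".join(line.split())  # drop every whitespace character
        digest.update(stripped.encode("utf-8"))
    return digest.hexdigest()


# Two diffs that differ only by the "incomplete last line" marker.
with_newline = "-old\n+new\n"
without_newline = "-old\n+new\n\\ No newline at end of file\n"
```

With `skip_marker=True` both diffs hash to the same toy id. With `skip_marker=False` (mimicking the older behaviour) the marker is hashed like a context line and the ids diverge.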
Actually, git patch-id evolves with Git 2.39 (Q4 2022).
A new "--include-whitespace" option (named --verbatim in the commits below) is added to "git patch-id", and existing bugs in the internal patch-id logic that did not match what "git patch-id" produces have been corrected.
See commit 0d32ae8, commit 2871f4d, commit 93105ab, commit 0df19eb, commit 51276c1, commit 0570be7 (24 Oct 2022) by Jerry Zhang (jerry-skydio).
(Merged by Taylor Blau -- ttaylorr -- in commit 160314e, 30 Oct 2022)
builtin: patch-id: add --verbatim as a command mode
Signed-off-by: Jerry Zhang
Signed-off-by: Junio C Hamano
There are situations where the user might not want the default setting where patch-id strips all whitespace.
They might be working in a language where white space is syntactically important, or they might have CI testing that enforces strict whitespace linting.
In these cases, a whitespace change would result in the patch fundamentally changing, and thus deserving of a different id.
Add a new mode that is exclusive of --stable and --unstable called --verbatim.
It also corresponds to the config patchid.verbatim = true.
In this mode, the stable algorithm is used and whitespace is not stripped from the patch text.
Users of --unstable mainly care about compatibility with old versions, which unstripping the whitespace would break.
Thus there isn't a use case for the combination of --verbatim and --unstable, and we don't expose this so as to not add maintenance burden.
fixes https://github.com/Skydio/revup/issues/2
git patch-id now includes in its man page:
--verbatim
Calculate the patch-id of the input as it is given, do not strip any whitespace.
This is the default if patchid.verbatim is true.
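As a usage sketch, the three mutually exclusive modes can be wrapped in a small Python helper. The helper names (`patch_id_argv`, `patch_id_of`) are hypothetical. Only the `--stable`, `--unstable` and `--verbatim` flags come from the text above, and `--verbatim` assumes Git 2.39 or later is installed.

```python
import subprocess

# The three mutually exclusive modes named in the commit message.
MODES = ("stable", "unstable", "verbatim")


def patch_id_argv(mode="stable"):
    """Build the `git patch-id` command line for exactly one mode."""
    if mode not in MODES:
        raise ValueError("mode must be one of %s" % ", ".join(MODES))
    return ["git", "patch-id", "--" + mode]


def patch_id_of(commit, mode="stable", repo="."):
    """Return the patch-id of one commit, or "" if the commit has no diff.

    Requires a git executable; mode="verbatim" needs Git 2.39+.
    """
    patch = subprocess.run(
        ["git", "-C", repo, "show", "--patch", commit],
        check=True, capture_output=True,
    ).stdout
    out = subprocess.run(
        patch_id_argv(mode), input=patch, check=True, capture_output=True,
    ).stdout
    fields = out.decode().split()  # output format: "<patch-id> <commit-id>"
    return fields[0] if fields else ""
```

Comparing `patch_id_of(c, "stable")` with `patch_id_of(c, "verbatim")` for two commits that differ only in whitespace should show matching ids in the first mode and different ids in the second.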
But that is not all.
From the OP:
I'd like a recipe for finding duplicated changes. patch-id is likely to be the same but the commit attributes may not be.
That is also fixed with Git 2.39:
patch-id: fix patch-id for mode changes
Signed-off-by: Jerry Zhang
Currently patch-id as used in rebase and cherry-pick does not account for file modes if the file is modified.
One consequence of this is that if you have a local patch that changes modes, but upstream has applied an outdated version of the patch that doesn't include that mode change, "git rebase" will drop your local version of the patch along with your mode changes.
It also means that internal patch-id doesn't produce the same output as the builtin, which does account for mode changes due to them being part of diff output.
Fix by adding mode to the patch-id if it has changed, in the same format that would be produced by diff, so that it is compatible with builtin patch-id.
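For the recipe the OP asks for, one possible approach is to group commits by patch-id. The sketch below is an assumption about how to wire this together, not something taken from the commits above: it pipes `git log -p --no-merges --all` into `git patch-id --stable` and keeps the patch-ids shared by more than one commit. The function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def group_duplicates(patch_id_output):
    """Parse "<patch-id> <commit-id>" lines and keep ids seen on 2+ commits."""
    groups = defaultdict(list)
    for line in patch_id_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            patch_id, commit_id = fields
            groups[patch_id].append(commit_id)
    return {pid: commits for pid, commits in groups.items() if len(commits) > 1}


def duplicated_changes(repo="."):
    """Run git over the whole history of `repo` (requires a git executable)."""
    log = subprocess.Popen(
        ["git", "-C", repo, "log", "-p", "--no-merges", "--all"],
        stdout=subprocess.PIPE,
    )
    out = subprocess.run(
        ["git", "patch-id", "--stable"],
        stdin=log.stdout, capture_output=True, check=True,
    ).stdout
    log.stdout.close()
    log.wait()
    return group_duplicates(out.decode())
```

With Git 2.39 or later, a commit that also changes a file mode should no longer be grouped with an otherwise identical commit that does not.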
And a last difference, which was not properly detected/reported:
builtin: patch-id: fix patch-id with binary diffs
Signed-off-by: Jerry Zhang
"git patch-id" currently does not produce correct output if the incoming diff has any binary files.
Add logic to get_one_patchid to handle the different possible styles of binary diff.
This attempts to keep resulting patch-ids identical to what would be produced by the counterpart logic in diff.c, that is it produces the id by hashing the a and b oids in succession.
In general we handle binary diffs by first caching the object ids from the "index" line and using those if we then find an indication that the diff is binary.
The input could contain patches generated with "git diff --binary".
This currently breaks the parse logic and results in multiple patch-ids output for a single commit.
Here we have to skip the contents of the patch itself since those do not go into the patch id.
--binary implies --full-index so the object ids are always available.
When the diff is generated with --full-index there is no patch content to skip over.
When a diff is generated without --full-index or --binary, it will contain abbreviated object ids.
This will still result in a sufficiently unique patch-id when hashed, but does not match internal patch id output.
We'll call this OK for now as we already need specialized arguments to diff in order to match internal patch id (namely -U3).
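The "cache the index line, use it once the diff turns out to be binary" idea can be sketched in Python as follows. This is a toy illustration, not the code in get_one_patchid: the function name, the regular expression, and the choice to hash the hexadecimal oid strings are all assumptions.

```python
import hashlib
import re

# Matches the "index <a-oid>..<b-oid>" line of a git diff.
INDEX_RE = re.compile(r"^index ([0-9a-f]+)\.\.([0-9a-f]+)")


def toy_binary_ids(diff_text):
    """Return one toy id per binary file: SHA-1 over the a and b oids in succession."""
    cached = None
    ids = []
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            cached = None  # new file section: forget the previous oids
            continue
        match = INDEX_RE.match(line)
        if match:
            cached = match.groups()  # remember oids until we know the diff style
        elif cached and (line.startswith("Binary files ") or line == "GIT binary patch"):
            a_oid, b_oid = cached
            # Assumption: hash the hex strings; Git's real encoding may differ.
            ids.append(hashlib.sha1((a_oid + b_oid).encode("ascii")).hexdigest())
            cached = None
    return ids
```

Feeding the same binary change once with full object ids and once with abbreviated ones yields two different toy ids, which mirrors the mismatch with the internal patch id described above.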