3

This answer explains that normally a git commit SHA is generated based on various parameters. However, I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?

For example, suppose one wants to create and push a commit to Git with the following sha:

1e23456ffd118db9dc04caf40a442040e5ec99f9

(For simplicity, assume it is a unique SHA.)

The XY problem behind this is a manual mirror script between two different Git servers. The manual mirror is more efficient (saving computation time and server bandwidth) if I can skip certain commits from the source server. However, skipping commits means the parents of the mirrored commits differ on the target server, so their SHAs differ too, and I would have to keep a mapping between the SHAs on the source and target servers. It would be more convenient to simply override the SHAs of the few commits that are actually mirrored to the target server than to maintain that mapping or to ensure both servers hold exactly the same commits.

a.t.
  • 1
    Specifying a commit sha is not intended. I assume you are looking for something like a `git tag` – Jonathan Weine Dec 10 '21 at 14:27
  • 5
    It is by design something that can't be chosen, but is determined for you. _Why_ do you want this? Maybe you could [craft a specific commit message](https://stackoverflow.com/questions/9733757/maximum-commit-message-size) that would yield the same hash. – CodeCaster Dec 10 '21 at 14:27
  • 1
    Git's design does not allow you to choose the sha, with one exception: you can carefully construct the *contents* of the commit to produce that sha. **However**, that is exceedingly difficult and costly. If you are here asking how to do it, you basically don't know how to do it, so you can't. This is research-level difficulty. As far as I know, there have been only a handful of successful attempts at creating hash collisions, where two distinct commits are constructed to have the same sha. Starting from scratch to arrive at a chosen sha has, I think, not been done yet. – Lasse V. Karlsen Dec 10 '21 at 14:41
  • 1
    So just to simplify my comment for you: Git does not work this way, **you can't do it**, so you can just forget about that convenience and come up with a different way to handle your situation. – Lasse V. Karlsen Dec 10 '21 at 14:43

4 Answers

2

A commit SHA isn't just "normally" generated based on those parameters; it is by definition a hash of those parameters. "SHA" stands for Secure Hash Algorithm, the family of hashing algorithms used to generate it (Git uses SHA-1 by default).

Rather than trying to change the commit hashes, you should look for an efficient way to track them. One approach would be similar to how plugins like git svn work:

  • When copying a commit to the mirror, record the original commit hash as part of the new commit's commit message.
  • Possibly, since you're "skipping" commits in the original repo, each new commit should have multiple source hashes, since it will act like a "squash" of those commits.
  • Have a script which processes the result of git log and extracts these recorded commit hashes. This can then be used instead of the real commit hashes when determining what new commits to copy from the source.
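The bookkeeping described in the list above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the `Mirrored-From` trailer name and all three helper functions are assumptions made for illustration, and the log text is assumed to come from something like `git log --format=%B` run on the mirror.

```python
from __future__ import annotations

import re

# Hypothetical trailer name; the answer only says to record the source hash
# somewhere in the new commit's message.
TRAILER = "Mirrored-From"

# A full SHA-1 commit id is 40 lowercase hexadecimal characters.
_TRAILER_RE = re.compile(rf"^{TRAILER}:\s*([0-9a-f]{{40}})\s*$", re.MULTILINE)


def add_source_hashes(message: str, source_hashes: list[str]) -> str:
    """Append one trailer line per source commit to a commit message."""
    trailers = "\n".join(f"{TRAILER}: {h}" for h in source_hashes)
    return f"{message.rstrip()}\n\n{trailers}\n"


def extract_source_hashes(log_text: str) -> set[str]:
    """Collect every recorded source hash found in commit message text."""
    return set(_TRAILER_RE.findall(log_text))


def commits_to_copy(source_order: list[str], log_text: str) -> list[str]:
    """Return the source commits, in order, not yet recorded on the mirror."""
    seen = extract_source_hashes(log_text)
    return [h for h in source_order if h not in seen]
```

A squashed mirror commit would carry one trailer line per source commit it covers, so `commits_to_copy` skips all of them on the next run.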

However, make sure this is all worth it: if the eventual changes are all included, the chances are that git's existing de-duplication and compression will mean the overhead of the "skipped" commits is fairly low.

IMSoP
  • Thank you, I am often humbled by the creativity that is displayed in approaching problems! It is simple yet (imo) elegant to write the sha of the `source` commit as a commit message in the `target` commit. That prevents me from having to keep a separate bookkeeping, yet still stores the information (in the `target` server). – a.t. Dec 10 '21 at 14:52
1

From the comment by CodeCaster, it seems I could use the freely choosable bits in the commit message in `git commit -m "some message"` to ensure the sha of the commit ends up with a specific value.

However, based on the comment by Lasse V. Karlsen, I assume this approach requires computational resources that scale non-linearly. I did not go into detail on this, but I imagine that as the commit history grows, the relative impact of the (limited, 5 MB) freely choosable bits of the commit message becomes smaller. I guess that could explain why leveraging these freely choosable bits in the commit message becomes costly.

So in practice, the answer seems to be: "You could (perhaps, if you spend a lot of computational resources), but you shouldn't."
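To get a feel for that cost, here is a minimal sketch, assuming the default SHA-1 object format and a made-up commit body (the author line and the `nonce` field are illustrative, not from a real repository). It varies a counter in the message until the id starts with a short chosen prefix. Each extra hex digit multiplies the expected number of attempts by 16, so a 3-digit prefix needs a few thousand tries, while matching all 40 digits would need on the order of 16^40 = 2^160 candidates. Since a commit refers to its history only through the parent id, that cost should not depend on how long the history is.

```python
from __future__ import annotations

import hashlib
from itertools import count


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + body, as Git does for SHA-1 repos."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def find_prefix(prefix: str, base_body: str) -> tuple[int, str]:
    """Brute-force a nonce appended to the message until the id matches."""
    for nonce in count():
        body = f"{base_body}nonce: {nonce}\n".encode()
        commit_id = git_object_id("commit", body)
        if commit_id.startswith(prefix):
            return nonce, commit_id


# Illustrative commit body; the tree id is Git's well-known empty tree.
BASE = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author A U Thor <author@example.com> 1639146420 +0000\n"
    "committer A U Thor <author@example.com> 1639146420 +0000\n"
    "\n"
    "some message\n"
    "\n"
)

nonce, commit_id = find_prefix("1e2", BASE)
print(nonce, commit_id)
```

Lengthening the prefix passed to `find_prefix` by one character should make the search take roughly 16 times longer on average, which is why a full 40-digit target is out of reach.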

a.t.
1

Since you've already outlined in your question that you have ways of handling your differences, I will assume this question is really and only this:

I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?

And not "or do you have any other ideas that I could use instead".

And with that question, the answer is actually quite simple:

You can't.


The way Git calculates the commit id is not merely a by-product of the chosen implementation. It is a core concept of how Git is designed.

The commit id is calculated based upon the content of the commit, and this includes, as you have observed, the link to the parent. Change the parent but keep everything else identical, the commit id still changes.
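A minimal sketch of that calculation, assuming the default SHA-1 object format: the id is the SHA-1 of a short header followed by the commit's text, and the parent's id is part of that text. The author line, timestamp and parent ids below are made up for illustration.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    """Hash a commit object the way Git does for SHA-1 repositories."""
    # Illustrative identity and timestamp, identical for every call.
    who = "A U Thor <author@example.com> 1639146420 +0000"
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's empty tree
on_source = commit_id(TREE, "a" * 40, "same change")
on_target = commit_id(TREE, "b" * 40, "same change")

print(on_source)
print(on_target)
print(on_source == on_target)  # False: only the parent differs
```

The tree, author, timestamp and message are identical in both calls, so the differing ids come from the parent line alone.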

This is core to how the distributed part of the version control system works, and cannot be changed.

You simply cannot change the id of a commit and keep the contents of it the same. This is by design.

There have been some attempts at producing SHA-1 collisions by carefully constructing distinct inputs that end up having the same hash.

Here's such a successful attempt (collision): https://www.theregister.com/2017/02/23/google_first_sha1_collision/

'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

I don't believe anyone has yet managed to take an arbitrary commit and target a specific commit id with it. The collisions were carefully constructed by manipulating two inputs simultaneously according to very specific criteria such that they arrived at the same hash, but that hash was not chosen by the researchers.

TL;DR: It can't be done

The net effect of the collision(s) generated though is that Git will move away from SHA-1 at some point and go for a system that produces longer, and "more secure" (tm) hashes than what we have today. Since Git also wants to be backwards compatible with existing repositories, this work is not yet fully completed.

Lasse V. Karlsen
  • To do this intentionally, you would need to be able to carry out a "second preimage attack" against SHA-1. The complexity of such an attack is about O(2^159), if I've understood correctly. That is totally infeasible in practice, because the total energy that the Sun produces during its 10-billion-year lifetime wouldn't be enough for the theoretical minimum energy needed to brute-force such complexity. – Mikko Rantalainen Mar 20 '23 at 13:21
0

how can one specify a custom/particular/specific git commit sha (in Bash)?

One cannot. The commit hash is a value constructed, as you say, by hashing various values together, and the whole point is to uniquely identify a particular commit. You could commit the same set of files at a different time on a different machine and you'd end up with a different commit hash.

The way to ensure that you have the same commits on two different machines is to git pull (or similar) those commits from one machine to the other. You don't necessarily have to move all the commits -- you could e.g. squash them or cherry-pick only certain commits.

Caleb