
I have a list of commands from some source code:

git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout 8c66fa5f3c04e -b unet_patch
git cherry-pick 458928a

I know that the first line will clone a recent version of Caffe code. However, the third and fourth lines will modify the source code. I do not know what 8c66fa5f3c04e and 458928a are. From these numbers, can I find the Git hash of the Caffe code?

Peter Mortensen
John
  • It's not at all clear to me what you want, but Git doesn't "hash the code" (whatever that means). Git's SHA-1 hashes are per *Git object* and there are four types of objects: *blob* (file), *tree* (information about a group of files and subtrees), *commit* (a saved snapshot consisting of various metadata including one top level tree hash ID), and *tag* (annotated tag; usually a name for a commit, plus more metadata). – torek Mar 10 '17 at 11:40
  • Thanks. My question may use the wrong words. Let me give an example. If I have a Git hash such as `4541f8900588a335f2d9387a5b03460deba68678`, then I can easily go to the Caffe tree at `https://github.com/BVLC/caffe/tree/4541f8900588a335f2d9387a5b03460deba68678`, which was made one year ago. Hence, I would like to find the Caffe version that the code is based on. Let me update the full comment – John Mar 10 '17 at 11:48
  • I ran `echo "U-Net 3D merged to BVLC/caffe 8c66fa5f3c04e" | git hash-object -w --stdin`. The output is `5394104f917634da84f0b4543847c02de7553ba8`, and I want to see the original code/tree, so I tried https://github.com/BVLC/caffe/tree/5394104f917634da84f0b4543847c02de7553ba8 but found nothing – John Mar 10 '17 at 12:03
  • `git hash-object` will compute a *new* hash for a *new* object. With `-w` it also writes the object into the database. The resulting hash is the hash of the *new* object (whose type is whatever you set with `-t <type>`, but defaults to type `blob`). If that new object is bit-for-bit identical to some *existing* object, Git winds up re-using the original object, so if you do not write the new object into the database, you can use the resulting hash to see if the proposed object already exists. But that's not a good way to find what I think perhaps you are looking for. – torek Mar 10 '17 at 12:08
  • @torek: Right. I want to find the tree version of the code above. I guess it is based on a version that was published one year ago – John Mar 10 '17 at 12:11
  • Aha, you didn't want to find a whole *tree* of *many files*, you wanted to find a commit containing a single *specific* file. That's much easier, but is a duplicate of numerous existing questions. – torek Mar 10 '17 at 13:26
  • Thanks for your help. Based on Mauro Piccotti's suggestion, I found the parent of the code. Could you verify my steps? First, I took the ID 8c66fa5f3c04e from the third command, git checkout 8c66fa5f3c04e -b unet_patch, and ran git cat-file -p 8c66fa5f3c04e to find the parent. Is that right? – John Mar 10 '17 at 13:29
  • You don't need to check it out (neither as detached HEAD nor as a new branch) to *look* at it, though that may be convenient. Using `git cat-file -p`, which takes any valid revision specifier—see [`gitrevisions`](https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html)—will extract the object contents from the database. But I'm still not clear on your real question. If that was the answer you wanted, you don't care about *trees of files* but rather only about a *single* file. – torek Mar 10 '17 at 13:32
  • Yes. I am not familiar with Git, so it is difficult for me to use the correct words. As an example: I cloned source code that is a modified version of the Caffe code, and I want to know which Caffe tree that source code is based on. That is what I need – John Mar 10 '17 at 14:14

3 Answers

2

From a hash to your commit information:

git cat-file -p YOUR-HASH

From a string to a hash using Git:

echo "your string" | git hash-object -w --stdin
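
To connect this to the question: the two values passed to git checkout and git cherry-pick are presumably abbreviated commit hashes, and Git can expand an abbreviation to the full hash and show the commit's parent. The following is only a minimal sketch in a throwaway repository; the file name, identity, and commit messages are made up for the demonstration. In a Caffe clone you would put 8c66fa5f3c04e in place of $short.

```shell
#!/bin/sh
set -eu

# Build a throwaway repository with two commits (all names are illustrative).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'one\n' > file.txt
git add file.txt
git commit -q -m 'first'
first=$(git rev-parse HEAD)
printf 'two\n' > file.txt
git commit -q -a -m 'second'

# An abbreviated hash, like the ones in the question.
short=$(git rev-parse --short HEAD)

# Expand it to the full commit hash, and look up its (first) parent.
full=$(git rev-parse --verify "${short}^{commit}")
parent=$(git rev-parse --verify "${short}^")
echo "short=$short full=$full parent=$parent"

# Show the object type, then the raw commit: tree, parent, author, message.
git cat-file -t "$short"
git cat-file -p "$short"
```

The parent line printed by git cat-file -p is the same hash that `${short}^` resolves to, so either route tells you which commit the checked-out code was built on.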
Peter Mortensen
Mauro Piccotti
2

I'm not going to edit your question since I cannot be sure that this is your intent, but it sounds like your real question comes down to this:

Suppose I have a directory full of source code. Suppose further that I believe this directory-of-source-code was created by running a series of git commands, such as:

git checkout [some hash ID]
git cherry-pick [another hash ID]

but Evil Spirits, or fluoridation, or some such, has lost the .git directory. So all I have now is this tree of code. Let's call this "my lost tree".

Meanwhile, on some other machine or in some other directory, I do have a full Git repository. I am curious as to what commit or commits (there may be more than one) would, if I ran git checkout <hash>, get me a work-tree that is identical to my lost tree.

Now, it's not clear what the point of all this is, but it is possible to do, with some caveats. The easy way to do it is to add the lost tree to the full Git repository—or, if you are concerned about precious bodily fluids :-) (see the "fluoridation" link above), to a git clone --mirror of it. (The clone is "as good as" the original, but can be thrown away after this process.)

Things to know before steps 1 and 2

In any normal Git repository, there are three things of interest while you are working on making a new commit:

  • the current commit, which is your HEAD;
  • the index, which is where you build the new commit; and
  • the work-tree, which is where you keep files in a form the rest of the computer can deal with, as the files stored in the current commit and in the index are in a form only Git can deal with.

As I mentioned in a comment, the repository database itself has four object types: blob (a file); tag (an annotated tag: a human-readable name for some commit object, plus some other metadata); commit (metadata including a log message, an author and committer, the parent commit(s) IDs of the commit—it's these parent IDs that produce the history, when commits are viewed according to parent/child relationships); and tree. A tree object contains the names and hash-IDs of blobs and of additional sub-trees, and hence could represent a tree of files identical to your lost tree. Each commit has, as part of its metadata, the hash ID of the commit's stored snapshot, i.e., the tree. So if your lost tree is in fact in the repository, its hash ID is stored in some commit(s).
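
The four object types can be seen directly with git cat-file -t. This is a small sketch in a scratch repository; the directory, file, and tag names are invented for illustration.

```shell
#!/bin/sh
set -eu

# Scratch repository with one commit and one annotated tag (names are made up).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir sub
printf 'hello\n' > sub/file.txt
git add .
git commit -q -m 'snapshot'
git tag -a v0 -m 'annotated tag'

# Ask Git for the type of one object of each kind.
blob_type=$(git cat-file -t HEAD:sub/file.txt)
tree_type=$(git cat-file -t 'HEAD^{tree}')
commit_type=$(git cat-file -t HEAD)
tag_type=$(git cat-file -t v0)
echo "$blob_type $tree_type $commit_type $tag_type"

# The first line of the raw commit names the commit's top-level tree.
tree_in_commit=$(git cat-file -p HEAD | sed -n '1s/^tree //p')
echo "commit records tree $tree_in_commit"
```

The last step is the link this answer relies on: the tree hash stored inside the commit is the same value that `HEAD^{tree}` resolves to.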

We will make use of the fact that, once some object is in the repository, any attempt to put a bit-for-bit-identical new object into the repository simply re-uses the existing object. Making one grand assumption,1 this re-use is fundamentally safe: if the new object is bit-for-bit identical to the old object, why would you care which object—new or old—Git pulls out later when you ask Git to retrieve the object by its hash ID?
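
A quick way to see this re-use, sketched in an empty scratch repository (the strings being hashed are arbitrary): writing the same bytes twice yields the same hash ID and only one stored object.

```shell
#!/bin/sh
set -eu

# Empty scratch repository, so the object database starts with no objects.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"

# Write identical content twice, then different content once.
first=$(printf 'same bytes\n' | git hash-object -w --stdin)
second=$(printf 'same bytes\n' | git hash-object -w --stdin)
other=$(printf 'other bytes\n' | git hash-object -w --stdin)
echo "first=$first second=$second other=$other"

# Three writes, but only two distinct loose objects exist.
objects=$(git count-objects | cut -d' ' -f1)
echo "loose objects: $objects"
```

This is also why hashing an arbitrary sentence, as tried in the comments, does not lead to any existing commit: the result is the ID of a new blob holding that sentence.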


1. The assumption is that no two different objects produce the same hash ID, ever. The pigeonhole principle tells us that this assumption is false in theory, but in practice, it's actually true. It is possible, but currently very expensive, to break the assumption deliberately. A longer hash ID can (again, in theory at least) be less breakable, although cryptography is always getting weirder. :-)

In fact, if two different Git objects do produce the same hash, Git breaks ... well, sort of; it breaks, or should break, in a "safe" manner. The example PDFs that break SHA-1 do not actually break Git, though. Other files could—but meanwhile, some minor coding glitches in existing versions of Git apparently cause it to fail to alert the user to the fact that it will fail to store some new version in your repository.


Step 1: Finding the hash ID of your lost tree

The work-tree corresponds to your lost tree, but Git won't make a tree object out of a work-tree. Git will only make a tree object out of the index. This means that in order to find the hash ID of your lost tree, you must copy it into the index.

The index already has stuff in it, so your first step is to remove all of it. In the top level of your Git repository, tell Git to remove everything, both from the work-tree and from the index:

git rm -rf .

You should now have an empty work-tree ... unless—this is one of the caveats—there are untracked files.2 If there are some untracked files, you will have to find out (or guess) whether those are also in your lost tree, and whether they will also be untracked in the commit(s) in the full repository that use that same tree. I leave it to you to find solutions to this problem, should you actually have this problem. (It's possible that even if there are untracked files in the repository, they were and are not present in your lost tree.)

In any case, you probably want to discard any untracked files at this point. If there are any, you can use git clean -fdx to discard them. (This can be a good reason to work on a fresh mirror clone: it won't have any such untracked files in the first place, and removing such files from a real work-tree may force you to rebuild them later, which for a large project, might be many CPU-hours of computation.)

Now that your Git index and work-tree are empty, we will re-fill them from the lost tree:

(cd /path/to/lost/tree; tar cf - .) | tar xf -

or:

cp -R /path/to/lost/tree/. .

or whatever, so that the work-tree is now a copy of the lost tree.

(At this point, you must throw out, from the copy, any files that should be untracked. Since we removed everything, we also removed any .gitignore files that we had before, so files that would be ignored, if this were a normal setup, won't be, unless those .gitignore files are in your lost tree. Again, how you do this, if you need to do it at all, is up to you.)

Second-to-last, we now want to populate our index from this work-tree. This part is simple:

git add .

does the trick. We now have a full index and can produce a tree object and find its hash.

The "normal" way to do this is to make a new commit. If we make a new commit now, it will have as its parent, our current (HEAD) commit. It will be added to our current branch. There's nothing wrong with this, but that's not our goal at this point, so we can use a lower-level Git command, one of the so-called plumbing commands:

git write-tree

What this does is turn the index into a series of tree objects, one for each sub-directory (and that sub-directory's files) stored in the index, and one final top-level tree for the files and sub-directories at the top level, i.e., for the directory named ".". The output is the hash ID of the object just added to, or reused from, the existing Git object database:

$ git write-tree
b3bb4696cf8dcb93c1f09a447f6b7356bccb24d2

This tree hash is what we are looking for, but it's not a commit hash. We simply believe that there may be one, two, or many existing commits that have this hash as their tree.3
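
As a variation that is not part of the steps above, the same tree hash can be computed without emptying the repository's real index and work-tree, by pointing Git at a temporary index file and at the lost tree as the work-tree. This is a sketch under assumptions: the scratch repository, the file names, and the lost.index path are all made up, and the lost tree here is simply a copy of the committed files.

```shell
#!/bin/sh
set -eu

# Scratch repository with one commit (names are illustrative).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir sub
printf 'hello\n' > a.txt
printf 'world\n' > sub/b.txt
git add .
git commit -q -m 'first'
head_tree=$(git rev-parse 'HEAD^{tree}')

# The "lost tree": the same files, copied elsewhere, with no .git directory.
mkdir "$tmp/lost"
cp -R a.txt sub "$tmp/lost/"

# Stage the lost tree into a temporary index and write it out as a tree.
# The subshell keeps the exported variables away from later commands.
lost_tree=$(
    export GIT_DIR="$tmp/repo/.git"
    export GIT_WORK_TREE="$tmp/lost"
    export GIT_INDEX_FILE="$tmp/lost.index"
    git add -A
    git write-tree
)
echo "lost tree: $lost_tree"
echo "HEAD tree: $head_tree"
```

Because the temporary index is a separate file, the repository's own index, HEAD, and checked-out files are left alone; only new objects, if any, are added to the object database. The caveats about untracked and ignored files apply here just as they do to the git rm / git add route.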


2. An untracked file is simply one that is not in the index. This simple definition is not a problem for us unless and until it becomes a very big problem: if your lost tree contains untracked files, you don't know which ones are untracked because the index that made them untracked was part of the Git repository you lost when you acquired the lost tree in the first place!

3. If we use git commit to make a new commit, the new commit we just made will have this hash as its tree object. That's not the commit we're looking for, of course, but if you use git commit instead of git write-tree, this is something to keep in mind.


Step 2: finding commits that have this tree

The remaining caveat, of course, is that it's quite possible that no commit has the tree you just made; or it may be in two or more commits. The latter occurs from time to time due to git revert or trivial merges (merges that could be, but on purpose are not, fast-forwards). The way to deal with this is to find all such commits, then decide which one(s) you want.

To find these commits, our first sub-step is to enumerate every commit in the repository. We need their hash IDs, so that we can use another Git plumbing command to find their tree ID (remember, each commit has exactly one tree). The command to find every commit or other object reachable from some name is git rev-list; the option to use all names is --all; so:

git rev-list --all

does the trick. This prints each hash ID to its standard output stream, so we will now collect all those IDs and turn them into their corresponding tree hashes.

One slight wrinkle here is in the phrasing above: this finds all commits or other objects, including annotated tag objects. An annotated tag is a name for another Git object, usually a commit object. So if we find that annotated tag v1.3 and commit 1234567... both name your lost tree, we'll see two hash IDs here. That's probably actually what we want, but if not, you now know what to look for to change this.

In any case, to turn the rev-list ID into a tree, we will want to use git rev-parse. It's possible that the ID cannot be turned into a tree: an annotated tag object, for instance, might tag a blob object rather than a commit. So for a fully robust solution we should check, using git rev-parse --verify --quiet and checking its return value:

lookfor=...put in the hash ID you are searching for ...
git rev-list --all |
    while read -r hash; do
        # skip anything that cannot be resolved to a tree
        tree=$(git rev-parse --verify --quiet "${hash}^{tree}") || continue
        if [ "$tree" = "$lookfor" ]; then
            echo "found: $hash (type $(git cat-file -t "$hash")) names tree $lookfor"
        fi
    done

(the above is untested but it's too simple to be wrong).
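
A shorter route to the same list, offered as a sketch rather than as part of the loop above, is to have git log print each commit hash next to its tree hash (the %H and %T format placeholders) and filter on the second column. The scratch repository below is made up; it uses git revert to produce two commits that share one tree, which is the situation described at the start of this step.

```shell
#!/bin/sh
set -eu

# Scratch repository: commit "one", commit "two", then revert "two".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
printf 'one\n' > f.txt
git add f.txt
git commit -q -m 'one'
printf 'two\n' > f.txt
git commit -q -a -m 'two'
git revert --no-edit HEAD > /dev/null

# The tree to search for: here, the tree of the newest commit.
lookfor=$(git rev-parse 'HEAD^{tree}')

# Print "<commit> <tree>" for every commit, keep commits whose tree matches.
matches=$(git log --all --format='%H %T' |
    awk -v t="$lookfor" '$2 == t { print $1 }')
count=$(printf '%s\n' "$matches" | grep -c .)

echo "commits with tree $lookfor:"
printf '%s\n' "$matches"
```

Here both the first commit and the revert commit are reported, since reverting "two" restores exactly the files of "one". This variant only ever reports commits.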

If this finds any objects that refer to your lost tree, you now have the hashes (of commits and/or tags) for it.

If this finds no objects, that means that either you put in the wrong tree—see the caveat about untracked files—or the tree you have does not have a corresponding commit. That does not mean it never had one: perhaps your lost tree was part of an experimental branch that was deleted and had all its commits thrown out during garbage collection, for instance. It just means that no commit has that tree now, in your full repository (or its mirror-clone).

torek
1

If you need the hashes, run the git log command: each commit it lists is identified by a long hexadecimal string, which is that commit's hash.

Peter Mortensen