12

For the sake of an experiment, let's say your git log identifies the following commits:

commit 16bc8486fb34cf9a6faf0f7df606ae72ad9ea438  // added 2nd file
commit 9188f9a25b045f130b08888bc3f638099fa7f212  // initial commit

After committing, .git/refs/heads/master points to 16bc8486fb34cf9a6faf0f7df606ae72ad9ea438.

Let's say, after this, I manually edit the .git/refs/heads/master file to point to 9188f9a25b045f130b08888bc3f638099fa7f212.

At this point, git status recognizes that a new uncommitted file is in need of some attention. This is the same file my second commit took care of before.

If I do commit it, git log now shows:

commit b317f67686f9e6ab1eaabf47073b401d677205d5  // 2nd file committed for the 2nd time
commit 9188f9a25b045f130b08888bc3f638099fa7f212  // initial commit

Question 1:

You'll notice that the SHA hashes are different between the very first time I committed the second file and now. Why is that? The file's content did not change; it is still the exact same file.

Question 2:

At this point, what happened to the original second commit? When I do git show 16bc8486, it shows this commit. It does not, however, show up in the git log history.

Ciro Santilli OurBigBook.com
James Raitsev

3 Answers

15

Question 1: Because the hash is generated taking everything into account, including the commit metadata (which itself contains the date and time).

Question 2: git log shows the history reachable from the current branch, and commit 16bc8486 is no longer part of it. As far as I know (I'm not completely sure), the garbage collector will remove it sooner or later if it finds nothing referencing it (see git gc --help).

KingCrunch
  • On Q2, `git branch` only shows one branch in existence at this point - *master. What branch is the old file now a part of? – James Raitsev Jan 15 '12 at 20:41
  • @JAM: it's not part of any branch, which is why it is candidate for garbage collection. You can "rescue" it by creating a branch explicitly on that commit `git branch branch_name commit_hash`. – Mat Jan 15 '12 at 20:43
  • @Mat, if some time passed by and that hash is not readily available, is it possible to recover it somehow? – James Raitsev Jan 15 '12 at 20:49
  • @JAM: You should not expect it, because once the gc decides to clean things up, it's gone. If you don't have it somewhere else (a fork, or just a copy of the files), it's gone forever. If you want to keep it, create a branch pointing to it and that's it. You are not required to push the branch anywhere; you can keep it local. – KingCrunch Jan 15 '12 at 20:50
  • @JAM If it hasn't been garbage collected, yes - you can use `git fsck --unreachable` to find all such commits. If it's been garbage collected, time to test your backups :) – Mat Jan 15 '12 at 20:51
  • @JAM git does keep log files of how refs have changed over time- look at the `git reflog` command, and `git log -g`. The garbage collector takes the log files into account- it doesn't prune objects as long as they are referenced from the log files (but the log files themselves are automatically pruned after some amount of time) – araqnid Jan 16 '12 at 14:52
6

The sha1 values for each of the file blobs will be identical in both cases if you have the same content (even if the filename is changed).
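
To make this concrete, here is a minimal sketch of how a blob id can be reproduced outside of git. It assumes the classic SHA-1 object format, where the hash covers a header of the form "blob <size>" followed by a NUL byte and then the raw content; `blob_sha1` is a hypothetical helper name, not part of git.

```python
import hashlib


def blob_sha1(content: bytes) -> str:
    """Hash content the way git hashes a blob: header, NUL byte, raw bytes."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The filename is never an input, so renaming a file cannot change its blob id.
same_content_a = blob_sha1(b"second file\n")
same_content_b = blob_sha1(b"second file\n")
print(same_content_a == same_content_b)

# The empty blob has a well-known id in SHA-1 repositories.
print(blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

If git is at hand, the result for any content should match what `git hash-object` reports for a file with the same bytes.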

Likewise, the sha1 values for the trees that list those blobs will be the same if the filenames are the same.

However, at the very top we have the commit, which contains the unchanged link to the previous commit, the top tree, and the author and committer. As KingCrunch said, the author and committer dates will be different, so the sha1 of the commit will be different.
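
As a rough illustration of that last point, the sketch below builds the text of a commit object by hand and hashes it. It assumes the SHA-1 object format and a simplified commit layout with a single parent and the same person as author and committer. The tree id (the well-known empty tree), the identity and the timestamps are placeholders chosen for illustration; only the parent id is taken from the question. `commit_sha1` is a hypothetical helper.

```python
import hashlib


def commit_sha1(tree: str, parent: str, who: str, timestamp: int, message: str) -> str:
    """Hash a simplified commit object: tree, parent, author, committer, message."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who} {timestamp} +0000\n"
        f"committer {who} {timestamp} +0000\n"
        f"\n"
        f"{message}\n"
    ).encode()
    header = b"commit %d\0" % len(body)
    return hashlib.sha1(header + body).hexdigest()


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"    # placeholder tree id
PARENT = "9188f9a25b045f130b08888bc3f638099fa7f212"  # initial commit from the question
WHO = "Example User <user@example.com>"              # placeholder identity

first = commit_sha1(TREE, PARENT, WHO, 1326657600, "added 2nd file")
# Same tree, parent, identity and message, committed ten minutes later.
second = commit_sha1(TREE, PARENT, WHO, 1326658200, "added 2nd file")
# Same inputs as the first commit, including the timestamp.
repeat = commit_sha1(TREE, PARENT, WHO, 1326657600, "added 2nd file")

print(first == second)  # False: only the timestamp differs
print(first == repeat)  # True: identical bytes give an identical id
```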

You can make them the same if you deliberately set the author and committer dates using the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables so they are unchanged.
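
One way to try this with a real git is sketched below. It assumes git is installed and on the PATH, creates the same commit in two throwaway repositories, and pins the identity and both dates through environment variables. The name, email, file name and timestamp are arbitrary placeholders, and `commit_with_fixed_dates` is a hypothetical helper.

```python
import os
import shutil
import subprocess
import tempfile


def commit_with_fixed_dates(date: str = "1326657600 +0000") -> str:
    """Create one commit in a fresh temporary repository and return its hash."""
    env = dict(
        os.environ,
        GIT_AUTHOR_NAME="Example User",          # placeholder identity
        GIT_AUTHOR_EMAIL="user@example.com",
        GIT_COMMITTER_NAME="Example User",
        GIT_COMMITTER_EMAIL="user@example.com",
        GIT_AUTHOR_DATE=date,                     # pinned author date
        GIT_COMMITTER_DATE=date,                  # pinned committer date
    )
    with tempfile.TemporaryDirectory() as repo:

        def git(*args: str) -> str:
            result = subprocess.run(
                ["git", "-c", "commit.gpgsign=false", *args],
                cwd=repo, env=env, check=True, capture_output=True, text=True,
            )
            return result.stdout.strip()

        git("init", "-q")
        with open(os.path.join(repo, "file.txt"), "w") as handle:
            handle.write("hello\n")
        git("add", "file.txt")
        git("commit", "-q", "-m", "initial commit")
        return git("rev-parse", "HEAD")


if shutil.which("git"):
    # Two independent repositories, same content, same metadata.
    print(commit_with_fixed_dates() == commit_with_fixed_dates())
```

With everything pinned, both runs should report the same commit hash; changing the date argument for one of them should make the hashes diverge.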

Philip Oakley
  • Extra corollary; If one does make them identical then they will be identical as far as the object store and the branch graphs are concerned. It will be as if the initial, but identical split never happened - they are indistinguishable! Happy hunting. – Philip Oakley Jun 09 '14 at 19:37
2

The SHA1 is calculated from the contents of the commit object: a reference to the tree (a snapshot of the files, not a diff), the parent commit, and all metadata of this commit (including the author and committer, the timestamps, and the commit message).

For your second question, the commit data is still present but is no longer part of any live branch. From time to time git runs a garbage collection that actually removes such unreferenced objects. You can also run it manually using git gc; once the commit has been pruned, it will no longer be accessible, not even with git show. Pruning is not necessarily immediate, since reflog entries and a grace period for recent objects can keep the commit around for a while.

Holger Just