
Update: I mixed up two things: the first line (`tree`) refers to the directory snapshot (as explained very well in the answers below), and the second line (`parent`) refers to the parent commit. I let the second line mislead me. Thank you all for your clarifications.


I am learning Git by following an article and copying the author's commands exactly. I was in the subsection called "Make a commit that is not the first commit". After she committed, this is what she got (I removed her email for formatting reasons):

tree ce72afb5ff229a39f6cce47b00d1b0ed60fe3556

parent 774b54a193d6cfdd081e581a007d2e11f784b9fe

author Mary Rose Cook ...

committer Mary Rose Cook ...

a2

And, mysteriously, I got this (I removed my personal info):

tree ce72afb5ff229a39f6cce47b00d1b0ed60fe3556

parent c96fbf6143ccef645d1cb867b05427c399a9bcb3

author ....

committer ...

a2

Comparing the two tree hashes, it is clear that we both got the same one (I did follow her commands exactly).

So I am very curious how this is possible. I know Git supposedly hashes a snapshot of that moment, but my metadata is surely not the same as hers. Does anyone know what is happening?

Flowing Cloud
  • The title and body of your question don't match. Are you referring to commit hashes or tree hashes? – jub0bs May 22 '16 at 07:29
  • Her tree and your tree have the same hash because the contents are the same. The article you link to explains it: *This blob file contains the compressed content of data/letter.txt. **Its name is derived by hashing its content.** Hashing a piece of text means running a program on it that turns it into a smaller piece of text that uniquely identifies the original. For example, Git hashes `a` to `2e65efe2a145dda7ee51d1741299f848e5bf752e`.* – jub0bs May 22 '16 at 07:30
  • @Jubobs Thank you. I now know where I went wrong. – Flowing Cloud May 22 '16 at 07:42

1 Answer


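A tree hash is a hash of a directory snapshot: the names, modes, and content hashes of the files and subdirectories recorded in it. In other words, any two directories with the same files and directories inside them will have the same tree hash, no matter who committed them or when.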
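To make this concrete, here is a minimal Python sketch of how Git derives blob and tree IDs. It assumes a flat directory of regular, non-executable files (mode `100644`); the file names and contents are made up for illustration, and the helper names (`git_object_id`, `blob_id`, `tree_id`) are my own, not part of Git.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over the header '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    """A blob ID depends only on the file's bytes, not on its name."""
    return git_object_id("blob", content)


def tree_id(files: dict) -> str:
    """files maps a file name to its content (flat directory only).

    Each entry is '<mode> <name>' + NUL + the raw 20-byte blob ID,
    with entries sorted by name.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        body += b"100644 " + name.encode() + b"\0"
        body += bytes.fromhex(blob_id(files[name]))
    return git_object_id("tree", body)


# Hypothetical contents: two people creating the same files on
# different machines, in a different order.
her_tree = tree_id({"letter.txt": b"a", "number.txt": b"2"})
my_tree = tree_id({"number.txt": b"2", "letter.txt": b"a"})
print(her_tree == my_tree)  # True: no author, time, or machine is hashed
```

Nothing about the person or the moment goes into a tree, which is why two people following the same tutorial end up with the same `tree` line.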

A tree is a hierarchical collection of files and directories, not tied to any particular point in history. For example, if you create a file and then later delete the file (with no other intervening commits), you will end up with the same tree you started with.

A commit is a point in the history of your project. A commit specifies a tree, but also contains other information such as author/committer and time, a commit message (in which the author describes what changed), and most importantly zero or more parents, which are the previous state of the repository. (Your very first commit has zero parents. Most commits after that have one parent during linear development, and more than one if you merge.)

(Source)
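The difference between a tree and a commit can be sketched in Python as well. The example below builds two commit objects that share the tree hash from the question but have different parents and authors. The identities and timestamps are placeholders I made up, so the resulting commit IDs will not match anyone's real commits, and `commit_id` is a hypothetical helper, not a Git API.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over the header '<kind> <size>' + NUL byte + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree, parent, author, committer, message):
    """Hash a commit body laid out like the `tree`/`parent`/... lines above."""
    lines = [f"tree {tree}"]
    if parent is not None:  # the very first commit has no parent line
        lines.append(f"parent {parent}")
    lines += [f"author {author}", f"committer {committer}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


TREE = "ce72afb5ff229a39f6cce47b00d1b0ed60fe3556"  # same tree for both people

# Placeholder identities and Unix timestamps (assumptions for illustration).
her_commit = commit_id(
    TREE, "774b54a193d6cfdd081e581a007d2e11f784b9fe",
    "Person A <a@example.com> 1463900000 +0000",
    "Person A <a@example.com> 1463900000 +0000", "a2")
my_commit = commit_id(
    TREE, "c96fbf6143ccef645d1cb867b05427c399a9bcb3",
    "Person B <b@example.com> 1463900100 +0000",
    "Person B <b@example.com> 1463900100 +0000", "a2")

print(her_commit != my_commit)  # True: same tree, different commits
```

The author, the timestamp, and the parent hash are all part of the hashed bytes, so commit hashes differ even when the trees are identical.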

Once you make some changes and commit them, you will have a commit hash. The chance of a collision between two 40-character SHA-1 hashes is minuscule.

An SHA-1 hash is a 40 hex character string... that's 4 bits per character times 40... 160 bits. Now we know 10 bits is approximately 1000 (1024 to be exact), meaning that there are roughly 1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 different SHA-1 hashes... 10^48.

What is this equivalent of? Well the Moon is made up of about 10^47 atoms. So if we have 10 Moons... and you randomly pick one atom on one of these moons... and then go ahead and pick a random atom on them again... then the likelihood that you'll pick the same atom twice, is the likelihood that two git commits will have the same SHA-1 hash.

(Source)
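The arithmetic in the quote is easy to check with Python's arbitrary-precision integers. This is only a sanity check of the estimate, and it assumes hashes behave like uniformly random 160-bit values:

```python
# 40 hex characters at 4 bits each give 160 bits.
total = 2 ** 160
assert total == 16 ** 40

print(len(str(total)))   # 49 digits, so a little above 10^48
print(f"{total:.3e}")    # 1.462e+48

# Chance that two independently chosen hashes are equal.
print(1 / total)
```

So "10^48" is a slight underestimate of the exact count, which only makes an accidental collision less likely.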

You could also have two or more different commits off the same parent commit (parent hash), and this is fine too, but each of those commits will have its own hash. Unless you're surprisingly, mind-blowingly (un)lucky.

Ehryk