This is about the internals of git.
I've been reading the great 'Pro Git' book and learning a little about how git works internally (SHA-1 hashes, blobs, references, trees, commits, etc.). Pretty clever architecture, by the way.
So, for context: git identifies the content of a file by its SHA-1 hash, so it can tell whether a specific content has changed just by comparing hash values. But my question is specifically about how git checks whether the content in the working tree has changed.
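For illustration, here is how a blob's SHA-1 is derived: git hashes a short header ("blob", the size in bytes, a NUL byte) followed by the raw file bytes. The function name `git_blob_sha1` is my own; the result should match what `git hash-object <file>` prints for the same content.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob with this content."""
    # Header is "blob <size>" plus a NUL byte, then the raw bytes follow.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# Same content always gives the same ID, so comparing IDs detects changes.
print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_blob_sha1(b"hello world\n"))
```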
The naive approach would be that each time you run a command such as git status, git searches through all the files in the working directory, calculates each one's SHA-1, and compares it with the one in the last commit. But that seems very inefficient for big projects such as the Linux kernel.
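To make the cost of that naive approach concrete, here is a rough sketch. It assumes a hypothetical `committed` dict mapping relative paths to the blob hashes of the last commit; it is not how git is implemented. Note that it reads every byte of every file on each call, which is exactly what makes it slow on a large tree.

```python
import hashlib
import os


def git_blob_sha1(content: bytes) -> str:
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def naive_changed_files(root: str, committed: dict[str, str]) -> list[str]:
    """Hash every file under root; report paths whose hash differs from
    the hypothetical `committed` map (path -> blob SHA-1).
    Deleted files are ignored to keep the sketch short."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip the repo database
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                data = f.read()  # full read of every file, every time
            if committed.get(rel) != git_blob_sha1(data):
                changed.append(rel)
    return sorted(changed)
```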
Another idea would be to check each file's last modification date, but I don't think git stores that information (when you clone a repository, all the files get a new timestamp).
I'm sure it does this efficiently (git is really fast). Does anyone know how that is achieved?
PS: Here is an interesting link about the git index, which specifically states that the index keeps information about file timestamps, even though tree objects do not.
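A minimal sketch of why keeping timestamps in a local cache would help, assuming a hypothetical index-like `CacheEntry` that stores each file's mtime and size next to its blob hash. This is a simplification, not git's actual code: git's real index records more stat fields and handles edge cases (such as the "racy git" problem) that this ignores. The idea is that a cheap `stat` call decides whether the expensive read-and-hash is needed at all.

```python
import hashlib
import os
from dataclasses import dataclass


def git_blob_sha1(content: bytes) -> str:
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


@dataclass
class CacheEntry:
    # Hypothetical index-like record: cheap stat data plus the content hash.
    mtime_ns: int
    size: int
    sha1: str


def record(path: str) -> CacheEntry:
    """Snapshot a file's stat data and hash, e.g. right after checkout."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return CacheEntry(st.st_mtime_ns, st.st_size, git_blob_sha1(f.read()))


def is_modified(path: str, entry: CacheEntry) -> bool:
    st = os.stat(path)
    if st.st_mtime_ns == entry.mtime_ns and st.st_size == entry.size:
        return False  # stat data unchanged: assume content unchanged, skip reading
    # Stat data differs: only now pay for reading and hashing the file.
    with open(path, "rb") as f:
        return git_blob_sha1(f.read()) != entry.sha1
```

The hash fallback matters for files that were touched but not edited: their mtime changes, yet the content hash still matches. A real tool would also refresh the cached stat data after such a check so the next run can take the fast path again.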