
I know git gets slower as a repository grows.
But why?
Since git stores objects as plain files and directories under .git, I cannot see why the operations should get slower. Take the commit operation as an example. Recently I cloned the WebKit repo, branched from master, and committed a 2 KB file to the branch, and it felt slower than the same operation in my small repo.
Because I have not read through the git source code, my guess is that the commit operation comprises: writing the file to disk, appending to the commit log, updating the index, and updating HEAD to the new SHA value.

Writing the file is fast.
Appending to the log is fast (I assume, if the insert is just an append to a file).
Updating the index is fast.
Updating HEAD is fast.
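Those steps map roughly onto git's plumbing commands. A minimal sketch (assuming git is installed; the repo and file here are throwaway examples, not the actual WebKit checkout):

```shell
# Build one commit "by hand" from plumbing commands, step by step.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"

echo 'hello' > file.txt

# 1. Store the file content as a blob object under .git/objects
blob=$(git hash-object -w file.txt)

# 2. Update the index to stage that blob
git update-index --add --cacheinfo 100644 "$blob" file.txt

# 3. Write the staged index out as a tree object
tree=$(git write-tree)

# 4. Create the commit object (the "commit log" entry)
commit=$(echo 'my commit message' | git commit-tree "$tree")

# 5. Point HEAD (the current branch) at the new commit
git update-ref HEAD "$commit"
```

Each of these writes a small file (or an atomic ref update), which is why none of them obviously depends on repository size.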

So why is it slow? Can anyone explain it to me?
Thanks.

Some answers are helpful but not entirely convincing; it would be great if you could provide some code snippets to support your answer.

Mr. C

1 Answer


Committing a tree should be constant in time, since it only needs to create a new commit object (`git commit-tree`, on top of the tree produced by `git write-tree`) and update the HEAD ref.
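You can see why this is cheap by inspecting a commit object directly. A sketch using a throwaway repo (any repo with at least two commits would do):

```shell
# Create a tiny repo with two commits, then print the latest commit object.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo a > a.txt && git add a.txt && git commit -qm 'first'
echo b > b.txt && git add b.txt && git commit -qm 'second'

# A commit object is only a few lines: the tree hash, parent hash(es),
# author/committer lines, and the message. Its size does not depend on
# how large the tree or the history is.
git cat-file -p HEAD
```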

I did benchmarks of different SCMs in the past and git commit was indeed not affected by tree size, repository size, history length, etc.
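A rough way to reproduce such a benchmark yourself (a sketch; the file count and names are made up, and the absolute timings will vary by machine):

```shell
# Build a repo with a moderately large tree, then time a one-file commit.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"

# Create and commit 1000 files in one go
for i in $(seq 1 1000); do echo "$i" > "f$i.txt"; done
git add . && git commit -qm 'big tree'

# Time a single-file commit on top of the large tree
echo change > f1.txt
git add f1.txt
time -p git commit -qm 'small change'
```

Repeating this with different tree sizes should show the commit time staying essentially flat.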

knittl
  • Since the commit ID is the SHA-1 of the current repo snapshot, it cannot be constant-time. – Tordek Jul 09 '13 at 08:54
  • The commit ID is the SHA-1 of the commit object's content. The content includes the parent commit(s)' SHA-1 hashes, the tree's hash (generated during `git add`) and the commit message text. Unless you have an insanely long commit message, it does not matter. – knittl Jul 09 '13 at 08:58
  • I see. But you're still hashing the whole tree during the `add` operation. Unless the original answer is explicitly avoiding this step, it should still be relevant. – Tordek Jul 09 '13 at 09:06
  • @Tordek: well, you are hashing the newly created tree of the current directory, and then you need new tree objects for all parent directories up to the root. But that does not affect commit times, only `git add`. – knittl Jul 09 '13 at 09:13
  • yeah, that is the answer. I used *git commit -am* to commit a file, then *git commit -m* to test, with *time -p* to record the time taken. I get about 0.6 s less. – Mr. C Jul 09 '13 at 10:03
  • Note that hashing a tree does _not_ involve hashing all files it contains. It only hashes the files' hashes (these are also generated during `git add`). Still, it does not depend on history size of the repository. Large trees and deep directory hierarchy might impact performance of `git add` though. – knittl Jul 09 '13 at 10:26
  • I note that this answer doesn't actually answer the question: it simply denies there's a problem in the first place. I'm also experiencing very slow commits to some particular local repos and even after running `git gc` it's still slow and I have no idea why (there's no git hooks or anything either). – Dai Apr 19 '20 at 22:40
  • @Dai is your repository public? Can you create a MWE which exhibits the problem and share it? Git developers would be very interested (and I could update my 7-year old answer with more details :)) – knittl Apr 20 '20 at 05:05
  • @knittl It's a private repo, sorry. Even if it was a public repo it's only reproducible on my machine (on my laptop commits to that repo are faster - but still not faster than a brand new empty repo). I don't think it's an IO-bound problem either (because my machine has Intel's latest-and-greatest Optane SSD). UPDATE: I realised I'm running an ancient version of `git` (2.17) - I'll update and see if that helps. – Dai Apr 20 '20 at 05:22
  • Also, note that the question was about "Git becoming slower the bigger a repository gets" not "Git is slow when committing". There's a huge semantic difference. If it is _just_ slow, that could be anything (still not good, but at least it's consistently slow ;)) – knittl Apr 20 '20 at 11:11
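The point made in the comments above — that there is one tree object per directory level, and only the trees along a changed path need rewriting — can be observed directly. A sketch with a made-up nested path:

```shell
# Commit a file three directories deep, then list every object in the snapshot.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
mkdir -p a/b/c
echo x > a/b/c/file.txt
git add .
git commit -qm 'nested'

# One tree object per directory level (a, a/b, a/b/c) plus one blob per
# file; a later change to a/b/c/file.txt would rewrite only the trees
# along that path, not the rest of the repository.
git ls-tree -r -t HEAD
```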