This question has been asked in various forms on SO and elsewhere, but no answer I was able to find has satisfied me, because none of them lists the problematic/non-problematic actions/commands, and none gives a thorough explanation of the technical reason for the speed hit.
For instance:
- Why can't Git handle large files and large repos
- Why git operations becomes slow when repo gets bigger
- Git is really slow for 100,000 objects. Any fixes?
So, I am forced to ask again:
- Of the basic git actions (commit, push, pull, add, fetch, branch, merge, checkout), which ones become slower as the repo becomes larger? (NOTICE: repos, not files, for this question)
And,
- Why does each action depend on repo size (or not)?
I don't care right now about how to fix that. I only care about which actions' performance gets hit, and the reasoning behind it given the current git architecture.
Edit for clarification:
It is obvious that `git clone`, for instance, would be O(n) in the size of the repo. However, it is not clear to me that `git pull` would be the same, because it is theoretically possible to only look at the differences. Git does some non-trivial stuff behind the scenes, and I am not sure when and which.
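For what it's worth, here is a rough way I could observe the pull behaviour myself (just a sketch, assuming a POSIX shell and a remote named origin; `GIT_TRACE_PACKET` is a standard git debugging variable):

```sh
# Dump the wire protocol during a fetch: the client and server negotiate
# "have"/"want" refs, so only the missing objects are requested,
# unlike a clone, which asks for everything.
GIT_TRACE_PACKET=1 git fetch origin 2>&1 | grep -E 'have|want' | head

# Local object count and pack size; operations that walk or repack the whole
# object database scale with these numbers.
git count-objects -v
```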
Edit2:
I found this article, which states:
> If you have large, undiffable files in your repo such as binaries, you will keep a full copy of that file in your repo every time you commit a change to the file. If many versions of these files exist in your repo, they will dramatically increase the time to checkout, branch, fetch, and clone your code.
I don't see why branching should take more than O(1) time, and I am also not sure the list is complete. (For example, what about pulling?)
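(For the record, here is why I expect branching to be O(1). A quick sketch, assuming the default loose-ref storage; the branch name is just an example:)

```sh
# A new branch is just a ref: a tiny file containing the commit hash it points
# to. Nothing from the working tree or object database is copied.
git branch demo-branch
cat .git/refs/heads/demo-branch    # one 40-character SHA-1
wc -c .git/refs/heads/demo-branch  # 41 bytes, regardless of repo size
```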