I've got a test repository that I put under Git. Most of the files are pretty tiny, but there's a very large number of them, and simple Git operations like add and status are taking tens of minutes to complete. What are my options for putting these under revision control and getting reasonable performance? Should I attempt to use submodules, or should I steer clear of DVCSes?
-
What sort of file system are you working on? – Useless Mar 12 '12 at 15:26
-
Git is known to be capable of handling large projects fast. Are you using a slow file system? – Ferdinand Beyer Mar 12 '12 at 15:28
-
The mount is over NFS, though the head is pretty high-end. – dromodel Mar 12 '12 at 20:33
-
Updated my answer based on the NFS information. I think that was the crucial piece of information here. If possible, I would consider having a local clone of that, but if not, then see the thread and configuration options. – eis Mar 13 '12 at 15:36
-
I also experience a very slow `git reset` operation on a local SSD file system. The repo is rather large – a local CocoaPods spec repo. – adib Nov 23 '16 at 09:04
2 Answers
Git operations like `add` and `status` require `stat`ing every file in the filesystem (to detect changes). Either you have a truly massive number of files (say, tens or hundreds of thousands of files), or you have a filesystem with a rather slow `stat` operation.

In any case, if you need to work on a system where this is extremely slow, you can use the "assume unchanged" bit in the index, which tells Git not to bother `stat`ing the file. If you do turn this on, you need to manually instruct Git to pick up changes in individual files, e.g. by passing them directly to `git add`; otherwise Git won't even know anything changed. You can turn this on by setting `git config core.ignoreStat true` and then running something like `git reset --hard HEAD`.
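As a rough sketch of that workflow (the file paths below are placeholders; `core.ignoreStat` and `git update-index --assume-unchanged` are the standard knobs for the assume-unchanged bit):

```
# Tell Git to stop stat()ing tracked files and rely on the assume-unchanged bit
git config core.ignoreStat true
git reset --hard HEAD

# Git no longer notices edits on its own, so changed files must be staged explicitly
git add path/to/edited-file.txt

# The same bit can also be toggled per file if needed
git update-index --assume-unchanged path/to/rarely-changed-file.txt
git update-index --no-assume-unchanged path/to/edited-file.txt
```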

-
Bingo! I haven't counted them up because I'm afraid to, but I wouldn't be surprised to discover that it's hundreds of thousands or even millions of files, nearly all human-generated. I tried setting that flag and it helped a bit on some operations, but it's still too slow. Maybe I should create tons of small repositories instead. – dromodel Mar 12 '12 at 23:30
I wonder what a "very large" number is here. Usually it's not the number of small files that Git finds troublesome but big binary files. However, I can imagine that if the count is large enough, you'd want to have them split into several repositories - either by means of submodules or some other way. If they need to reside in one single repo, you might find, for example, Subversion to be more performant.
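As a rough sketch of the split-into-submodules idea (the repository names and URLs below are made up for illustration):

```
# In a parent "super" repository, register each chunk of files as a submodule
git submodule add https://example.com/repos/chunk-a.git chunk-a
git submodule add https://example.com/repos/chunk-b.git chunk-b
git commit -m "Track chunk-a and chunk-b as submodules"

# Day-to-day work then happens inside one submodule checkout at a time,
# so `git status` only has to stat the files of that chunk
cd chunk-a
git status
```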
EDIT: OK, so you added in a comment that you are using an NFS mount, which sounds like the likely bottleneck here. Please check for solutions on that in this thread. In particular, `core.preloadindex` might be of interest here.
From the documentation:
`core.preloadindex`
Enable parallel index preload for operations like `git diff`.
This can speed up operations like `git diff` and `git status`, especially on filesystems like NFS that have weak caching semantics and thus relatively high IO latencies. With this set to true, Git will do the index comparison to the filesystem data in parallel, allowing overlapping IO's.
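A quick sketch of turning it on (the option is standard Git configuration; whether it helps depends on the repository size and the NFS mount):

```
# Enable parallel index preload for this repository only
git config core.preloadindex true

# Or enable it for every repository on this machine
git config --global core.preloadindex true

# Time a status run afterwards to see whether it actually helps
time git status
```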
EDIT2: in the comments there was a mention of 6 million files. I can understand this becoming a bottleneck - that is indeed a very large number of files.
-
I somehow doubt SVN is more performant than git - and even if it is, git is much better (according to Linus Torvalds you are ugly and stupid when not using Git :p) – ThiefMaster Mar 12 '12 at 18:47
-
Well, you don't have to take just my word for it - even Linus [agrees](http://stackoverflow.com/questions/984707/what-are-the-git-limits) that for some use cases this is the situation. Git operates on the repo as a whole so it isn't the best option in some scenarios. – eis Mar 12 '12 at 18:52
-
There are few binary files. The number of files is substantially larger than the number you would find in any single open source project. – dromodel Mar 12 '12 at 23:31
-
What we have at work is about 50k files, which hasn't been an issue. If, however, you have something like hundreds of thousands or even millions, I can see that becoming a bottleneck. I would be very interested in any numbers on how SVN or Perforce would handle that... if you do test them and have figures somewhere, please let us know too :) – eis Mar 13 '12 at 15:22
-
There are about 6 million files. A `git status` operation took about 2 hours to complete, with `core.preloadindex` set to true. – dromodel Mar 13 '12 at 20:33