I've got a test repository that I put under Git. Most of the files are pretty tiny, but there's a very large number of them, and simple Git operations like add and status are taking tens of minutes to complete. What are my options for putting these under revision control and getting reasonable performance? Should I attempt to use submodules, or should I steer clear of DVCSes?
-
What sort of file system are you working on? – Useless Mar 12 '12 at 15:26
-
Git is known to be capable of handling large projects fast. Are you using a slow file system? – Ferdinand Beyer Mar 12 '12 at 15:28
-
The mount is over NFS, though the head is pretty high-end. – dromodel Mar 12 '12 at 20:33
-
Updated my answer based on the NFS information. I think that was the crucial piece of information here. If possible, I would consider having a local clone of that, but if not, then see the thread and configuration options. – eis Mar 13 '12 at 15:36
-
I also experience a very slow `git reset` operation on a local SSD file system. The repo is rather large – a local CocoaPods spec repo. – adib Nov 23 '16 at 09:04
2 Answers
Git operations like `add` and `status` require `stat`ing every file in the filesystem (to detect changes). Either you have a truly massive number of files (say, tens or hundreds of thousands of files), or you have a filesystem with a rather slow `stat` operation.

In any case, if you need to work on a system where this is extremely slow, you can use the "assume unchanged" bit in the index, which tells Git not to bother `stat`ing the file. If you do turn this on, you need to manually instruct Git to pick up changes in individual files, e.g. by passing them directly to `git add`; otherwise Git won't even know anything changed. You can turn this on by setting `git config core.ignoreStat true` and then running something like `git reset --hard HEAD`.
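As a rough sketch of that workflow (the file paths below are placeholders; `core.ignoreStat` and `git update-index --assume-unchanged` are the standard knobs for the assume-unchanged bit):

```
# Tell Git to stop stat()ing tracked files and rely on the assume-unchanged bit
git config core.ignoreStat true
git reset --hard HEAD

# Git no longer notices edits on its own, so changed files must be staged explicitly
git add path/to/edited-file.txt

# The same bit can also be toggled per file if needed
git update-index --assume-unchanged path/to/rarely-changed-file.txt
git update-index --no-assume-unchanged path/to/edited-file.txt
```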

-
Bingo! I haven't counted them up because I'm afraid to, but I wouldn't be surprised to discover that it's hundreds of thousands or even millions of files, nearly all human-generated. I tried setting that flag and it helped a bit on some operations, but it's still too slow. Maybe I should create tons of small repositories instead. – dromodel Mar 12 '12 at 23:30
I wonder what a "very large" number is here. Usually it's not the number of small files that Git finds troublesome but big binary files. However, I can imagine that if the count is large enough, you'd want to have them split into several repositories - either by means of submodules or some other way. If they need to reside in one single repo, you might find, for example, Subversion to be more performant.
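As a rough sketch of the split-into-submodules idea (the repository names and URLs below are made up for illustration):

```
# In a parent "super" repository, register each chunk of files as a submodule
git submodule add https://example.com/repos/chunk-a.git chunk-a
git submodule add https://example.com/repos/chunk-b.git chunk-b
git commit -m "Track chunk-a and chunk-b as submodules"

# Day-to-day work then happens inside one submodule checkout at a time,
# so `git status` only has to stat the files of that chunk
cd chunk-a
git status
```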
EDIT: OK, so you added in a comment that you are using an NFS mount, which sounds like the likely bottleneck here. Please check for solutions on that in this thread. In particular, `core.preloadindex` might be of interest here.
From the documentation:
`core.preloadindex`
Enable parallel index preload for operations like `git diff`.
This can speed up operations like `git diff` and `git status`, especially on filesystems like NFS that have weak caching semantics and thus relatively high IO latencies. With this set to true, Git will do the index comparison to the filesystem data in parallel, allowing overlapping IO's.
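A quick sketch of turning it on (the option is standard Git configuration; whether it helps depends on the repository size and the NFS mount):

```
# Enable parallel index preload for this repository only
git config core.preloadindex true

# Or enable it for every repository on this machine
git config --global core.preloadindex true

# Time a status run afterwards to see whether it actually helps
time git status
```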
EDIT2: in the comments there was a mention of 6 million files. I can understand this becoming a bottleneck - that is indeed a very large number of files.
-
I somehow doubt SVN is more performant than git - and even if it is, git is much better (according to Linus Torvalds you are ugly and stupid when not using Git :p) – ThiefMaster Mar 12 '12 at 18:47
-
Well, you don't have to take just my word for it - even Linus [agrees](http://stackoverflow.com/questions/984707/what-are-the-git-limits) that for some use cases this is the situation. Git operates on the repo as a whole so it isn't the best option in some scenarios. – eis Mar 12 '12 at 18:52
-
There are few binary files. The number of files is substantially larger than the number you would find in any single open source project. – dromodel Mar 12 '12 at 23:31
-
What we have at work is about 50k files, which hasn't been an issue. If, however, you have something like hundreds of thousands or even millions, I can see that becoming a bottleneck. I would be very interested in any numbers on how SVN or Perforce would handle that... if you do test them and have figures somewhere, please let us know too :) – eis Mar 13 '12 at 15:22
-
There are about 6 million files. A `git status` operation took about 2 hours to complete, with `core.preloadindex` set to true. – dromodel Mar 13 '12 at 20:33