
I have a local repository with ~300,000 files and about 40 GB on an encrypted filesystem (and I cannot change that ...). I often need to create a new branch and make the current contents of the working directory the contents of this branch.

So this "checkout" is not actually a checkout that modifies anything in the working tree; it just creates a branch, switches to it, and leaves the working directory unchanged. And it is not about large files: the average file size is much less than 1 MB (40 GB / 300,000 ≈ 130 KB).

Currently I do:

git checkout -q -b mynewbranch
git add -v -A
git commit -q -m "at mynewbranch"

In principle this works, but the first step, creating the branch, takes more than an hour (!). (The "add" and "commit" take a few minutes; I could live with that.) The "git checkout" seems to re-read the whole working directory just in order to create the branch.
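To see where the hour actually goes, git's built-in performance tracing can be enabled for a single run; this is a diagnostic sketch, not part of the workflow above:

# Print timing information for the internal phases of the command,
# which should show whether the time is spent refreshing the index.
GIT_TRACE_PERFORMANCE=1 git checkout -q -b mynewbranch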

Ideally, creating the branch would take almost no time at all, and its state would simply be based on a previously existing branch. The "add" should then also not take too much time, since timestamps can be used: not all file contents need to be compared against the repository, only files with new timestamps should be looked at in detail.
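A sketch of index tweaks that may speed up the "add" step on a slow filesystem; both options exist in git 2.17, but whether they actually help on encfs is an assumption:

# Cache directory-scan results so status/add can skip directories
# whose mtime has not changed since the last scan.
git config core.untrackedCache true
git update-index --untracked-cache

# Switch to index format v4, which path-compresses index entries;
# this mainly helps with very large indexes like this one.
git update-index --index-version 4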

Does anybody have an idea how this can be done efficiently?

Edit: git 2.17, Ubuntu, encfs over ext4, recent hardware, 12 CPUs, mostly binary files (like PDF, JPEG, MP4; no deep tree; they need to be versioned).

The primary issue is: can git be kept from looking at the content of all files just to create a branch?
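One hedged workaround sketch (not from the question itself): since the new branch points at the same commit as the current one, it can be created and HEAD repointed with plain ref operations, which only write refs and never touch the index or the working tree:

# Create the branch at the current commit without checking anything out.
git branch mynewbranch

# Repoint HEAD at the new branch directly, bypassing checkout's index
# refresh; safe here because both branches point at the same commit.
git symbolic-ref HEAD refs/heads/mynewbranch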

  • What OS, what filesystem do you use? What are your storage drive hardware characteristics? What are the file characteristics? Is it source code (like small text files in a deep tree) or something else? Do you have any other software running in the background besides git that might process the files? – battlmonstr May 19 '18 at 14:39
  • See also: https://stackoverflow.com/questions/3313908/git-is-really-slow-for-100-000-objects-any-fixes – battlmonstr May 19 '18 at 14:44
  • If you're on Windows, there are a number of things published by Microsoft (who work with a 500 GB Windows working directory) that can help you. That includes making sure you're on the latest git version. https://blogs.msdn.microsoft.com/devops/2018/01/11/microsofts-performance-contributions-to-git-in-2017/ – jessehouwing May 19 '18 at 15:20
  • Converting to LFS may also improve performance if you have many binary files in the repo. – jessehouwing May 19 '18 at 15:26
  • Based on your updates, enabling Git-LFS should really help a lot. Git isn't ideal for large binary files, as you may have noticed. – jessehouwing May 20 '18 at 08:37
  • I don't think LFS would help at all. The repository and working directory are both local, and all the files are already in the working directory. There is no advantage in something like lazy downloading. – Marit Lendox May 21 '18 at 06:43
  • Why don't you try it? This kind of situation is the kind of thing that LFS is designed to help with. – Robin Green May 21 '18 at 07:45

1 Answer


git is not designed to work with large repositories (although Microsoft has recently worked on extending it to support them - see the comments on the question above). I suggest you split up your repository into multiple repositories, and/or use LFS. If you use LFS, you will probably want to use BFG Repo Cleaner to efficiently recreate the repository without all the large files in the history - unless the repository consists solely of large files.

LFS does support versioning:

Large file versioning

Version large files—even those as large as a couple GB in size—with Git.
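A sketch of what the conversion could look like; the file patterns come from the question's edit, and the BFG --convert-to-git-lfs option is assumed to be available in your BFG release:

# Track the question's binary types with LFS going forward.
git lfs install
git lfs track "*.pdf" "*.jpg" "*.mp4"
git add .gitattributes
git commit -m "Track large binaries with LFS"

# Rewrite existing history so old blobs move to LFS as well;
# run against a fresh mirror clone, one pattern per invocation.
java -jar bfg.jar --convert-to-git-lfs '*.mp4' --no-blob-protection myrepo.git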

Robin Green
  • Splitting up would most likely not help. It would still re-read all of the files, just distributed among multiple repositories; 100 × 1 minute is still over 1.5 hours. My point is that reading all the file contents just to duplicate a branch is completely useless; the challenge is finding which git commands or settings make git skip this time-consuming work. – Marit Lendox May 21 '18 at 06:51