It seems the git team has been working on large-binary-file handling features that don't require git LFS, like partial clone and sparse checkout. That's great.
The one thing I'm not totally clear on is how these features are supposed to address the following issue:
Correct me if I'm wrong, but every time you run `git status`, git quickly checksums the files in your working directory and compares the results against the checksums stored in HEAD to see which files changed. This works great for text files, and it's such a common, fast operation that many shells build the current branch, and whether or not your working directory is clean, right into the shell prompt.
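Something like this, for example (a minimal bash sketch using the `git-prompt.sh` helper that ships in git's contrib directory; the exact path to the script varies by distro):

```sh
# Source git's own prompt helper (path varies: look under /usr/share/git/,
# /usr/lib/git-core/, or git's contrib/completion/ directory).
source /usr/share/git/completion/git-prompt.sh

# Show a '*'/'+' next to the branch name when the working tree / index is
# dirty. This is the check that ends up re-reading modified files.
GIT_PS1_SHOWDIRTYSTATE=1

# Current directory, then "(branch *)" when inside a repo.
PS1='\w$(__git_ps1 " (%s)")\$ '
```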
With large files, however, that checksum can take multiple seconds or even minutes. So every time you type `git status`, or simply hit enter in a fancy shell with a custom git-enabled prompt, git may spend several seconds checksumming the large files in your working directory to figure out whether they've changed. That means either your `git status` command takes seconds or minutes to return, or worse, EVERY command takes seconds or minutes to return while your current working directory is inside the repo, because the shell itself checks the repo's status to render the prompt.
This isn't theoretical; I've seen it happen with git LFS. If I have a large, modified file in my working directory, working in that repo becomes a colossal pain: `git status` takes forever to return, and with a custom shell prompt, every single command I type takes forever as the shell regenerates the prompt text.
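A rough way to reproduce the same cost with plain git, no LFS involved (the file name and sizes here are made up):

```sh
# Commit a ~2 GB binary, then modify it in place so its size stays the same.
dd if=/dev/urandom of=big.bin bs=1M count=2048
git add big.bin && git commit -m "add big binary"

# Overwrite the first 1 MB without truncating: mtime changes, size doesn't,
# so git can only tell whether the content changed by re-hashing all of it.
dd if=/dev/urandom of=big.bin bs=1M count=1 conv=notrunc

time git status   # re-reads and hashes the whole file every time you run it
```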
Is this meant to be addressed by sparse checkout, where you just don't check out the large files? Or is there something else meant to address this?
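For concreteness, this is the kind of workflow I'm imagining (the repo URL and directory names are hypothetical):

```sh
# Partial clone that skips blobs over 1 MB, combined with a sparse checkout
# so the directories holding large assets are never materialized locally.
git clone --filter=blob:limit=1m --sparse https://example.com/big-repo.git
cd big-repo
git sparse-checkout set src docs   # only these directories get checked out
```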