
I have a repo with lots of commits and lots of blobs. I want to migrate to LFS because it takes significant time to clone and fetch the repo. However, when running the `git lfs migrate import` command, the repo goes from ~40 GB to well over 200 GB. I don't know the final size because the command has been running for a whole day and is only 40% done. My question is: why is the storage so large when the files in the repo are only around 3 GB with no Git history? (It might be helpful to mention that the Git history alone is currently around 17 GB.)

I tried `git lfs migrate import`, but this question is less about a solution and more about the why.

UPDATE: I am not asking how to solve the storage problem locally or remotely. I want to understand why and how LFS stores objects in a blob store in a way that racks up so much storage space.
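
For reference, here is roughly how I'm measuring the sizes (a sketch; the LFS path assumes the migration has at least started writing objects):

```sh
git count-objects -vH      # packed size of the ordinary Git object store
du -sh .git/lfs/objects    # size of the local LFS object store
```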

  • Is there a reason that you need to keep...so much history around? Usually people undergo some kind of [repo trimming ceremony](https://stackoverflow.com/q/2116778/1079354) before it eclipses 100 *megabytes*. – Makoto Nov 11 '22 at 22:39
  • Does this answer your question? [What does git lfs migrate do?](https://stackoverflow.com/questions/51782043/what-does-git-lfs-migrate-do) – Daniel Mann Nov 11 '22 at 22:40
  • @Makoto, we need it for audit reasons. So unfortunately, we cannot trim anything –  Nov 11 '22 at 22:45
  • @DanielMann no, not really. I understand that locally I can prune these things, but I want to understand why it takes up so much space on a remote LFS server. Is it making a new copy of each changed object, which is exactly what LFS is supposed to solve? Even if that is the case, the Git repo with the objects in its history is only 20 GB, while the repo with LFS is HUGE. –  Nov 11 '22 at 22:48
  • ...I think you have made a grave error in treating Git as an audit utility. But that's neither here nor there. – Makoto Nov 11 '22 at 23:01
  • @Makoto do you actually mind elaborating on that? Is there another way to keep track of who touched and changed files and when? That might actually solve a problem, if you know a solution other than the commit history. –  Nov 11 '22 at 23:28
  • I mean, if you ***really*** need all of that history, normally what people do is [compress it](https://stackoverflow.com/a/55515739/1079354) at periodic points when the size of the repo is Too Large™, for a given definition of that, as opposed to letting the repo continue to grow. I don't know what audits you do and what the expected lifetime is for maintaining these records, but in the use cases that Git has - mostly just versioning flat-files - this exercise isn't really undertaken all that often unless there are binaries in the repo. – Makoto Nov 11 '22 at 23:37
  • @Dalton Are these large files you're trying to extract **text**? Because Git is actually very good about compressing and storing deltas for text. If you're now trying to take massive *text* files and put them into an uncompressed blob store, it would make perfect sense that they'd use a ton more space in the LFS store than they did in the repo. – Daniel Mann Nov 12 '22 at 00:42
  • Consider using shallow clones if you do not want the full history locally while keeping the whole thing in the remote repo. – eftshift0 Nov 12 '22 at 11:23
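
To illustrate the point about text in the comments above, here is a minimal sketch you can run in a throwaway directory; it shows Git packing two near-identical large text files into a small pack:

```sh
git init delta-demo && cd delta-demo
seq 1 200000 > big.txt                 # ~1.5 MB of highly compressible text
git add big.txt && git commit -m v1
echo "one more line" >> big.txt
git commit -am v2
git gc                                 # repack: deltify and compress
du -sh .git/objects/pack               # a small fraction of two raw copies of big.txt
```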

1 Answer


When Git stores an object, it deltifies the data, storing most objects as a series of changes against other, similar objects, and then compresses the result with zlib. This is great if your objects are text files, but many large files (images or textures, say) are already compressed, so the deltification step is slow and ineffective, and compression can actually expand the data slightly.
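
Whether that applies to your files is easy to check. A rough sketch, using random bytes to stand in for already-compressed image data:

```sh
head -c 1000000 /dev/urandom > texture.bin    # incompressible bytes, like a JPEG/texture payload
gzip -c texture.bin | wc -c                   # compressed size: slightly *larger* than 1000000
```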

With Git LFS, the large files are stored outside of the repository, and they are neither compressed nor deltified. Again, that's because for a lot of large files, those steps simply waste CPU and are not effective. Once you push all of those large files to the server, you can run `git lfs prune`, and at that point only the large files needed for a checkout will be kept locally or downloaded. Thus, if you only need 500 MB of large files in your checkout, that checkout will only download that 500 MB, and the rest will stay on the server. This is also why the migrated repository balloons: every version of every matched file becomes a separate, full-size, uncompressed object in the LFS store, so the deltas and compression that kept your 17 GB of history small no longer apply.
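
A sketch of that workflow (the `*.psd` pattern and the `origin` remote are placeholders for your own setup):

```sh
git lfs migrate import --include="*.psd" --everything   # rewrite all refs, replacing matches with LFS pointers
git push --all --force origin                           # upload the rewritten history plus the LFS objects
git lfs prune                                           # delete local LFS files not needed by recent checkouts
```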

Over time, you can accumulate a decent number of large files on your local system, and you may need to run `git lfs prune` to remove the ones that are no longer needed.
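
If you're nervous about what `git lfs prune` will delete, it supports a preview and a server-side check:

```sh
git lfs prune --dry-run         # list what would be removed without deleting anything
git lfs prune --verify-remote   # only prune objects the remote confirms it has
```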

bk2204