
GitHub has a limit on pushing large files, so if you want to push a large file to your repo, you have to use Git LFS.

I know it's a bad idea to add binary files to a git repo. But if I'm running GitLab on my own server, there is no limit on file size in a repo, and I don't care if the repo on my server gets very large, what is the advantage of git lfs? Will git clone or git checkout be faster?

Sanster
  • Have you compared the connection speed? – SOFe Feb 23 '16 at 10:52
  • No. I am trying to figure it out in principle. – Sanster Feb 23 '16 at 11:55
  • With git-lfs, clone will be MUCH quicker. Checkout takes a little longer: the time to download the files put in LFS. But if you REALLY need to check in some binaries, LFS is the way to go. – Philippe Feb 23 '16 at 18:17
  • https://www.atlassian.com/git/tutorials/git-lfs – Benny Sep 08 '17 at 19:09
  • You should clearly distinguish whether the large files are modified (heavily) or are just static assets in the repo. If a large file is added once and never modified, there is no benefit to LFS. If the large files are modified, the accepted answer applies. – g.pickardou Feb 05 '19 at 16:46
  • See also my Q: [How does git LFS track and store binary data more efficiently than git?](https://stackoverflow.com/q/75946411/4561887), and this other Q: [Do I need Git LFS for local repos?](https://stackoverflow.com/q/63864442/4561887). – Gabriel Staples Jun 27 '23 at 16:11

1 Answer


One specificity of Git (and other distributed systems) compared to centralized systems is that each repository contains the whole history of the project. Suppose you create a 100 MB file and modify it 100 times in a way that doesn't compress well. You'll end up with a roughly 10 GB repository. This means that each clone downloads 10 GB of data and eats 10 GB of disk space on every machine where you make a clone. What's even more frustrating: you'd still have to download those 10 GB even if you git rm the big files.
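To get a concrete feel for how that compounds, here is a minimal shell sketch (the file name, sizes, and iteration count are only illustrative; random data is used simply because it defeats compression):

```sh
# Build a throwaway repo and commit an incompressible 100 MB file a few times.
git init lfs-size-demo && cd lfs-size-demo
for i in $(seq 1 5); do
    # /dev/urandom output neither delta- nor zlib-compresses well,
    # so each commit adds roughly the full 100 MB to the history.
    dd if=/dev/urandom of=big.bin bs=1M count=100 2>/dev/null
    git add big.bin
    git commit -q -m "revision $i of big.bin"
done
git count-objects -vH   # the object store grows by ~100 MB per revision
du -sh .git             # this is what every full clone has to download
```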

Putting big files in a separate system like git-lfs allows you to store only small pointers to each version of the file in the repository, so each clone downloads only a tiny piece of data per revision. A checkout then downloads only the version you are actually using, i.e. 100 MB in the example above. As a result, you use disk space on the server, but save a lot of bandwidth and disk space on the client.
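In practice the setup looks like this (a sketch: the file name, the tracked pattern, and the hash/size shown in the pointer are example values, not real output):

```sh
git lfs install          # one-time per machine: installs the LFS filter hooks
git lfs track "*.bin"    # records a filter rule for *.bin in .gitattributes
git add .gitattributes big.bin
git commit -m "Track big.bin with git-lfs"
git lfs ls-files         # lists the files currently managed by LFS

# What git itself now stores for big.bin is only a tiny pointer file like:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:4d7a2146...   <- example value
#   size 104857600
# The 100 MB payload lives in the LFS store and is fetched on checkout.
```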

In addition to this, the algorithm used by git gc (internally, git repack) does not always work well with big files. Recent versions of Git have made progress in this area and it should work reasonably well, but using a big repository with big files in it may eventually get you into trouble (like not having enough RAM to repack your repository).
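If you do hit that, a few git config knobs can reduce the memory pressure during a repack (the values below are only examples to adapt to your machine):

```sh
git config core.bigFileThreshold 50m   # don't even attempt delta compression above 50 MB
git config pack.windowMemory 256m      # cap per-thread memory for the delta search window
git config pack.threads 2              # fewer threads, lower peak RAM while repacking
git gc                                 # repack within the limits set above
```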

Bernardo Sulzbach
Matthieu Moy
  • I always spouted on about it slowing down the repo over time, but this is a great concrete example! Thanks for showing how the size compounds as well as the resource consumption! – CTS_AE May 03 '19 at 00:44
  • So, using LFS is only good if you modify those large files frequently? What if I want to keep some software packages in the repo that I use but never modify? – sanjivgupta Apr 12 '20 at 03:11
  • @sanjivgupta In that scenario LFS has very few benefits. By following the git-lfs process you mark the files as binary, so that accessing them with `git diff` won't potentially choke on a large file. Additionally, if you do decide to update one of those packages in the future, you will reap the intended benefits of LFS by cloning only the latest versions for the branch you clone from. All that being said, you should use a package manager for that scenario whenever possible. – Mark Clark Apr 22 '20 at 21:46
  • Hi. I am using Git LFS, but every time I make a change to a scene and commit it, the whole scene gets saved to Git LFS again (250 MB), so I can't work like this. I don't understand what the advantage of Git LFS is when Unity scenes are saved as a whole; in bigger teams that would require terabytes of GitLab storage. – Adam Beňko Dec 23 '21 at 10:42
  • Can you clarify: does `git lfs` still store a separate copy of each pointed-to version of the binary file? Or does it somehow store only *changes* of the binary file in order to save storage space on the `git lfs` server, or perhaps even store only the *latest* copy, with older versions lost? I'm trying to understand the storage benefits of binary files on the `git lfs` server, if any. – Gabriel Staples Nov 21 '22 at 19:24
  • I enabled LFS, added the `.7z` extension to the `.gitattributes` file, and uploaded a 2 GiB compressed file. When I uploaded it, `git lfs` classified it as 20 GB. – Oo'- Feb 27 '23 at 07:18
  • @GabrielStaples git lfs stores the complete file as it is (without additional compression) and does not use any diff functions (see [this thread](https://stackoverflow.com/a/68623227/10270360) for a filetype comparison; a quick local check is sketched after this comment thread). – Alexander Gogl Jun 27 '23 at 13:16
  • For anyone wondering, in 3 years of daily use of `git lfs` on exceptionally large repositories, [I have come to hate it](https://stackoverflow.com/q/75946411/4561887). Among other things I explain there, `git checkout` regularly takes several _hours_, instead of seconds. – Gabriel Staples Jun 27 '23 at 15:44
  • I highly recommend people read [@AlexanderGogl's answer here](https://stackoverflow.com/a/68623227/4561887), which shows how well regular `git` works on binary files these days, especially after running `git gc` to clean up and compress the repo. After [3 years on `git lfs`, and my frustrations with it](https://stackoverflow.com/q/75946411/4561887), I really think we should use regular `git` for large binary files. I still have some mixed feelings on it, though. – Gabriel Staples Jun 27 '23 at 16:06
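
Following up on the comment about LFS storing each version as a complete object, a minimal local check (this only inspects the standard git-lfs layout inside `.git`; no custom tooling is assumed) looks like:

```sh
git lfs ls-files --size    # each LFS-tracked file, with the size of its full blob
du -sh .git/lfs/objects    # local LFS store: one complete object per version,
                           # no deltas between versions of the same file
```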