
I am currently using git for a large repository (around 12 GB, each branch being about 3 GB). This repository contains lots of binary files (audio and images).

The problem is that clone and pull can take a lot of time. Especially the "Resolving deltas" step can be very long.

What is the best way to solve this kind of problem?

I tried to remove delta compression, as explained here, using the delta option in .gitattributes, but it does not seem to improve the clone duration.
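For reference, here is roughly what my `.gitattributes` looked like (the file patterns are just examples matching my audio and image files):

# disable delta compression for these binaries (patterns are illustrative)
*.wav -delta
*.png -delta
*.jpg -delta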

Thanks in advance

Kevin


2 Answers


Update April 2015: Git Large File Storage (LFS) (by GitHub).

It uses git-lfs (see git-lfs.github.com), tested with a server supporting it (lfs-test-server):
you store only the metadata in the git repo, and the large files elsewhere.

(Demo animation: https://cloud.githubusercontent.com/assets/1319791/7051226/c4570828-ddf4-11e4-87eb-8fc165e5ece4.gif)
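To make the "metadata only" part concrete: each tracked file is replaced in the repo by a small pointer file following the git-lfs pointer spec, while the actual content lives on the LFS server. The oid and size below are made-up values:

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345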


Original answer (2012)

One solution, for large binary files that don't change much, is to store them in a separate repository manager (like a Nexus repository), and version only a text file which declares which version you need.
Using an "artifact repository" is easier than storing binary elements in a source repo (made for comparing versions and merging between branches, which isn't of much use for said binaries).
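A minimal sketch of that "text file declaring versions" idea (the file format, paths and Nexus URL are all hypothetical; adapt them to your artifact repository layout):

# binaries.txt, versioned in git; the binaries themselves live in Nexus
samples/drums.wav 1.4.2
images/logo.png 2.0.1

# fetch.sh: download the declared versions (the URL scheme is illustrative)
while read path version; do
  mkdir -p "$(dirname "$path")"
  curl -fo "$path" "https://nexus.example.com/repo/assets/$version/$path"
done < binaries.txt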

The other solution, more git-centric, is git-annex:

git-annex allows managing files with git, without checking the file contents into git.
While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, time, or disk space.

It is however not compatible with Windows.
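A typical git-annex session looks roughly like this (the file name and remote are assumptions; see the git-annex walkthrough for the full workflow):

git annex init
git annex add samples/drums.wav    # replaces the file with a symlink; the content is managed by the annex
git commit -m "Add sample via git-annex"
git annex copy --to origin samples/drums.wav    # push the content itself to an annex-aware remote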

A more generic solution could be git-media, which also allows you to use Git with large media files without storing the media in Git itself.

Finally, the easiest solution is to isolate those binaries in their own git submodule, as you mention in your question: it isn't very satisfactory, and the initial clone will still take time, but subsequent updates of the parent repo will be short.
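A sketch of the submodule setup (URLs and paths are illustrative). Note that a fresh clone does not have to be recursive, so a build server can skip the heavy submodule and fetch it only when needed:

git submodule add https://example.com/my-binaries.git assets
git commit -m "Track the binaries in a submodule"

# on a fresh clone: skip the binaries, then pull them in only if required
git clone https://example.com/my-project.git
cd my-project
git submodule update --init assets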

VonC
  • Thanks for this advice. Nexus seems to be very interesting and it may be useful in several other contexts. I will have a look at it when I have some time :). git-annex might be a quicker way to solve my current problem. – Kevin MOLCARD Oct 12 '12 at 09:34
  • git-annex is not compatible with Windows and I need both macOS and Windows :(. I will unaccept the answer until I have time to try Nexus, to see if someone has another solution. Thanks again. – Kevin MOLCARD Oct 12 '12 at 09:42
  • @KevinMOLCARD no problem. I have added alternatives (see my edited answer) – VonC Oct 12 '12 at 09:56
  • Thanks for the update @VonC, I will check git-media. For submodules I don't know, because I have a build server which does a fresh clone every day. – Kevin MOLCARD Oct 12 '12 at 10:05
    @KevinMOLCARD that fresh clone doesn't have to be recursive (ie doesn't have to include all submodules): it can scan the `.gitmodules` file of the parent repo, and detect if any submodule SHA1 has changes, triggering the fetch only for the right (ie the modified) submodules. – VonC Oct 12 '12 at 12:56
  • Thanks @VonC for the explanation. I will try this as soon as I have a little bit more time. – Kevin MOLCARD Oct 12 '12 at 13:43
  • The answer assumes there are big single files in the repo. But what if there are a large number of small files and a very long history? git-lfs would not help. – Michael S Dec 14 '20 at 13:08
  • @MichaelS True, but this is not *entirely* an assumption: The OP says it right there: "This repository contains lots of binary files (audio and images).". – VonC Dec 14 '20 at 13:32

Follow these steps.

1. Install Git LFS on your local machine:

git lfs install

2. Now tell LFS which file types you want it to manage for you:

git lfs track "*.mp4"

3. You are all set. Go ahead and add, commit and push your files, and there will be no warning.
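For completeness, a rough end-to-end example (the file and branch names are placeholders; note that `git lfs track` writes to `.gitattributes`, which must be committed as well):

git lfs install
git lfs track "*.mp4"    # records the pattern in .gitattributes
git add .gitattributes   # commit the tracking rule itself
git add video.mp4        # placeholder file name
git commit -m "Add video via Git LFS"
git push origin master   # branch name is illustrative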
soda