What is the best way for git to consume less disk space?

I'm using git-gc on my repositories (which does help, especially if there have been many commits since they were cloned), but I would like suggestions for any other commands that can shrink the disk space used by git.

Jeromy French
Joao Trindade
  • Either you want version control with full history or you don't. A full history will inevitably take up some space. – innaM Sep 09 '09 at 13:07
  • And besides: What is cheaper than storage these days? – innaM Sep 09 '09 at 13:08
  • How much space are we talking about, anyway? – Rook Sep 27 '09 at 17:08
  • @innaM It isn't just about drive space... imagine you have to clone a repo that's hundreds of MB over the internet... that can take time. – Jeremy Logan Dec 19 '12 at 21:56
  • My cellphone (jolla/sailfishOS) uses git for keeping track of changes to my address book, gallery, messages and misc. The data itself is just a few megabytes, but the git repo is now 2.2G, which is significant on that device. I don't need to carry the full history with me on my cellphone, it is not trivial to "add cheap storage", and I do want to continue using git for backup purposes, I do want to have an offsite backup of the full history ... I just don't want to carry the full history with me all the time. – tobixen Jun 24 '16 at 11:53
  • @innaM, I'm going to ask the really obvious question - what if you want to use git but don't want the full history? Is full history the ONLY thing git has to offer? As far as storage, sometimes increasing it isn't an option in a work environment. – Chance Jul 24 '17 at 16:21

11 Answers

There are a few suggestions I can offer:

  1. Delete branches that are no longer used. They can pin commits that you don't use and never will. Take care, however, not to delete branches that you might need later (perhaps for review, or for comparison with a failed effort). Back up first.

  2. Check whether you committed some large binary file (perhaps a generated file) by mistake. If you did, you can purge it from history using "git filter-branch"... but only if you haven't shared the repository, or if purging it is worth the aggravation that rewriting history causes other contributors. Again: back up first.

  3. You can prune more aggressively, discarding some safeties, by using git gc --prune=now, or the low-level git prune. But take care that you don't remove safeties and backups (like the reflog) that you might need a minute after compacting.

  4. Perhaps what enlarges your repository is untracked files in the working directory. In that case "make clean" or "git clean" might help (but take care that you don't remove important files).

  5. The safest of all these suggestions: you can try to pack more aggressively, using the --depth and --window options of the low-level git-repack. See also the Git Repack Parameters blog post by Pieter de Bie on his DVCS Comparison blog, from June 6, 2008. Or "git gc --aggressive".
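To illustrate points 2, 3 and 5, here is a minimal sketch in a throwaway repository, assuming a POSIX shell and a reasonably recent git (the file name big.bin and its 1 MiB size are made up for the demo). It lists the largest blob anywhere in history, then expires the reflog and repacks aggressively. Note that the blob is still reachable from the first commit, so gc cannot drop it; only rewriting history (point 2) would.

```shell
set -eu
# Throwaway identity so the demo commits work anywhere.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
# Simulate a generated binary committed by mistake (1 MiB of random bytes).
head -c 1048576 /dev/urandom > big.bin
git add README big.bin
git commit -q -m "Add README and a generated file by mistake"
git rm -q big.bin
git commit -q -m "Remove generated file"

# Point 2: find the largest blob anywhere in history, not just in the working tree.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' | sort -k2,2 -n -r | head -n 1)
echo "largest blob: $largest"

# Points 3 and 5: drop reflog safeties, prune immediately, repack aggressively.
git count-objects -vH
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive
git count-objects -vH
```

Comparing the two `git count-objects -vH` reports shows what the cleanup actually saved; in this demo expect the packed size to stay at roughly 1 MiB, because random data does not compress and the blob is still part of history.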

SilentGhost
Jakub Narębski
  • This is not enough; there are more places from which references can come that prevent git gc from collecting stuff. See my post http://antilamer.livejournal.com/443564.html – jkff Sep 08 '12 at 02:48
  • As an addendum to point 2, the [Git BFG Repo Cleaner](https://rtyley.github.io/bfg-repo-cleaner/) is a much faster alternative to `git filter-branch`. – medavox Aug 12 '20 at 14:35

Depending on what you want to do with your repository, you might also consider using the following git clone option:

   --depth <depth>
       Create a shallow clone with a history truncated to the specified
       number of revisions. A shallow repository has a number of
       limitations (you cannot clone or fetch from it, nor push from nor
       into it), but is adequate if you are only interested in the recent
       history of a large project with a long history, and would want to
       send in fixes as patches.
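A minimal sketch of what --depth does, using a throwaway local repository with five commits (the directory names are made up; `git rev-parse --is-shallow-repository` needs git 2.15 or later). git ignores --depth for plain local paths, so the clone goes through a file:// URL.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3 4 5; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done
cd ..

# --depth is ignored for plain local paths, hence the file:// URL.
git clone -q --depth 1 "file://$work/source" shallow

full_count=$(git -C source rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
is_shallow=$(git -C shallow rev-parse --is-shallow-repository)
echo "source: $full_count commits; shallow clone: $shallow_count commit(s); shallow=$is_shallow"
```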
adl
  • Looks like after git 1.9 shallow clones do allow a bit more interaction. A BETTER solution is described in this SO answer using replace: http://stackoverflow.com/a/17622991/25286 – GDmac Dec 07 '16 at 06:11

git-gc calls lots of other commands that are used to clean up and compress the repository. Beyond that, all you can do is delete old, unused branches.

Short answer: No :-(

Tilka

Git clone now has a --single-branch option that allows you to check out a single branch without pulling in the history of the other branches. If git is consuming a lot of disk space because you have a lot of branches, you can delete your current checkout and re-clone the repo using this option to regain some disk space. For example:

cd ../
rm -rf ./project
git clone -b master --single-branch git@github.com:username/project.git
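A quick way to confirm what --single-branch did, sketched against a throwaway local repository with two made-up extra branches (feature-a, feature-b): the clone records a narrowed fetch refspec, so later fetches also stay limited to master, and no remote-tracking refs exist for the other branches.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
echo base > file.txt
git add file.txt
git commit -q -m "Base"
git branch feature-a   # hypothetical extra branches that we do not want locally
git branch feature-b
cd ..

git clone -q -b master --single-branch "file://$work/source" project

# The fetch refspec only maps master, not refs/heads/*.
refspec=$(git -C project config --get remote.origin.fetch)
# Empty output means no remote-tracking ref was created for feature-a.
other=$(git -C project for-each-ref --format='%(refname)' refs/remotes/origin/feature-a)
echo "fetch refspec: $refspec"
```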

Also, if your current master has a long history and you don't have any outstanding branches that need to be merged back into master, you can create an archive branch off of master and create a new orphan master with no git history:

git checkout -b master_archive_07162013     # create and switch to the archive branch
git push -u origin master_archive_07162013  # push the archive branch to the remote and track it
git branch -D master                        # delete local master
git push --delete origin master             # delete remote master (may be refused if it is the server's default branch)
git remote prune origin                     # delete the remote-tracking branch if it is still there
git checkout --orphan master                # create a new master branch with no history
git commit -m "initial commit"              # re-establish the files in the repo
git push -u origin master                   # push the new master to the remote and track it

The new master branch's history will not be related to the old archived master branch, so only do this when you are truly archiving the branch.

If you archive your master branch and then git clone master with single-branch, your checkout should be a lot smaller.

curmil

Every git repository contains the entire history. While git does a fairly good job of compressing this stuff, there's simply a lot of data in there.

The "obvious" but potentially not-possible-for-you solution is to start a new repository without all that old history.

Artelius

If you do not need to keep all of the commit history locally, you could use a shallow clone:

git clone --depth=1 [url_of_repo]

I frequently use this when cloning github projects, if I am only interested in the latest set of files and not in the history.

Apparently fetching and pushing is/was not supported on shallow clones, but I have been able to push and pull changes to GitHub repos with one, so it might work in your case too. (But no doubt you will run into difficulties if you want to merge branches whose merge base is not in your truncated history.)

I think it is easier to start with a fresh clone as shown above, but others have shown how to trim an existing local repo.
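For trimming an existing clone instead of re-cloning, one common approach (sketched here with a throwaway repository; not necessarily what other answers do) is to re-fetch with --depth 1, then expire the reflog and gc so the old history can be dropped. It assumes the clone has no local work you still need, and uses `--is-shallow-repository`, which needs git 2.15 or later.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q source
cd source
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3 4 5; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done
cd ..

git clone -q "file://$work/source" project   # an ordinary, full clone
cd project
before=$(git rev-list --count HEAD)

# Re-fetch only the tip commit; this marks the repository as shallow.
git fetch -q --depth 1 origin master
# Remove reflog entries that would otherwise keep the old commits alive, then collect.
git reflog expire --expire=now --all
git gc -q --prune=now

after=$(git rev-list --count HEAD)
shallow=$(git rev-parse --is-shallow-repository)
echo "commits before: $before, after: $after, shallow: $shallow"
```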

Community
joeytwiddle
  • ... or wait ... "you cannot clone or fetch from it, nor push from nor into it" ... that's a bummer – tobixen Jun 24 '16 at 12:00

Git gc will remove unused objects. That is about all you can do.

You could consider splitting up your repositories if they become too big.

Sardaukar

You can repack your repository. However, I think git gc already calls it:

git repack -ad
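To see what `git repack -ad` does, a minimal sketch in a throwaway repository (names made up for the demo): after a few commits the objects are stored loose, and `-a -d` packs everything reachable into a single pack, then deletes the now-redundant loose copies.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for i in 1 2 3; do
  echo "line $i" >> file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done

# "count:" is the number of loose objects, "packs:" the number of pack files.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git repack -a -d -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
echo "loose objects: $loose_before -> $loose_after, packs: $packs_after"
```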

Amandasaurus

git prune might help. It removes unreachable objects from the repository (git gc does not call it).
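A minimal sketch of the behaviour discussed in the comments below: `git prune` run on its own has no grace period, so it deletes unreachable loose objects immediately, whereas git gc by default only prunes unreachable objects older than two weeks (gc.pruneExpire). The blob content is made up for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "tracked" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Write a loose blob that no commit, index entry, or reflog refers to.
orphan=$(echo "nobody points at me" | git hash-object -w --stdin)
existed=no
if git cat-file -e "$orphan" 2>/dev/null; then existed=yes; fi

# Without --expire, git prune removes every unreachable loose object.
git prune
pruned=yes
if git cat-file -e "$orphan" 2>/dev/null; then pruned=no; fi
echo "existed before prune: $existed, pruned: $pruned"
```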

knittl
  • From the git-prune manpage: "In most cases, users should run git-gc, which calls git-prune." – mipadi Sep 09 '09 at 15:31
  • well, for me my git repositories get smaller by a few mb when calling `git prune` after `git gc` (measured with du -sh .git). maybe `git gc` only prunes older commits, and `git prune` prunes every object which is not reachable – knittl Sep 09 '09 at 15:41
  • IIRC `git gc` offers some extra security (not deleting some objects) that `git prune` lacks. – Jakub Narębski Sep 09 '09 at 16:48
  • Before 'git prune' ~900 Mb. After 'git prune' ~150 Mb. – mykolaj Jun 10 '16 at 17:42

A foolproof method, if you don't care about download size, is to delete the repository (you can just delete the whole folder) and clone it again. Make sure everything that needs to be preserved is pushed to the server first!
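Before deleting a clone, it is worth checking mechanically that nothing would be lost. A minimal sketch, using a throwaway bare repository as a stand-in for the server (all names are made up): it counts commits on local branches that no remote-tracking branch contains, checks for uncommitted or untracked files, and only then deletes and re-clones. It does not look at stashes or unpushed tags, so check those separately.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
# A bare "server" repository standing in for the real remote.
git init -q --bare server.git
git -C server.git symbolic-ref HEAD refs/heads/master
git clone -q "file://$work/server.git" project 2>/dev/null   # warns about cloning an empty repository
cd project
git symbolic-ref HEAD refs/heads/master
echo "work" > notes.txt
git add notes.txt
git commit -q -m "Work in progress"
git push -q origin master
cd ..

# Commits on local branches that are not on any remote-tracking branch.
unpushed=$(git -C project rev-list --count --branches --not --remotes)
# Uncommitted changes or untracked files (empty output means clean).
dirty=$(git -C project status --porcelain)
url=$(git -C project config --get remote.origin.url)

recloned=no
if [ "$unpushed" -eq 0 ] && [ -z "$dirty" ]; then
  rm -rf project
  git clone -q "$url" project
  recloned=yes
fi
```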

CodingYourLife

You might have a lot of git projects cloned on your computer, but only a few of them you are actively working on today.

In those idle projects, the checked out working files can consume a significant amount of disk space. (Sometimes even larger than git's history, because history gets compressed.)

  • So one way to save disk space is to remove the working files from idle projects you are not working on. A nice way to do that is to create an empty branch which you can switch to when you are not working on the project.

  • Another more drastic thing you can do is to delete absolutely everything except for the .git/config file. That will allow you to git clone the project again in future.

    Before doing this, you should ensure that you have committed and pushed all your work (including all local branches) to the remote repository, so there is nothing in the local git repo that you need to retain.
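A minimal sketch of the first suggestion, parking an idle project on an empty branch, assuming git 2.23 or later for `git switch` (the branch name idle is made up). The history in .git stays intact; only the checked-out files go away until you switch back.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "source code" > main.c
git add main.c
git commit -q -m "Real work"

# Park the project: --orphan starts a history-less branch and removes all tracked files.
git switch -q --orphan idle
git commit -q --allow-empty -m "Empty branch for idle periods"
parked_has_main=no
if [ -e main.c ]; then parked_has_main=yes; fi

# Resume work later: switching back restores the tracked files.
git switch -q master
```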

joeytwiddle