
How can I convert an already cloned git repository to a shallow repository?

The git repository is downloaded through a script outside of my control so I cannot do a shallow clone.

The reason for doing this is to save disk space. (Yes, I'm really short on disk space so even though a shallow repository doesn't save much, it is needed.)

I already tried

git repack -a -d -f --depth=1

But that actually made the repository larger.
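To see whether a step like the repack above actually helps, it can be useful to measure the repository before and after. Here is a minimal sketch, assuming a POSIX shell and Git 2.15 or later (for `--is-shallow-repository`). The function name `repo_report` is hypothetical:

```shell
#!/bin/sh
# repo_report: hypothetical helper that prints a few size/shape facts
# about the repository containing the current directory.
repo_report() {
  gitdir=$(git rev-parse --git-dir) || return 1
  # "true" once a shallow file exists in the git dir, "false" otherwise.
  echo "shallow: $(git rev-parse --is-shallow-repository)"
  # Number of commits reachable from HEAD (respects any shallow boundary).
  echo "commits: $(git rev-list --count HEAD)"
  # Disk usage of the git directory in KiB.
  echo "gitdir_kib: $(du -sk "$gitdir" | cut -f1)"
}
```

Run `repo_report` once before `git repack`/`git gc` and once after, then compare the `gitdir_kib` lines. `git count-objects -vH` gives a finer loose-versus-packed breakdown.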

Philipp
Robert
  • http://stackoverflow.com/questions/1398919/make-git-consume-less-disk-space/1400849#1400849 could help. What does a `git gc` after your repack give? – VonC Jan 15 '11 at 09:20
  • huitseeker: Thanks for bringing it up. I am aware of the limitations and I am okay with it. I need access to the latest commit, or ideally couple of commits, but that's it. – Robert Jan 15 '11 at 11:44
  • VonC: I'm doing a gc --aggressive right now. I should gain some from it, but if possible I would also like to drop objects I don't need. – Robert Jan 15 '11 at 11:45
  • I just came across http://progit.org/2010/03/17/replace.html which suggests an alternate, potentially simpler, process involving `git commit-tree`. – Tyler Jan 20 '11 at 06:06
  • The --depth parameter in git repack is unrelated to shallowing: it is the maximum delta-chain depth used when deltifying objects. --depth=1 allows only one level of deltas, far fewer than the default of 50, so there is less compression. – Seb35 Sep 24 '21 at 14:33
  • I made a [git-shallow-maker](https://github.com/milahu/random/blob/master/git/git-shallow-maker) to copy all local branches to a new local repo. This copies only the needed commits, so the new repo is shallow. – milahu Feb 12 '23 at 13:59

6 Answers


First, you may need to remove tags (as they prevent GC of tagged commits), like:

git tag -d $(git tag -l)

Then, this worked for me:

git pull --depth 1
git gc --prune=all

This still leaves the reflog lying around, which, like the tags, references additional commits that can use up space. Note that I would not erase the reflog unless it is really necessary: it contains the local change history used to recover from mistakes.

There are additional commands on how to erase the reflog in the comments below, and a link to a similar question with a longer answer.

If you still have a lot of space used, ensure you removed the tags, which you should try first before removing the reflog.
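Because tags, other branches, remote-tracking refs and reflog entries all keep older commits reachable, it can help to list what is still pinning history before running `git gc`. This is a rough diagnostic sketch that assumes a POSIX shell. `list_history_pins` is a hypothetical name:

```shell
#!/bin/sh
# list_history_pins: hypothetical diagnostic that prints every ref
# (branches, remote-tracking branches, tags, ...) and the number of
# HEAD reflog entries, since all of these can keep old commits alive.
list_history_pins() {
  echo "refs:"
  git for-each-ref --format='  %(refname)'
  # "git reflog show HEAD" prints one line per reflog entry for HEAD.
  echo "head_reflog_entries: $(git reflog show HEAD | wc -l | tr -d ' ')"
}
```

Anything under `refs/tags/` or `refs/remotes/` that you do not need, plus a large reflog count, points to what to clean up first.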

fuzzyTew
  • Hmm for me this gives "fatal: git fetch-pack: expected shallow list" on the pull. – Ben Farmer Jan 16 '17 at 10:15
  • @BenFarmer Well that's no good! As shallow support has been slowly developing, this probably only works on recent versions of git. What version do you have? – fuzzyTew Jan 17 '17 at 01:28
  • hmm, seems to be 2.7.0.rc3. I'll see if a newer one is available in my repos and try that... – Ben Farmer Jan 18 '17 at 09:02
  • I've run the above commands. The repo is indeed shallow now (`git log` shows only one commit and `git branch` shows just one branch). But the `.git` folder still occupies 2.5 GB. The same repo cloned with `--depth 1` occupies about 1 GB. Any advice how to cut down the disk usage? – Dzmitry Apr 05 '17 at 08:57
  • @Dzmitry you're right. See the answer I posted, I think it saves space. – VasiliNovikov Sep 24 '17 at 11:35
  • @Dzmitry, I've updated the commands in the answer to add `--prune=all` to garbage collection. This immediately deletes the extra objects for me. – fuzzyTew Feb 07 '18 at 03:20
  • got a couple downvotes - please either edit or comment what could change – fuzzyTew Jan 07 '19 at 17:20
  • From my experiments `git pull --depth=1` keeps non-HEAD tags, which are not removed by `git gc --prune=all`. I had to use `git tag -d $(git tag -l)` to properly garbage collect those refs. – v1bri Jun 21 '19 at 18:30
  • I found the above commands were not enough, and that I had to do this as well: `git reflog expire --expire=all --all` as recommended in https://stackoverflow.com/questions/38171899/how-to-reduce-the-depth-of-an-existing-git-clone. The git tag command above is needed too. – Wayne Piekarski Sep 11 '19 at 01:31
  • Having used the reflog a lot, I'm hesitant to recommend clearing it without letting people know that it contains their action history for recovery. The direct link to the other answer is https://stackoverflow.com/a/46004595/129550 – fuzzyTew Aug 16 '20 at 15:51
  • Even combining all the commands listed here, pull depth 1, deleting the tags, killing the reflog, and then finally doing the gc, I did not get space savings in my .git directory. – Nir Friedman Feb 26 '21 at 16:46
  • That's too bad. If you're on a unix terminal you can use a command like `du -h --max-depth=2 .git` to make sure it is object or pack files using the space up, and not something else. – fuzzyTew Feb 27 '21 at 10:01
  • `--prune=all` should be `--prune=now`, see: https://www.spinics.net/lists/git/msg354409.html – Paul Aug 25 '23 at 21:54

You can convert git repo to a shallow one in place along this lines:

git show-ref -s HEAD > .git/shallow
git reflog expire --expire=0
git prune
git prune-packed

Make sure to make a backup, since this is a destructive operation. Also keep in mind that neither cloning nor fetching from a shallow repo was supported at the time (Git 1.9 and later relax this restriction)! To really remove all the history, you also need to remove all references to previous commits before pruning.
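The steps above, plus the fixes suggested in the comments (`--all` and `--expire=now` for the reflog, deleting tags, a final `git gc`), can be bundled into one function. This is only a sketch, assuming a POSIX shell and a recent Git. It uses `git rev-parse HEAD` instead of `git show-ref -s HEAD`, because `show-ref` lists refs whose names end in `HEAD` (such as `refs/remotes/origin/HEAD`) rather than the checked-out commit itself. `shallow_in_place` is a hypothetical name, and other branches or remote-tracking refs are left alone, so they may still keep their own history:

```shell
#!/bin/sh
# shallow_in_place: hypothetical helper that makes the current repository
# shallow at HEAD without any network access. DESTRUCTIVE: back up first.
# Only tags are deleted here; other branches and remote-tracking refs
# are kept and may still pin older history.
shallow_in_place() {
  gitdir=$(git rev-parse --git-dir) || return 1
  # Mark the HEAD commit as the shallow boundary: its parents are
  # treated as absent from now on.
  git rev-parse HEAD > "$gitdir/shallow" || return 1
  # Delete every tag, since tags keep older commits reachable.
  git for-each-ref --format='delete %(refname)' refs/tags |
    git update-ref --stdin || return 1
  # Drop all reflog entries so they no longer reference old commits.
  git reflog expire --expire=now --all || return 1
  # Repack and immediately prune everything that is now unreachable.
  git gc --quiet --prune=now
}
```

Following the comment about `.git/shallow` listing every tip you want to keep, you could write several commit IDs into that file instead of only HEAD.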

user212328
  • Actually this doesn't seem to do anything. – hendry May 01 '12 at 05:04
  • hendry: Most likely you have not removed other references pointing to HEAD's history. Try removing all other branches and tags before attempting these steps. – user212328 Jun 03 '12 at 11:47
  • For submodules, you might need to resolve the `.git` file to the git dir (`git rev-parse --git-dir`). Also, you could use `git describe --always HEAD~5` instead of `show-ref -s HEAD` to keep the latest commits. Then there is also `git fetch --unshallow` in the meantime to unshallow a clone. – blueyed Mar 27 '14 at 09:05
  • In order to remove all references, add --all to the reflog command: git reflog expire --expire=now --all – Jiyong Park Mar 01 '16 at 02:11
  • Note that `git prune` performs `git prune-packed` already. Also note that if you want all branches stored, they must all have their tips listed in `.git/shallow`. This command worked for me, but I don't know if it will work all the time: `find .git/refs -type f | xargs cat | sort -u > .git/shallow` – fuzzyTew May 31 '16 at 01:19
  • `git describe --always` output gives me a `bad shallow line` error – fuzzyTew May 31 '16 at 02:51
  • You might need to remove unneeded refs from **`.git/packed-refs`** before `git prune`. – ryenus Nov 17 '17 at 04:19
  • This needs `git gc --prune=all` as well. – bukzor Aug 13 '22 at 20:16

Convert to shallow since a specific date:

git pull --shallow-since=YYYY-mm-dd
git gc --prune=all

Also works:

git fetch --shallow-since=YYYY-mm-dd
git gc --prune=all
Pop Catalin
  • Thanks! For me `git fetch --depth 1; git gc --aggressive --prune=all` worked as well. Doing so seemed to be equivalent of doing a shallow clone with: `git clone --depth 1` – Night Train Mar 09 '23 at 18:21
  • I get `you are not currently on a branch. Please specify which branch you want to rebase against. ` – rubo77 Apr 28 '23 at 08:48

Create shallow clone of a local repo:

git clone --depth 1 file:///full/path/to/original/dir destination

Note that the first "address" must be a file:// URL. That's important, because with a plain local path git takes its local-clone shortcut and ignores --depth. Also, git records the original file:// address as the remote ("origin"), so you'll need to update the new repository to point at the correct git remote.
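Both steps (the file:// clone and fixing up "origin") can be combined. This is a minimal sketch, assuming a POSIX shell and Git 2.7 or later (for `git remote get-url`). `shallow_local_copy` is a hypothetical name, and copying the source's own `origin` URL is an assumption about what "the correct git remote" is:

```shell
#!/bin/sh
# shallow_local_copy: hypothetical helper that makes a depth-1 copy of a
# local repository and points the copy's "origin" at the source's origin.
# Usage: shallow_local_copy SRC_DIR DEST_DIR
shallow_local_copy() {
  src=$(cd "$1" && pwd) || return 1
  dest=$2
  # A file:// URL forces the normal transport; a plain path would take the
  # local-clone shortcut, where --depth is ignored.
  git clone --quiet --depth 1 "file://$src" "$dest" || return 1
  # Reuse the source's upstream URL if it has one; otherwise keep file://.
  if url=$(git -C "$src" remote get-url origin 2>/dev/null); then
    git -C "$dest" remote set-url origin "$url"
  fi
}
```

Once you have checked the copy, you can delete the original directory to reclaim the space.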

VasiliNovikov
  • This did the trick for me. In our CI setup, I wanted to clone out the full repo in order to apply patches from another branch, and then shrink the directory as much as possible since it would be TAR'ed and stored. – gablin Jun 12 '19 at 12:19

Combining the answer from @fuzzyTew with the suggestions in the comments on that answer:

git pull --depth 1
git tag -d $(git tag -l)
git reflog expire --expire=all --all
git gc --prune=all

Want to save space by running this on every repository on your disk? Then run this fd command:

fd -HIgt d '.git' -x bash -c 'pushd "$0" && ( git pull --depth 1; git tag -d $(git tag -l); git reflog expire --expire=all --all; git gc --prune=all ) && popd' {//}

Or with just regular find:

find . -type d -name '.git' -exec bash -c 'pushd "${0%/*}" && ( git pull --depth 1; git tag -d $(git tag -l); git reflog expire --expire=all --all; git gc --prune=all ) && popd' {} \;
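Because the loops above rewrite every repository they find, you may want to make a read-only pass first to see which repositories are worth shrinking. This sketch assumes a POSIX `find`, `du` and `sort`. `list_repo_sizes` is a hypothetical name:

```shell
#!/bin/sh
# list_repo_sizes: hypothetical helper that reports the size (KiB) of every
# .git directory under ROOT, largest first, without modifying anything.
# Usage: list_repo_sizes [ROOT]   (ROOT defaults to ".")
list_repo_sizes() {
  root=${1:-.}
  # -prune stops find from descending into each .git directory it reports.
  find "$root" -type d -name .git -prune -exec du -sk {} + |
    sort -rn
}
```

Each output line is a size in KiB followed by the path of the `.git` directory, so the first few lines show where shrinking pays off most.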
Samuel Marks

Note that a shallow repo (such as one created with git clone --depth 1 to convert an existing repo into a shallow one) could fail on git repack before the following fix.

See commit 5dcfbf5, commit 2588f6e, commit 328a435 (24 Oct 2018) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit ea100b6, 06 Nov 2018)

repack -ad: prune the list of shallow commits

git repack can drop unreachable commits without further warning, making the corresponding entries in .git/shallow invalid, which causes serious problems when deepening the branches.

One scenario where unreachable commits are dropped by git repack is when a git fetch --prune (or even a git fetch when a ref was force-pushed in the meantime) can make a commit unreachable that was reachable before.

Therefore it is not safe to assume that a git repack -adlf will keep unreachable commits alone (under the assumption that they had not been packed in the first place, which is an assumption at least some of Git's code seems to make).

This is particularly important to keep in mind when looking at the .git/shallow file: if any commits listed in that file become unreachable, it is not a problem, but if they go missing, it is a problem.
One symptom of this problem is that a deepening fetch may now fail with:

fatal: error in object: unshallow <commit-hash>

To avoid this problem, let's prune the shallow list in git repack when the -d option is passed, unless -A is passed, too (which would force the now-unreachable objects to be turned into loose objects instead of being deleted).
Additionally, we also need to take --keep-reachable and --unpack-unreachable=<date> into account.

Note: an alternative solution discussed during the review of this patch was to teach git fetch to simply ignore entries in .git/shallow if the corresponding commits do not exist locally.
A quick test, however, revealed that the .git/shallow file is written during a shallow clone, in which case the commits do not exist, either, but the "shallow" line does need to be sent.
Therefore, this approach would be a lot more finicky than the approach presented by this patch.
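The commit message above distinguishes shallow entries that are merely unreachable (harmless) from entries whose commits are missing (which breaks a later deepening fetch). To check a repository for the second case, you can test each entry for existence. This is a sketch assuming a POSIX shell. `check_shallow_entries` is a hypothetical name:

```shell
#!/bin/sh
# check_shallow_entries: hypothetical helper that prints every commit ID
# listed in the shallow file whose object is missing from the local repo.
check_shallow_entries() {
  shallow_file="$(git rev-parse --git-dir)/shallow"
  # A non-shallow repository has no shallow file, so nothing to check.
  [ -f "$shallow_file" ] || return 0
  while read -r oid; do
    # "git cat-file -e" exits non-zero when the object does not exist.
    git cat-file -e "$oid" 2>/dev/null || echo "$oid"
  done < "$shallow_file"
}
```

Empty output means every listed boundary commit is still present.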

VonC