10

I'm using Git to version a series of binary files. They compress pretty well, but my central repos do not seem to be compressing when I push to them. They're eating up a decent amount of my quota, so I was looking to see if there was a way to force the remote repo to do a GC.

Is this possible? I'm working on Project Locker so I don't believe I have SSH access to go in and GC the repo myself. Any ideas? Thanks.

jocull
  • 20,008
  • 22
  • 105
  • 149

3 Answers

8

If you can't run git gc yourself, you'll have to trick it into running automatically. You won't have as much control over it that way, but you should at least be able to get it to run.

git gc --auto is run by several commands; the relevant one here is receive-pack, which is run on the remote to receive a pack as part of a push. gc --auto only repacks when there are enough loose objects; the cutoff is determined by the config parameter gc.auto, and defaults to 6700.
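To see how far a repo is from that cutoff, something like the following should work anywhere you do have a shell (your local clone, for instance). It's only a sketch: the scratch repo and the variable names are made up so the snippet runs on its own, and 6700 is used as the fallback when gc.auto isn't set.

```shell
set -e
repo=$(mktemp -d)                 # hypothetical scratch repo, just for illustration
git init -q "$repo"
cd "$repo"

# Write one blob straight into the object store as a loose object.
echo hello | git hash-object -w --stdin >/dev/null

# First field of `git count-objects` is the number of loose objects.
loose=$(git count-objects | cut -d' ' -f1)

# gc.auto is usually unset, in which case the built-in default applies.
threshold=$(git config --get gc.auto || echo 6700)

echo "loose=$loose threshold=$threshold"
```

In a real repo you'd skip the first four commands and just run the last three from inside it.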

If you have access to the remote's gitconfig, you could set that cutoff to 1 temporarily. There is almost certainly at least 1 loose object in the repo, so that should cause gc --auto to do its thing the next time you push.
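If you do get at the config, the change itself is small. Here's a rough sketch against a local bare repo standing in for the remote (the path and variable names are placeholders, not anything Project Locker provides). One caveat, as far as I know: git estimates the loose-object count by sampling rather than counting exactly, so even a cutoff of 1 isn't a hard guarantee that a single push triggers a repack.

```shell
set -e
remote=$(mktemp -d)/remote.git    # stand-in for the hosted bare repo
git init -q --bare "$remote"

# Remember the repo-level setting (if any) so it can be put back afterwards.
old=$(git -C "$remote" config --local --get gc.auto || echo unset)

# Lower the cutoff so the next push is likely to trigger gc --auto.
git -C "$remote" config gc.auto 1
during=$(git -C "$remote" config --local --get gc.auto)

# ... push to the remote here ...

# Restore the previous state.
if [ "$old" = unset ]; then
    git -C "$remote" config --unset gc.auto
else
    git -C "$remote" config gc.auto "$old"
fi
```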

If you don't have access to the remote's gitconfig, all I can think to do is artificially create a bunch of loose objects. You could do that by creating a branch, committing a bunch of tiny files (with different content) to it, pushing the branch to the remote, then deleting the branch from the remote. (Important to vary the content, or they'll just use the same blobs.) Rinse and repeat.
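A rough sketch of that loop, run here against a local bare repo so it's self-contained. The branch name gc-bait, the bait directory and the counts are all made up. I've kept each push small on purpose: as I understand it, a push with fewer objects than transfer.unpackLimit (100 by default) gets unpacked into loose objects on the receiving end, while a bigger push is stored as a single pack and wouldn't add to the loose count.

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"     # stand-in for the hosted repo
git init -q "$work/local"
cd "$work/local"
git remote add origin "$work/remote.git"
git checkout -q -b gc-bait                # hypothetical throwaway branch

mkdir bait
for round in 1 2; do                      # use many more rounds for a real attempt
    for n in $(seq 1 50); do
        # Distinct content per file and per round, so every blob is new.
        echo "$round-$n" > "bait/$n.txt"
    done
    git add bait
    git -c user.name=bait -c user.email=bait@example.invalid \
        commit -q -m "throwaway objects, round $round"
    git push -q origin gc-bait            # small push: lands as loose objects
done

# Drop the branch; the objects stay behind on the remote, unreferenced.
git push -q origin --delete gc-bait
```

In your own clone you'd skip the two init lines and the remote add, and start from the checkout.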

Cascabel
  • 479,068
  • 72
  • 370
  • 318
  • I may try making a small app that pumps out 7000 or so text files. I don't believe I have access to the config. I'll let you know how it goes, thanks! – jocull Nov 20 '10 at 06:04
  • What is that, a shell script? Anyways, I just made something real quick in Qt and push up all the loose objects. It didn't seem to make a difference in my space usage. I will just have to try to contact ProjectLocker and see what is up. Thanks for everyone's help. – jocull Nov 20 '10 at 17:05
  • @jocull: I don't remember exactly how the objects get unpacked on the remote end. I would personally probably try going way overboard with it just in case. And yes, that's a shell one-liner... except I apparently forgot half of it. Last part should read `while read n; do echo "$n" > "$n.txt"; done` – Cascabel Nov 20 '10 at 18:32
  • I decided to delete my repo on Project Locker, then recreate it with the same name. I ran 'git gc' locally and then pushed the whole repo. It's now using half the space it was previously. – jocull Nov 21 '10 at 16:09
  • @jocull: Well, that's one way to run `gc`! Seems odd to me to be able to totally obliterate a repo, but not to be able to edit its config, but sure works! – Cascabel Nov 21 '10 at 16:33
  • I guess that's the beauty of distributed source control :) – jocull Nov 22 '10 at 16:17
2

That's really a problem that they need to solve on their end. They can do it with a post-receive hook, a cron job, or something similar, but if they're supposed to be maintaining your repositories, running gc is part of that job.
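For what it's worth, the hook version can be tiny. This is only a sketch of what a host could install, tried against a throwaway bare repo; the paths, the demo branch and the file name are invented, and a real host would probably want to run gc in the background or from cron rather than make every push wait for it.

```shell
set -e
host=$(mktemp -d)
git init -q --bare "$host/repo.git"       # stand-in for the hosted repo

# Install the hook. Git runs it inside the repository after each push.
mkdir -p "$host/repo.git/hooks"
cat > "$host/repo.git/hooks/post-receive" <<'EOF'
#!/bin/sh
git gc --quiet
EOF
chmod +x "$host/repo.git/hooks/post-receive"

# Push one commit from a scratch client to see the hook fire.
git init -q "$host/client"
cd "$host/client"
echo data > file.bin
git add file.bin
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "first"
git push -q "$host/repo.git" HEAD:refs/heads/demo

# After the push, the objects should sit in a pack instead of loose files.
git -C "$host/repo.git" count-objects -v
```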

Dustin
  • 89,080
  • 21
  • 111
  • 133
  • 1
    I tend to agree, though sometimes you can't wait for other people to solve their problems. It could also be that they do run `gc` a decent amount, but the binary files are causing things to bloat up faster than they usually would with text content. – Cascabel Nov 18 '10 at 07:19
  • If they don't GC then they sell me more space. It almost makes sense for them not to from a business POV. – jocull Nov 18 '10 at 18:18
1

From the git-gc man page:

Some git commands may automatically run git gc; see the --auto flag below for details.

And further:

--auto

With this option, git gc checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run git gc --auto after performing operations that could create many loose objects.

Housekeeping is required if there are too many loose objects or too many packs in the repository. If the number of loose objects exceeds the value of the gc.auto configuration variable, then all loose objects are combined into a single pack using git repack -d -l. Setting the value of gc.auto to 0 disables automatic packing of loose objects.

If the number of packs exceeds the value of gc.autopacklimit, then existing packs (except those marked with a .keep file) are consolidated into a single pack by using the -A option of git repack. Setting gc.autopacklimit to 0 disables automatic consolidation of packs.

And in the end:

The git gc --auto command will run the pre-auto-gc hook. See githooks(5) for more information.

Aleksej Komarov
  • 194
  • 1
  • 8
takeshin
  • 49,108
  • 32
  • 120
  • 164