What is the cause of loose objects? Can they safely be removed by `git prune`?

Question

I'm using a local Git repo that uses our company's SVN repo as origin.

I often receive the message:

error: The last gc run reported the following. Please correct the root cause and remove .git/gc.log. Automatic cleanup will not be performed until the file is removed.

warning: There are too many unreachable loose objects; run 'git prune' to remove them.

Indeed, removing .git/gc.log and running git prune fixes the issue.
However, in a comment on VonC's answer to his own question, How to skip "Loose Object" popup when running 'git gui', Michael Donohue states:

[...] I do like the safety aspect of keeping the loose objects around for two weeks, should I want to go back and look at some old revisions [...]

In an answer (also by VonC) to Whole team gets 'too many unreachable loose objects' messages, a question by jlengrand about a GitLab issue with loose objects after moving from SVN to Git, VonC writes:

ran git prune and prayed it didn't break things (which it thankfully didn't)

So, I assume git prune is a dangerous operation that can destroy things.

To safely deal with the "too many unreachable loose objects"-message I have the following questions:

What causes these loose objects? (See man git-fsck on unreachable objects, and torek's answer to What does git do when we do : git gc - git prune by Lyes CHIOUKH, which covers the inner workings of Git objects and hashes as well as git gc, git prune, and git repack.)
Is it only git svn dcommit that:

1. reads the git commit object,
2. sends it to the SVN server,
3. retrieves the stored SVN revision,
4. creates a fresh git commit object that reflects the SVN revision,
5. replaces the HEAD pointer to the commit from step 1 with one pointing to the commit of step 4, and
6. leaves the original commit as a loose object.

Does this indeed cause the loose commits?
Does this cause all loose commits (I also do some git tree manipulation of stuff not yet in SVN such as git stash, git cherry-pick, git rebase, and git reset)?

When may I need these loose objects? What is a good policy for using git prune on my personal git repo?

torek · Answer 1 · 2018-08-30T17:23:57.280

In Git, all new objects start out as "loose" objects. (It's not at all clear to me why you are getting the "too many" error.)

There is a plumbing command, git hash-object, that can create a new object of any type. Other commands essentially build in git hash-object for the object type(s) they need, e.g., git write-tree creates some number of tree objects using the index, and git commit-tree creates one commit object. Using git add creates a loose object for each added file; obviously in some cases this could be a lot of objects.
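As a rough illustration of the paragraph above, the following sketch creates loose objects by hand in a throwaway repository. The temporary directory, the file name file.txt, and the sample contents are made up for the example; only git hash-object, git add, and git count-objects come from Git itself.

```shell
# Scratch repository so nothing here touches a real project (the path is arbitrary).
repo=$(mktemp -d)
cd "$repo" && git init -q

# Write a blob straight into the object database: -w stores it, --stdin reads the content.
blob=$(echo 'test content' | git hash-object -w --stdin)
echo "$blob"

# The object now exists as one compressed file: .git/objects/<first 2 hex digits>/<the rest>.
prefix=$(echo "$blob" | cut -c1-2)
rest=$(echo "$blob" | cut -c3-)
ls ".git/objects/$prefix/$rest"

# git add does the same thing for each file it stages.
echo 'hello' > file.txt
git add file.txt

# "count" is the number of loose objects; "in-pack" is the number of packed ones.
git count-objects -v
```

Neither object is referenced by a commit yet, which is the normal state of affairs right after git add; they only become a problem if nothing ever ends up referring to them.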

It's therefore pretty easy to create a large number of loose objects, but as a rule, Git commands that do so also run git gc --auto to pack them automatically. Once they are safely packed away, git prune removes the loose ones.
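A minimal sketch of that packing step, again in a scratch repository. The path, file name, commit message, and the Example identity are placeholders, and a plain git gc is run by hand here because three objects are far below the point at which git gc --auto would do anything.

```shell
# Throwaway repository; the path, file name, and identity below are placeholders.
repo=$(mktemp -d)
cd "$repo" && git init -q

echo 'hello' > file.txt
git add file.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'first commit'

# One blob, one tree, and one commit now exist, each as a separate loose file.
git count-objects -v

# Auto-gc threshold: Git's documented default for gc.auto is roughly 6700 loose objects.
git config --get gc.auto || echo 'gc.auto is not set, so the built-in default applies'

# Force a full gc instead of waiting for that threshold to be crossed.
git gc -q

# The reachable objects have moved into a pack: "count" drops and "in-pack" rises.
git count-objects -v
```

Everything in this example is reachable from the branch, so all of it ends up packed. Unreachable objects are the ones that can linger as loose files.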

So, I assume git prune is a dangerous operation that can destroy things.

It's not particularly dangerous, as long as you run it while not doing anything else (i.e., not running any Git command that might be actively creating loose objects). Adding --expire 14.days.ago makes it preserve recently-created loose objects, and that's what git gc does—well, it uses your gc.pruneExpire setting, but that defaults to 14.days.ago.
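One cautious way to try this out, sketched under the assumption of a scratch repository (the path, file name, identity, and the orphaned blob are all invented for the example), is to look before deleting: git fsck --unreachable and git prune --dry-run only report, they do not remove anything.

```shell
# Throwaway repository with one ordinary commit; all names here are placeholders.
repo=$(mktemp -d)
cd "$repo" && git init -q
echo 'hello' > file.txt
git add file.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'first commit'

# Store a blob that no commit, tree, or index entry refers to.
orphan=$(echo 'orphaned content' | git hash-object -w --stdin)

# Read-only check: lists the orphaned blob as unreachable.
git fsck --unreachable

# Dry run with a two-week grace period: the blob is brand new, so nothing is reported.
git prune --dry-run --expire 14.days.ago

# Dry run with no grace period: reports the blob that would be deleted.
git prune --dry-run --expire now

# The real thing. Only the unreachable blob goes; the committed objects stay.
git prune --expire now
git count-objects -v
```

The grace period that git gc applies can be changed per repository with git config gc.pruneExpire followed by a value such as 3.days.ago, if two weeks keeps more than you want.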

If you're actively working on Git itself, and introduce a bug into the packing programs, then it becomes dangerous. :-) Basically, any time you remove redundancy, you need to be sure it's actually redundant.

edited Aug 30 '18 at 17:23

answered Aug 30 '18 at 17:16

torek

The error message may be more about the unreachable part of the objects than whether they are loose or packed. But you probably know more about Git than I do. Is the question missing information that would help determine the cause of the error message? – Kasper van den Berg Aug 31 '18 at 07:21
I'm not sure what else you *can* mention. I have seen this kind of thing once before (in a different StackOverflow question), and it was not clear what generated all the loose objects, but cranking down the default expiry from 2 weeks / 14 days to something shorter seems to have tamed the problem for them. Essentially `git gc` is *too* conservative: if there are "too many" (for some value of "too many") loose objects, it stops auto-collecting them, which produces ever more loose objects over time until you notice your repository ballooning in size. – torek Aug 31 '18 at 15:45
The GitLab question you linked-to seems to be a result of some unfortunate design decision made in GitLab, to use raw hash IDs where Git does not see them and then get startled by the fact that Git will GC the (eventually) unreferenced (to Git) objects. So at least some versions of GitLab have auto-gc completely disabled (!). If you're not using GitLab, that won't have anything to do with the issue—the problem is that once *something* makes "too many" loose objects, no auto-gc will ever fix it. – torek Aug 31 '18 at 15:49
