This is a bit of a side remark (and hence should be a comment, but I need formatting, and also more space—okay, far more space—than there is in a comment):
> BTW, I was shocked to learn that there is no way to get the Git log entries for nodes E, F, G or H above if you don't supply `git log` with the hash for one of them. The lack of a label on a branch means that Git ignores that branch. So `git log --all` will not show those commits. I always figured that `git log --all` would literally show all commits performed against the repository.
That would make sense in some other version control systems, but not in Git:

- `--all` refers to all references, not all commits;
- Git finds commits by starting with a given hash ID (perhaps from a reference, or perhaps just a raw hash ID you list on the command line), then working backwards within the commits themselves; and
- each commit is on zero or more branches. In most repositories, the (singular) root commit is on every branch.

"Discarded" commits, such as `E-F-G-H`, occur naturally in Git: they're the result of `git rebase`, for instance, after copying the `E-F-G-H` chain to some set of new-and-improved commits. Perhaps you want the parent of the copy of `E` to be `D` rather than `B`, and to squash the old `F`+`G` together, to get:

               E'-FG-H'   <-- somebranch
              /
    A--B--C--D   <-- master
        \
         E--F--G--H   ??? [was somebranch, earlier]
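To see this for yourself, here is a minimal sketch in a throwaway repository. It assumes a reasonably modern Git; the `commit` helper, the file names, and the `demo` identity are all made up for the demonstration, and for brevity it does a plain `git rebase` without the `F`+`G` squash. After the rebase, walking from the refs no longer reaches the old `H`, while walking from its raw hash ID still does:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one distinct file, so nothing conflicts.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b somebranch
commit E; commit F; commit G; commit H
old_H=$(git rev-parse HEAD)          # remember the hash ID of the original H
git checkout -q master
commit C; commit D
git rebase -q master somebranch      # copy E-F-G-H on top of D, then move the name

from_refs=$(git rev-list --count --all)       # start from every ref plus HEAD
from_hash=$(git rev-list --count "$old_H")    # start from the raw hash ID instead
if git rev-list --all | grep -q "$old_H"; then shown=yes; else shown=no; fi
echo "from refs: $from_refs, from old H: $from_hash, old H in --all: $shown"
```

The first count covers `A` through `D` plus the four copies; the second covers `A`, `B`, and the original `E-F-G-H`. Adding `--reflog` to the first command asks Git to start from reflog entries as well, which is one way to get the old chain back into view.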
The reason that `git reflog` works to find these, and the way it does so, is that each ref has a log of the values it used to hold. So in the example just above, `somebranch`'s reflog will show that at one point, it named commit `E`; at another, probably just afterward, it named commit `F`. This will repeat for `G` and `H`, and then the rebase operation will, all at once, yank the name `somebranch` over to commit `H'`. The `E'-FG-H'` chain was built by `git rebase` using detached HEAD mode, so the only reflog that contains the intermediate hash IDs (such as `E'` and `FG`) is that of `HEAD` itself, which is also a ref.<sup>1</sup>
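Here is a hedged sketch of the same thing, again in a throwaway repository with a made-up `commit` helper and `demo` identity, and with a shorter `E-F` chain and no squash. It checks that the branch's reflog still remembers the old tip, and that the intermediate copy shows up only in `HEAD`'s reflog:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one distinct file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git checkout -q -b somebranch
commit E; commit F
old_tip=$(git rev-parse somebranch)  # the original F
git checkout -q master
commit C; commit D
git rebase -q master somebranch      # build E'-F' on a detached HEAD, then move the name

new_tip=$(git rev-parse somebranch)
previous=$(git rev-parse 'somebranch@{1}')    # value the name held just before the move
copy_E=$(git rev-parse somebranch~1)          # the intermediate copy E'
in_head=$(git log -g --format=%H HEAD | grep -c "$copy_E" || true)
in_branch=$(git log -g --format=%H somebranch | grep -c "$copy_E" || true)

git --no-pager reflog show somebranch         # the branch's own log of old values
echo "E' entries in HEAD reflog: $in_head, in somebranch reflog: $in_branch"
```

The `name@{1}` syntax reads the reflog directly, so `git log 'somebranch@{1}'` is a quick way to get at the pre-rebase chain without hunting for a hash ID.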
Note that "squash commit" `FG` itself is built by first making a copy `F'` of commit `F`, then shoving that copy aside to build `FG`, so we could very well draw the above as:

                  F'   ???
                 /
               E'-FG-H'   <-- somebranch
              /
    A--B--C--D   <-- master
        \
         E--F--G--H   ??? [was somebranch, earlier]
In fact, the whole notion of a branch in Git is at best suspect, and at worst, nonsense. Note how in the diagrams above, commit `A` is on "all branches", including the implied branch formed by working backwards from now-discarded commit `H`. We can, at any time, create, destroy, and/or move a branch without changing any of the existing commits. The names simply act as labels, pointing into the graph. When a name is a branch name, people call the commits leading up to and including the one pointed-to by that name "a branch". If we add two names, one to point to `F'` and one to `H`, commit `A` is now on four branches. Without those names, `A` is on two branches. But what if we do a detached-HEAD checkout of commit `C`? Is that a branch? If so, `A` is on it.
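The "names are just labels" claim is easy to check. The sketch below uses a throwaway repository, a made-up `commit` helper, and made-up label names (`label1`, `label2`). It counts how many branch names contain the root commit before and after adding labels, and confirms that the set of commits never changes:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one distinct file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B; commit C
root=$(git rev-list --max-parents=0 HEAD)     # commit A
count_names() { git branch --contains "$root" | wc -l | tr -d ' '; }

commits_before=$(git rev-list --count --all)
names_before=$(count_names)          # just master
git branch label1 HEAD~1             # a second name, pointing at B
git branch label2 HEAD               # a third name, pointing at C
names_after=$(count_names)
git branch -q -D label1 label2       # peel the labels off again
names_final=$(count_names)
commits_after=$(git rev-list --count --all)

echo "branches containing A: $names_before -> $names_after -> $names_final"
echo "commits: $commits_before before, $commits_after after"
```

Commit `A` goes from being "on" one branch to three and back to one, and not a single commit object was created, changed, or destroyed along the way.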
Meanwhile, the idea of creating temporary objects, including temporary commits, whenever and wherever it is convenient to do so, pervades Git; not showing all objects is crucial to getting anything done, as there are so many. Git's garbage collector, `git gc`, removes them after a while, if they're truly unused.

`git gc` also removes old reflog entries. A reflog entry has a creation time-stamp, and after some time (30 days or 90 days by default, though you can tune both of these) the reflog entry is considered sufficiently stale to be uninteresting, and is removed. Once all mentions of some internal Git object are removed, and several other conditions are met, `git gc` will remove the object. This is why Git spins off `git gc --auto` in the background after various Git operations: to clean up leftover junk.
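You can watch the whole life cycle in miniature by forcing the expiry instead of waiting for it. This is a sketch for a throwaway repository only, since it deliberately destroys the safety net; the `commit` helper and the `temp` branch name are made up. It discards a commit, shows that the object still exists, then expires every reflog entry and prunes immediately:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository: do NOT try this on real work
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one distinct file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A
git checkout -q -b temp
commit E
orphan=$(git rev-parse HEAD)         # hash ID of the commit we are about to discard
git checkout -q master
git branch -q -D temp                # now only HEAD's reflog remembers E

if git cat-file -e "$orphan" 2>/dev/null; then before=present; else before=gone; fi

# DESTRUCTIVE: expire every reflog entry right now, then prune with no grace period.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then after=present; else after=gone; fi
echo "discarded commit before gc: $before, after gc: $after"
```

With the default settings, nothing like this happens for at least 30 days, which is the whole point of the grace period.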
This is where the 30 day grace period for otherwise-discarded commits comes from. The 30 day time limit is a result of the `reflogExpireUnreachable` setting for some particular reflog. The 90 day period is a result of the `reflogExpire` setting. Note that both of these settings have, at least potentially, two values per reflog: the time value stored in `gc.<pattern>.reflogExpire` overrides the one stored in `gc.reflogExpire`, when expiring the reflogs for ref *name*, if the *pattern* matches the *name*. The documentation is ... skimpy on what constitutes a *pattern* here. It also fails to describe properly the difference between the `expireUnreachable` and `expire` timeouts:

    gc.reflogExpire
    gc.<pattern>.reflogExpire
        git reflog expire removes reflog entries older than this time;
        defaults to 90 days. The value "now" expires all entries
        immediately, and "never" suppresses expiration altogether. With
        "<pattern>" (e.g. "refs/stash") in the middle the setting applies
        only to the refs that match the <pattern>.

    gc.reflogExpireUnreachable
    gc.<pattern>.reflogExpireUnreachable
        git reflog expire removes reflog entries older than this time and
        are not reachable from the current tip; defaults to 30 days. The
        value "now" expires all entries immediately, and "never" suppresses
        expiration altogether. With "<pattern>" (e.g. "refs/stash") in the
        middle, the setting applies only to the refs that match the
        <pattern>.
The *not reachable from the current tip* phrase means that Git inspects the actual value stored in the ref at the moment. If that identifies a commit that leads back to the commit whose hash ID is stored in the reflog entry, Git chooses the `expire` time. If it identifies a commit that does not lead back to the commit in the reflog entry, Git chooses instead the `expireUnreachable` time. As phrased, it sounds like `git gc` looks at both times for such entries, but in fact `git gc` just assumes that the "unreachable" grace period will be less than or equal to that for reachable commits.
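The two timeouts can be pulled apart by hand with `git reflog expire`, which takes them as `--expire` and `--expire-unreachable`. The sketch below (throwaway repository, made-up `commit` helper) resets `master` back from `B` to `A`, so that `B` is no longer reachable from the tip, then expires only the unreachable entries:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: each commit adds one distinct file.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
a=$(git rev-parse HEAD~1)
b=$(git rev-parse HEAD)
git reset -q --hard "$a"             # master now names A; B is unreachable from that tip

# List the commit each reflog entry for master points to.
entries() { git log -g --format=%H master; }
entries_before=$(entries | wc -l | tr -d ' ')

# Keep reachable entries forever; expire unreachable ones immediately.
git reflog expire --expire=never --expire-unreachable=now refs/heads/master

entries_after=$(entries | wc -l | tr -d ' ')
mentions_a=$(entries | grep -c "$a" || true)
mentions_b=$(entries | grep -c "$b" || true)
echo "entries: $entries_before -> $entries_after (A: $mentions_a, B: $mentions_b)"
```

Entries for the reachable commit `A` survive, while `B` drops out of `master`'s reflog. Note that this only touches the one reflog named on the command line: `HEAD`'s reflog still remembers `B`.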
As all of this implies, reachability is a central concept in Git. It's not properly taught in far too many Git introductions. For a good explainer, see Think Like (a) Git.
(I'm not sure how the `<pattern>`s work myself. Without poking around in the Git source or experimenting, my guess would be that Git uses glob-style matching here, but even if so, we should wonder: are there any implied `*` or `**` globs at one or both ends? That is, is `refs/stash` really `**/refs/stash/**`, or is it anchored at the `refs` and/or `stash` end? I have never tried to tune my `git gc`-invoked reflog expirations: the defaults have been fine.)
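For what it's worth, setting the values themselves is ordinary `git config` work. The sketch below writes them into a throwaway repository's local configuration; the day counts are arbitrary values chosen for illustration, and the only pattern used is the documentation's own `refs/stash` example, so it says nothing about how fancier patterns are matched:

```shell
set -e
cd "$(mktemp -d)"                    # throwaway repository
git init -q

# Arbitrary illustrative values; the defaults are 90 and 30 days.
git config gc.reflogExpire "120 days"
git config gc.reflogExpireUnreachable "45 days"
# Per-pattern form, using the example pattern from the documentation.
git config gc.refs/stash.reflogExpire never

general=$(git config --get gc.reflogExpire)
unreachable=$(git config --get gc.reflogExpireUnreachable)
stash=$(git config --get gc.refs/stash.reflogExpire)
echo "general: $general, unreachable: $unreachable, refs/stash: $stash"
```

The pattern ends up as a subsection, `[gc "refs/stash"]`, in `.git/config`.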
<sup>1</sup>Since a ref is defined as *something that starts with `refs/`*, `HEAD` can't quite be a ref. But it still has a reflog, which implies that it's a ref. We can compare this to pseudorefs like `ORIG_HEAD`, `CHERRY_PICK_HEAD`, `MERGE_HEAD`, and so on, which don't get reflogs. The Git documentation is a bit soft in the `HEAD`, er, fuzzy about whether `HEAD` counts as a ref, here.
In fact, though, `HEAD` (written in all capitals like this) is extra-special. There's a symbolic way to refer to it, using the character `@`, that might help emphasize its special-ness. The use of `@` for `HEAD` first appeared in Git 1.8.5, though, and various glitches were fixed over time. The specialness is reflected in additional ways: for instance, `HEAD` is never packed, and if the file holding it disappears, Git stops thinking that the repository is a repository: the existence of the file is one of three criteria in the internal "is this a Git repository" test. In addition, `HEAD` is now a per-worktree ref, but this is also true of, e.g., the bisect refs. The entire notion of a per-worktree ref was new in Git 2.5, due to the addition of `git worktree`. Some things were corrected somewhat in Git 2.7, and a couple of nasty per-worktree items affecting `git gc` were not fixed until Git 2.14 and 2.15. For this reason I recommend care around `git worktree add` if your Git is not at least 2.15.
Note that branches, tags, remote-tracking names, and so on are all subsets of the general form. A ref whose name starts with `refs/heads/` is a branch name.