
Is there a way to have Git list the child commits of a particular commit? That is, if I have the Git branch:

A---B---C---D---E

and I know the commit hash of C, is there a way to obtain D from C?

The bigger issue here is that I have the situation where I've lost a branch because I moved the only branch label pointing to it. So I have something like this:

A---B---C---D (master, moved-branch-label)
     \
      \---E---F---G---H

Say I have the hash of E or F. How do I recover H?

There is a similar existing question to this one. The big difference there is that the OP does not know any of E, F, G or H. The only answer in that case is to use reflog to retrace your steps and manually find the hash for H.

But here, I know where the branch I'm looking for is! I just need to follow the children from the commit I know. I can't believe that there isn't a way to do that in Git. Doesn't that have to be an easy operation for Git to perform? Given E, doesn't it have to know F? It seems that I should be able to use such an operation to find the end of the E-F-G-H branch.

BTW, I was shocked to learn that there is no way to get the Git log entries for nodes E, F, G or H above if you don't supply git log with the hash for one of them. The lack of a label on a branch means that Git ignores that branch. So git log --all will not show those commits. I always figured that git log --all would literally show all commits performed against the repository. But it seems that's not the case. If someone can refute this, or tell me how to force git log to show me those orphaned commits, that would be very helpful.

CryptoFool
  • Can you find your commits (i.e. `E` or `H`) or your deleted label in the [reflog](https://git-scm.com/docs/git-reflog)? – knittl Feb 23 '21 at 17:43
  • No, I haven't been able to do that. See the update to my question. – CryptoFool Feb 23 '21 at 17:50
  • I mean running `git reflog` or `git reflog your-deleted-branch-name`. No (useful) output? – knittl Feb 23 '21 at 17:53
  • @knittl - Yes to both/either question. For commit hashes for nodes along the branch in question, some of their hashes don't show up in `git reflog` at all, despite the fact that they were committed in the last week. – CryptoFool Feb 23 '21 at 17:56
  • If reflog does not help, [fsck](https://git-scm.com/docs/git-fsck) might be your next best attempt. But inspecting all those dangling commits can be quite time-consuming – knittl Feb 23 '21 at 18:03
  • @Ali's answer is THE answer! Using `--parents` with `git reflog` is the key. I had somehow tried and dismissed that flag as not being helpful, partially because none of the threads I found on the web, including the linked SO question, mentioned it. I figured that if it wasn't mentioned anywhere, it must not be the answer. But it is THE answer! - Thanks all for your help! – CryptoFool Feb 23 '21 at 18:09
  • @knittl - you were on the right track immediately. Thanks for your input. – CryptoFool Feb 23 '21 at 18:13

2 Answers


only searches the reflog of HEAD (your local activity):

git reflog --parents | grep {HASH_OF_COMMIT_F}

only searches the reflogs of your remote-tracking branches (the remote itself is not contacted):

git reflog --parents --remotes | grep {HASH_OF_COMMIT_F}

searches the reflogs of all refs, local and remote-tracking:

git reflog --parents --all | grep {HASH_OF_COMMIT_F}

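Each matching line has the format {COMMIT_HASH} {HASH_OF_PARENT_1} {HASH_OF_PARENT_2} ... A line that lists COMMIT_F after the first field is a direct child of COMMIT_F; the grep also matches COMMIT_F's own line, where its hash comes first. Collecting the direct children this way should aid you in your search.

Note that abbreviated commit hashes are used (typically the first 7 or more characters), so grep for a short prefix of the hash rather than the full hash.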
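As a minimal sketch of how this plays out, the script below builds a throwaway repository, abandons two commits, and then picks the child of E out of `git reflog --parents`. The temporary directory, identity, and commit messages are made up for illustration, and it assumes E and F were both recorded in HEAD's reflog, which is the case when they were committed locally.

```shell
#!/bin/sh
# Throwaway demo repository; nothing here touches an existing repo.
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# History: A---B---E---F, all as empty commits.
for msg in A B E F; do
    git commit -q --allow-empty -m "$msg"
done
hash_e=$(git rev-parse --short HEAD~1)
hash_f=$(git rev-parse --short HEAD)

# Move the only branch label back to B, so E and F lose their name.
git reset -q --hard HEAD~2

# Each reflog line starts "<commit> <parent> ...": keep the lines whose
# second field is E and print the first field (the child commit).
child_of_e=$(git reflog --parents |
    awk -v parent="$hash_e" '$2 == parent { print $1 }' | sort -u)
echo "child of $hash_e: $child_of_e"
```

To walk to the end of a lost chain, repeat the lookup with each child until nothing is returned; the last hash found can then be given a name again with `git branch <name> <hash>`.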

Ali Samji
  • This doesn't provide anything useful to me. See the update to my question. – CryptoFool Feb 23 '21 at 17:46
  • That output seems to suggest that `07841ee07` has no children. Are you sure the commit you are looking for is still available locally? Do you get a different response when you add in the `--remotes` or `--all` flag along with `--parents`? – Ali Samji Feb 23 '21 at 18:03
  • YES, I'm wrong!!! Yay! I have been playing with `reflog` a bunch, but I guess I somehow dismissed the fact that `--parents` made any difference. Then, seeing your suggestion, I thought I'd tried it on my specific problem, but I must have done something wrong. I'm trying it again, and it is working! – CryptoFool Feb 23 '21 at 18:04
  • For nodes in the middle of the branch, your suggested command gives me two `commit` references. One is for the commit in question and shows the hash of that commit's parent. But more importantly, you also see the NEXT commit, which lists the commit in question's hash as its parent. This is great. I don't know how I missed this. I'm surprised that nobody mentioned `--parents` in the similar question I reference. Thanks so much for your response! – CryptoFool Feb 23 '21 at 18:07
  • Note that adding ` | grep commit` to the end of your suggested command filters out the noise, and you get just the two lines of output that describe the ancestry of the commit in question (or just the one line of output for a leaf commit). – CryptoFool Feb 23 '21 at 18:15

This is a bit of a side remark (and hence should be a comment, but I need formatting, and also more space—okay, far more space—than there is in a comment):

BTW, I was shocked to learn that there is no way to get the Git log entries for nodes E, F, G or H above if you don't supply git log with the hash for one of them. The lack of a label on a branch means that Git ignores that branch. So git log --all will not show those commits. I always figured that git log --all would literally show all commits performed against the repository.

That would make sense in some other version control systems, but not in Git:

  • --all refers to all references, not all commits;
  • Git finds commits by starting with a given hash ID—perhaps from a reference, or perhaps just a raw hash ID you list on the command line—then working backwards within the commits themselves; and
  • each commit is on zero or more branches. In most repositories, the (singular) root commit is on every branch.
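To make the difference concrete, here is a small sketch in a throwaway repository (the directory, identity, and commit messages are illustrative). It counts what `--all` can reach, then adds `--reflog`, which asks Git to also start from every commit mentioned in a reflog. This only helps while those reflog entries still exist.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# History: A---B---E---F, then move the branch label back to B.
for msg in A B E F; do
    git commit -q --allow-empty -m "$msg"
done
git reset -q --hard HEAD~2

# --all starts from every ref (plus HEAD) and walks backwards.
reachable_from_refs=$(git rev-list --count --all)

# --reflog additionally starts from every commit named in a reflog.
reachable_with_reflogs=$(git rev-list --count --all --reflog)

echo "commits found via --all:          $reachable_from_refs"
echo "commits found via --all --reflog: $reachable_with_reflogs"

# The same option works with git log, to actually read the entries.
git log --all --reflog --format='%h %s'
```

The abandoned E and F only show up in the second count, because no ref leads to them but the reflog of HEAD still does.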

"Discarded" commits, such as E-F-G-H, occur naturally in Git: they're the result of git rebase, for instance, after copying the E-F-G-H chain to some set of new-and-improved commits. Suppose you want the parent of the copy of E to be D rather than B, and you want to squash the old F+G together, to get:

           E'-FG-H'   <-- somebranch
          /
A--B--C--D   <-- master
    \
     E--F--G--H   ??? [was somebranch, earlier]

The reason—and way—that git reflog works to find these is that each ref has a log of the values it used to hold. So in the example just above, somebranch's reflog will show that at one point, it named commit E; at another—probably just afterward—it named commit F. This will repeat for G and H, and then the rebase operation will, all at once, yank the name somebranch over to commit H'. The E'-FG-H' chain was built by git rebase using detached HEAD mode, so the only reflog that contains these hash IDs is that of HEAD itself, which is also a ref.1
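A rough sketch of reading a branch's own reflog follows. For brevity it abandons commits with `git reset` rather than `git rebase`, which is a simplification of the scenario above; the repository, identity, and the name `recovered` are all hypothetical.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

for msg in A B E F; do
    git commit -q --allow-empty -m "$msg"
done
old_tip=$(git rev-parse HEAD)
branch=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

# Yank the branch name back to B; F is now only known to the reflogs.
git reset -q --hard HEAD~2

# <branch>@{1} is the value the branch held just before its last update.
previous=$(git rev-parse "${branch}@{1}")

# Give the old tip a name again so it is reachable from a ref.
git branch recovered "$previous"
git log --format='%h %s' recovered
```

`git reflog show <branch>` prints the full list of earlier values if the one you want is further back than `@{1}`.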

Note that "squash commit" FG itself is built by first making a copy F' of commit F, then shoving that copy aside to build FG, so we could very well draw the above as:

             F'   ???
            /
           E'-FG-H'   <-- somebranch
          /
A--B--C--D   <-- master
    \
     E--F--G--H   ??? [was somebranch, earlier]

In fact, the whole notion of a branch in Git is at best suspect, and at worst, nonsense. Note how in the diagrams above, commit A is on "all branches", including the implied branch formed by working backwards from now-discarded commit H. We can, at any time, create, destroy, and/or move a branch without changing any of the existing commits. The names simply act as labels, pointing into the graph. When a name is a branch name, people call the commits leading up to and including the one pointed-to by that name, "a branch". If we add two names, one to point to F' and one to H, commit A is now on four branches. Without those names, A is on two branches. But what if we do a detached-HEAD checkout of commit C? Is that a branch? If so, A is on it.
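The "names are just labels" point can be sketched with `git branch --contains`, which lists the branch names from which a commit is reachable. The repository and the branch names `extra-one` and `extra-two` below are invented for the demonstration.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done
root=$(git rev-list --max-parents=0 HEAD)  # commit A

# How many branch names lead back to A right now?
count_before=$(git branch --contains "$root" | grep -c . || true)

# Add two labels; no commit is created or changed by this.
git branch extra-one HEAD~1
git branch extra-two HEAD~2

count_after=$(git branch --contains "$root" | grep -c . || true)
echo "A was on $count_before branch(es), now on $count_after"
```

The commit graph is identical before and after; only the set of names pointing into it has changed.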

Meanwhile, the idea of creating temporary objects, including temporary commits, whenever and wherever it is convenient to do so, pervades Git; not showing all objects is crucial to getting anything done, as there are so many. Git's garbage collector, git gc, removes them after a while, if they're truly unused.

git gc also removes old reflog entries. A reflog entry has a creation time-stamp, and after some time—30 days or 90 days by default, though you can tune both of these—the reflog entry is considered sufficiently stale to be uninteresting, and is removed. Once all mentions of some internal Git object are removed, and several other conditions are met, git gc will remove the object. This is why Git spins off git gc --auto in the background after various Git operations: to clean up leftover junk.

This is where the 30 day grace period for otherwise-discarded commits comes from. The 30 day time limit is a result of the reflogExpireUnreachable setting for some particular reflog. The 90 day period is a result of the reflogExpire setting. Note that both of these settings have, at least potentially, two values per reflog: when expiring the reflog for a ref, the time value stored in gc.<pattern>.reflogExpire overrides the one stored in gc.reflogExpire if <pattern> matches that ref's name. The documentation is ... skimpy on what constitutes a pattern here. It also fails to describe properly the difference between the expireUnreachable and expire timeouts:

gc.reflogExpire
gc.<pattern>.reflogExpire
      git reflog expire removes reflog entries older than this time; defaults to 90 days. The value "now" expires all entries immediately, and "never" suppresses expiration altogether. With "<pattern>" (e.g. "refs/stash") in the middle the setting applies only to the refs that match the <pattern>.

gc.reflogExpireUnreachable
gc.<pattern>.reflogExpireUnreachable
      git reflog expire removes reflog entries older than this time and are not reachable from the current tip; defaults to 30 days. The value "now" expires all entries immediately, and "never" suppresses expiration altogether. With "<pattern>" (e.g. "refs/stash") in the middle, the setting applies only to the refs that match the <pattern>.

The phrase "not reachable from the current tip" means that Git inspects the value currently stored in the ref. If that identifies a commit that leads back to the commit whose hash ID is stored in the reflog entry, Git chooses the expire time. If it identifies a commit that does not lead back to the commit in the reflog entry, Git chooses instead the expireUnreachable time. As phrased, it sounds like git gc looks at both times for such entries, but in fact git gc just assumes that the "unreachable" grace period will be less than or equal to that for reachable commits.
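The reachable/unreachable split can be observed by running the expiry by hand. This is only a sketch in a throwaway repository with made-up names: `--expire=never` keeps entries for reachable commits, while `--expire-unreachable=now` drops entries that mention commits no longer reachable from the current tips.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

# History: A---B---E---F, then move the branch label back to B.
for msg in A B E F; do
    git commit -q --allow-empty -m "$msg"
done
git reset -q --hard HEAD~2

entries_before=$(git reflog | grep -c . || true)

# Expire only the entries that involve unreachable commits (E and F).
git reflog expire --expire=never --expire-unreachable=now --all

entries_after=$(git reflog | grep -c . || true)
lost_entries=$(git reflog | grep -c 'commit: [EF]$' || true)
kept_entries=$(git reflog | grep -c 'commit: B$' || true)
echo "HEAD reflog entries: $entries_before before, $entries_after after"
```

Do not try this in a repository where you still hope to recover something: once the entries are gone, a later `git gc` is free to remove the commits themselves.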

As all of this implies, reachability is a central concept in Git. It's not properly taught in far too many Git introductions. For a good explainer, see Think Like (a) Git.

(I'm not sure how the <pattern>s work myself. Without poking around in the Git source or experimenting, my guess would be that Git uses glob-style matching here, but even if so, we should wonder: are there any implied * or ** globs at one or both ends? That is, is refs/stash really **/refs/stash/**, or is it anchored at the refs and/or stash end? I have never tried to tune my git gc-invoked reflog expirations: the defaults have been fine.)
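If you did want to tune these, a sketch of the configuration might look like the following. The values are arbitrary examples, and `refs/stash` is simply the pattern the documentation itself uses; how broader patterns match is left open, as noted above.

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q

# Repository-local settings; add --global to apply them everywhere.
# Give unreachable entries the same grace period as reachable ones.
git config gc.reflogExpire "90 days"
git config gc.reflogExpireUnreachable "90 days"

# Per-ref override: never expire the stash reflog.
git config gc.refs/stash.reflogExpire never

# Show what was stored.
git config --get-regexp '^gc\.'
```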


1. Since a ref is defined as something that starts with refs/, HEAD can't quite be a ref. But it still has a reflog, which implies that it's a ref. We can compare this to pseudorefs like ORIG_HEAD, CHERRY_PICK_HEAD, MERGE_HEAD, and so on, which don't get reflogs. The Git documentation is a bit soft in the HEAD, er, fuzzy about whether HEAD counts as a ref, here.

In fact, though, HEAD—written in all capitals like this—is extra-special. There's a symbolic way to refer to it, using the character @, that might help emphasize its special-ness. The use of @ for HEAD first appeared in Git 1.8.5, though, and various glitches were fixed over time. The specialness is reflected in additional ways: for instance, HEAD is never packed, and if the file holding it disappears, Git stops thinking that the repository is a repository: the existence of the file is one of three criteria in the internal "is this a Git repository" test. In addition, HEAD is now a per-worktree ref, but this is also true of, e.g., the bisect refs. The entire notion of a per-worktree ref was new in Git 2.5, due to the addition of git worktree. Some things were corrected somewhat in Git 2.7, and a couple of nasty per-worktree items affecting git gc were not fixed until Git 2.14 and 2.15. For this reason I recommend care around git worktree add if your Git is not at least 2.15.

Note that branches, tags, remote-tracking names, and so on are all subsets of the general form. A ref whose name starts with refs/heads/ is a branch name.
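A short sketch for poking at these names yourself, again in a throwaway repository with a hypothetical identity and tag name:

```shell
#!/bin/sh
set -eu
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git commit -q --allow-empty -m "A"
git tag v1                                # hypothetical tag name

# HEAD normally holds the name of a branch, not a hash.
head_target=$(git symbolic-ref HEAD)

# @ is shorthand for HEAD.
at_hash=$(git rev-parse @)
head_hash=$(git rev-parse HEAD)

# Everything under refs/: branch names, tag names, and so on.
all_refs=$(git for-each-ref --format='%(refname)')
ref_count=$(printf '%s\n' "$all_refs" | grep -c . || true)

echo "HEAD -> $head_target"
printf '%s\n' "$all_refs"
```

Note that HEAD itself does not appear in the `git for-each-ref` listing, which only walks names under refs/.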

torek