Find latest commits to a file across all local branches with git - compare across local branches

Question

So I'm working on a bunch of branches. And I remember, that on one of these branches, I did some super-smart change to a file. But I can't remember which branch this happened in.

Is there a git command that can show me all the latest changes to a file, across all the branches I have locally, for a repository?

Example

    /---D
   / 
  /   /---E
 /   / 
A - B - C
 \
  \
   \--F

I'm sitting on C and I know that I've made a super-clever commit in either D, E or F, to a specific file.

I could go through them one by one, to see the contents of the file. But I was hoping for a command like this:

$ git magic-command-1 "path/to/target/file"

Commit 123456 on branch "F" at 2022-05-19 15:10
Commit 234567 on branch "E" at 2022-05-19 14:33
Commit 345678 on branch "D" at 2022-05-19 11:12

and maybe also something that shows the differences.

I tried this:

git log -p -- cypress.development.json

But I'm not sure whether it shows changes across all branches, or which branch each change was made on.

I also read here about something about an --all-flag, but the output doesn't show which branch the change is made on.

I also looked at the --source-flag, but the results don't really make any sense to me.

Regardless of what I do, I feel like I'm missing a command to appropriately compare the same file across all local branches.

score 1 · Accepted Answer · answered May 19 '22 at 19:55

TL;DR

Use git branch --contains with the hash IDs you find. But: why do you care? The hash ID is all you really need.

Long

There's a basic problem here with your diagram: it has no branch names on it. Let's put some branch names on it and then ask a key question:

    /---D   <-- br1
   / 
  /   /---E   <-- br2
 /   / 
A - B - C   <-- br3
 \
  \
   \--F   <-- br4

Which branch is commit A on?

Warning: this is a trick question! The answer is below, with (I hope) enough text in between so that you can't just cheat and read it, and will instead have to think about this. The obvious answer is "it's on br3" but this isn't right. (It's not wrong, it's just not right.)

What you will want to do

I also read here about something about an --all-flag ...

Use this flag, then use git describe or git branch --contains with the found commit hash IDs, or:

I also looked at the --source-flag, but the results don't really make any sense to me.

The --source flag does what the git log documentation says it does:

--source
Print out the ref name given on the command line by which each commit was reached.

but, as is common, the reference manual is terse and laden with jargon here. The flag gets you some of the information you need, and sometimes it will be everything you need, but git branch --contains or git describe may still be more useful.
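As a rough sketch of the listing the question asks for, you can loop over the local branch names and ask git log for the first commit it reports on each branch that touched the file. The helper name latest_per_branch is made up for this example (it is not a Git command), and the output layout is just one possible choice:

```shell
# latest_per_branch PATH   (hypothetical helper, not part of Git)
# For each local branch, print the first commit that git log reports
# for PATH when starting from that branch's tip, then the branch name.
latest_per_branch() {
    path=$1
    git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
        # -1 stops after the first matching commit; %h is the abbreviated
        # hash ID and %ci is the committer date.
        info=$(git log -1 --format='%h %ci' "$ref" -- "$path")
        if [ -n "$info" ]; then
            printf '%s  %s\n' "$info" "${ref#refs/heads/}"
        fi
    done
}
```

Run it as latest_per_branch path/to/target/file. Keep in mind that the same hash ID can show up next to several branch names, for reasons covered in the answer to the trick question.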

The answer to the trick question

Commit A is on every branch.

The trick here is that in Git, many commits are on many branches simultaneously. Some commits may be on no branch. This gets us into a separate Git question, which is: What exactly do we mean by "branch"? The word branch in Git is actually ambiguous, and overused, sometimes to the point where it nearly loses all meaning. Once you get used to the crazy multiple meanings, though, it turns out that humans usually assign the right meaning automatically: a branch is a branch name, but it's also a remote-tracking name, a particular commit that Git calls more formally a tip commit, and a set of commits ending at the tip commit. A Git branch is all of these things, and yet, when a human says "branch", they usually mean only one of these things.

To make any sense out of this, we need the concept of reachability. Reachability is actually a graph-theory thing. The diagram you drew is a commit graph, with the letters A through F standing in for actual commits. Each actual commit has some unique, big and ugly and random-looking hash ID, but those are too hard for humans, so we mostly ignore them whenever we can, or use substitutes like these letters A through F here.

Each commit links backwards to a previous or parent commit. Here, commit C links backwards to commit B, which links backwards to commit A. Commit D links backwards to A as well, and so does F; E links backwards to B, which we already noted links backwards to A.

By following the backwards-pointing links, Git finds the commits. Git finds the end commits—the branch tip commits—using the branch names, which are what humans tend to care about and use. But then Git works backwards from there.

When we start with, say, br1, Git will find commit D, then work backwards and find commit A. This means commit A is "on", or "contained in", branch br1. But we can also start with br2 and find A, and we can start with br3 and find A, and so on. Indeed, since A is our very first commit, all roads lead to ~~Rome~~ A: commit A is on every branch. It will be on future branches too.¹
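To watch this backwards walk happen, here is a minimal sketch that follows the parent links by hand. The function name walk_back is invented for illustration, and it only follows the first parent of each commit, which is enough for a graph like yours that has no merges:

```shell
# walk_back START   (hypothetical helper, not part of Git)
# Print START and then each commit reached by repeatedly stepping
# to the first parent, stopping at a root commit.
walk_back() {
    commit=$(git rev-parse --verify -q "$1^{commit}") || return 1
    while :; do
        git --no-pager log -1 --format='%h %s' "$commit"
        # "<hash>^" names the first parent; a root commit has none,
        # so the lookup fails there and the loop ends.
        commit=$(git rev-parse --verify -q "$commit^") || break
    done
}
```

In the sample graph, walk_back br1 would list D and then A, and walk_back br2 would list E, B, and A. Commit A shows up no matter which name you start from.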

It is literally impossible, in Git, to know which branch a commit was created on unless you record that as text in the commit message. That's because we can create and destroy branch names at will: each branch name simply selects (or "points to") some commit in the commit graph. We pick this commit at the time we create the branch name.

Then, when we check out (switch to) the branch and make a new commit, Git makes the new commit such that it points backwards to the commit we had checked out, and stores the new commit's hash ID into the branch name so that the new commit is now the tip commit. So, given your diagram, if we git switch br3 and make a new commit, the name br3 will point to our new commit G afterward; G will point backwards to C; and commit A remains on every branch.

If we delete branch name br1 entirely, commit D becomes un-findable, because we find the commits using branch names and working backwards. There's only one way to find D right now, and that's to use br1. So by deleting the name br1, we "lose" commit D. It becomes unreachable.²

So reachability means "how we get there". We get to commits from branch names. For much more on this concept, see Think Like (a) Git.

¹It is possible, in Git, to create more than one root commit, and hence set up new branches that don't lead back to commit A. But that's not very typical and we won't cover it here.

²Git will eventually discard an unreachable commit. You do, however, get a grace period to get the commit back, typically a minimum of 30 days. The problem is that you must find the commit's unique hash ID, which you would do using the branch name, but now that the branch name is gone... well, that's the dilemma.

Reachability, `git branch --contains`, and `git log --source`

Now that you understand reachability, git branch --contains will make sense. You give git branch --contains some hash ID, e.g., the hash ID of commit B or E or A. What git branch --contains does is:

starting from every branch name, work backwards;
if this reaches the commit, print the branch name

so when used with the commit hash ID B this will print br2 and br3, as those are the two branch names that can reach B.
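Those two steps can be written out by hand, which may make the behavior less mysterious. This is only a sketch of the idea, not how git branch --contains is implemented internally; the function name branches_reaching is invented, and it relies on git merge-base --is-ancestor to answer "can I reach this commit from that tip?":

```shell
# branches_reaching COMMIT   (hypothetical helper, not part of Git)
# Print every local branch name whose tip can reach COMMIT.
branches_reaching() {
    git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
        # Exit status 0 means COMMIT is the tip itself or one of
        # its ancestors.
        if git merge-base --is-ancestor "$1" "$ref"; then
            printf '%s\n' "${ref#refs/heads/}"
        fi
    done
}
```

In practice you would just run git branch --contains with the hash ID; the loop is here only to show that nothing more than reachability is involved.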

The --source option to git log simply prints whichever name git log was using at the time it found some commit. This is actually more complicated to explain, because git log itself is pretty complicated!

What git log does is walk the graph, printing some of the commits it encounters as it goes. That is, we give git log some number of starting points, such as one or more branch names or commit hash IDs. The git log command takes these names and resolves them to hash IDs, or takes the hash IDs (which are already hash IDs), and finds the named commits. It puts each commit into a priority queue.

If we run git log with no arguments, git log uses the special name HEAD. This name is normally attached to one branch name. Using git switch or git checkout, we control which branch name HEAD is attached to; that's the branch that gets extended when we make a new commit, so it's pretty important! That branch name is the current branch, and that's what git log shows by default: that is, running git log with no arguments means git log resolves HEAD to the current commit's hash ID, and puts that (single) hash ID in the queue.

Now that the queue has some commit or commits in it, git log takes the front entry off the queue. Since the queue is a priority queue, there's a sorting order, if there's more than one entry in it. But it's extremely common for the queue to have just the one entry! For instance, if we run git log with no arguments, the current commit is the one entry in it when we start. If we run git log br4, Git puts F's hash ID into it, and again there's just the one entry.

Anyway, having taken the front entry out of the queue, git log now decides, based on any arguments you gave like --no-merges or whatever, whether to show this commit. If it's supposed to show the commit, it does that. We call this visiting the commit, as though we're on holiday and going to certain attractions or cities or whatever.

Next, having shown or not shown the commit, git log finds the parent or parents of the commit. In your sample graph, each commit has exactly one parent, except for commit A which has no parent. (A merge commit, if there were any, would have two or more parents.) By default, git log puts all the parents into the queue, unless those parents have already been visited.

With its one parent, if we've just visited F, git log would put F's parent A into the queue. The queue was empty—F was the only thing in it at the start of all of this—so now there's again just one entry in the queue. The git log command now takes out and visits the one commit in the queue, i.e., commit A. It shows commit A, if it's supposed to do that, and then puts A's parents into the queue. There are no parents, so this puts nothing in the queue, and the queue remains empty.

Once the queue is empty like this, git log quits. So by starting at F via name br4, we visit commits F and A and stop, and that's what git log would show.

If, on the other hand, we run git log --all, the code will put D, E, C, and F all into the queue. There are now four entries so the priority really matters. This priority causes git log to sort its output. The default sort is based on the stored committer date in each commit, with later commits being higher priority. So if commit F is the latest commit, that's the one that surfaces first.

We'll visit F, printing it out and putting its parent A into the queue: the queue now contains D, E, C, and A, which Git keeps sorted by date. Let's say that E has the next-highest-priority date: git log will pop E out of the queue, visit it, and insert B into the queue. Then git log will take the highest priority commit out of the queue (let's continue the theme and say this is D) and visit that one. This would put A into the queue, but it's already there; it doesn't go in twice. We now visit C, which wants to put B in the queue, but it's already there; we then visit B, which wants to put A in the queue, but it's already there; and we visit A, which is the last thing in the queue and puts nothing into the queue, and so git log finally stops.

The --source flag simply annotates each output, for any given commit, with the name that first led Git to this commit. So for C, that's br3.³ For B, that's either br2 or br3, depending on whether git log visited C or E first.

The visiting order depends on the priority order. You can control this, to some extent at least, with options like --topo-order or --author-date-order. But in a big graph, especially one with a lot of branch-and-merge action in it, it's very difficult to know which of many names might first reach some commit. Only in small and simple graphs like yours here will you get something predictable.
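Putting --all and --source together with a pathspec gives a compact view of everything described above. The wrapper below is only a convenience sketch with an invented name, file_history_with_source; the flags themselves are ordinary git log options:

```shell
# file_history_with_source PATH   (hypothetical helper, not part of Git)
# One line per commit that touched PATH, starting from every ref,
# each annotated with the name that first led git log to it.
file_history_with_source() {
    git log --all --source --oneline -- "$1"
}
```

Each line shows the abbreviated hash ID, then the full ref name such as refs/heads/br4, then the commit subject. Adding -p shows the changes to the file as well. For a commit that several names can reach, remember that the name printed is simply whichever one got there first.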

³With git log --all you will see refs/heads/br3 rather than just br3. That's simply the full name of the branch. All branches have short names like br3, and full ones like refs/heads/br3. I like to think of the full name as what their mom (or spouse) says when she's mad at them, kind of like Stella Mudd in these ST:TOS clips.

Branch names don't matter

At the top, I asked why you care which branch(es) some commit is "on". Sometimes you will actually care, and then asking the question which branches contain this commit is fine. But if all you want to do is see the file, or see it as changes, just tell Git to show you the file, or show you the changes:

git show a123456:path/to/file

or:

git show a123456 -- path/to/file

The former shows the contents of the named file as stored in the named commit. The latter takes the named commit (using the abbreviated commit hash a123456), finds its parent (singular⁴), and runs git diff on the two commits. Then, because of the -- path/to/file pathspec at the end, it shows only the diff for that one file. So you'll see what changed in that one file, in that one commit, with respect to its parent.

You can even extract the entire file from that commit, overwriting the current working tree copy, with git restore:

git restore --source=a123456 --worktree -- path/to/file

Of course, you should first make sure you don't have anything valuable in path/to/file, because the copy that's in your working tree is not in Git and Git cannot get it back after you tell Git to overwrite it. Only committed files are actually stored in Git.

This—the saved copy of the file, or the changes from the parent—is usually all you care about. These are easy to get once you have the commit's hash ID. That hash ID is the "true name" of the commit: it always works to identify that one particular commit.

The point of a branch name, in Git, is just to help you find commits. The hash IDs are their real names. They're just too ugly to deal with: we have to use mouse cut-and-paste or whatever, once we have found them. But if you have run a command that printed the hash ID of the commit you care about, just grab that with your mouse and get to work!
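If what you really want is to compare one file across all your local branches, a small loop can narrow the search before you look at any diffs. This is a sketch under the assumption that comparing each branch tip against your current commit is good enough; the function name branches_differing is invented for the example:

```shell
# branches_differing PATH   (hypothetical helper, not part of Git)
# Print each local branch whose tip commit holds a version of PATH
# that differs from the version in the current commit (HEAD).
branches_differing() {
    git for-each-ref --format='%(refname)' refs/heads |
    while read -r ref; do
        # --quiet prints nothing and exits non-zero when the two
        # commits differ in PATH.
        if ! git diff --quiet HEAD "$ref" -- "$1"; then
            printf '%s\n' "${ref#refs/heads/}"
        fi
    done
}
```

For any branch it prints, say br1, running git diff HEAD br1 -- path/to/file then shows the actual differences.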

⁴A merge commit, which has two or more parents, causes a problem here. Each commit holds a full snapshot of every file. So to see what changed in some commit, we have Git use its backwards-pointing link from commit to parent. The parent also has a full snapshot, so the parent commit contains the same file, unless the file itself is all-new. Git can then extract the file from each commit and compare the two files and tell you what changed.

But a merge commit has two or more parents. That's what defines it as a merge commit in the first place. Since it does have at least two parents, we no longer know which parent commit to use to get the "earlier" version of the file. There are two or more earlier versions! The git log command, when used as git log -p to show commits as patches, cheats by default: it just does not bother to show anything at all. The git show command works harder by default, doing something Git calls a combined diff. We won't go into any detail here though.