Git find first non-local commit

Question

Related: List Git commits not pushed to the origin yet

git rev-parse HEAD gives me the latest commit in the workspace, but this can be a locally committed hash, in other words a commit which has not been pushed to the remote yet.

How do I find the latest commit in the workspace which also exists on the remote?

Romain Valeri · Accepted Answer · 2021-07-28T19:46:01.807

To get the latest commit on the currently checked out branch's configured remote branch, do

# first get your remote-tracking branches up-to-date with remote
git fetch

# then do
git rev-parse @{upstream}
# or even just
git rev-parse @{u}

(Note: @{upstream} / @{u} are not placeholders; they are meant to be typed as is.)

From the docs:

[<branchname>]@{upstream}, e.g. master@{upstream}, @{u}
The suffix @{upstream} to a branchname (short form @{u}) refers to the branch that the branch specified by branchname is set to build on top of (configured with branch.<name>.remote and branch.<name>.merge). A missing branchname defaults to the current one.
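For anyone who wants to try this safely, here is a minimal sketch in a throwaway directory. The bare repository standing in for the remote, the demo identity, and the branch name main are all assumptions made for illustration; nothing here touches a real repository.

```shell
# Demo identity so that git commit works without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"       # stand-in for the remote
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main       # name the initial branch "main"
git remote add origin "$work/remote.git"

git commit -q --allow-empty -m "pushed commit"
git push -q -u origin main                  # -u records origin/main as the upstream
git commit -q --allow-empty -m "local-only commit"

git fetch -q
local_tip=$(git rev-parse HEAD)             # may be a commit that was never pushed
remote_tip=$(git rev-parse @{u})            # tip of the upstream as of the last fetch
unpushed=$(git rev-list --count @{u}..HEAD) # commits reachable from HEAD but not from the upstream

echo "local:    $local_tip"
echo "upstream: $remote_tip"
echo "unpushed: $unpushed"
```

In this setup the two hashes differ and the unpushed count is 1, because the second commit exists only locally.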

score 3 · Answer 2 · answered Jul 28 '21 at 06:21

Technically, git rev-parse HEAD gives you the hash ID of the current commit. That's not necessarily the latest, and it need not match what is in the working tree even in normal use (because the working tree can be modified and not yet committed). These points interfere with answering your question as asked, too: perhaps you don't want the latest commit. Moreover, the commit(s) in some remote repository to which you can git push are usually not in any working tree, because such remote repositories are normally bare repositories: bare repositories generally accept git push requests, and non-bare ones don't.

All that aside, what you probably want is a simple:

git rev-parse origin/master

or:

git rev-parse origin/<some-other-name-here>

or:

git rev-parse @{upstream}

The last one of these needs further explanation. The first two simply use your existing names, in your existing Git repository, to find a hash ID, in the same sort of way that git rev-parse HEAD does—though usually less complicated.

It's possible that your local Git repository is out of date with respect to the other (remote) Git repository. In that case, you might need to run:

git fetch origin

first in order to obtain any new commits that they have, and update your various remote-tracking names: the names like origin/master and origin/develop and so on.

What's going on here

Git defines a branch name as a name—like master or main, or develop, or feature/tall or whatever—that holds the hash ID of some existing, valid commit in this repository.¹ That hash ID, by definition, is the last commit "on" that branch.

What Git does with this is itself a little complicated, but if we note that most commits—all the ordinary ones²—store exactly one hash ID for their immediate parent commit, we find that we can place commits next to each other, like pearls or beads on a string:

... <-F <-G <-H ...

Here, H stands in for the hash ID of some existing commit. That commit stores the hash ID of its parent (earlier) commit, G. Commit G in turn stores the hash ID of still-earlier commit F, and so on.
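If you'd like to see the stored parent hash ID for yourself, the following sketch builds a two-commit throwaway repository and prints the raw commit object. The temporary directory, the demo identity, and the commit messages G and H are assumptions chosen to match the drawing above.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

# The raw commit object: a tree line, a parent line, author, committer, and the message.
git cat-file -p HEAD

# Extract the parent line from H and compare it with G's hash ID.
stored_parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
g_hash=$(git rev-parse HEAD~1)
echo "H stores parent: $stored_parent"
echo "G's hash ID:     $g_hash"
```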

Because commits can't be changed, and hash IDs are unpredictable,³ these arrows always point backwards. A branch name then just points to the last commit in the chain:

...--G--H   <-- main

Moreover, Git sets things up so that when you use git checkout or git switch to select some branch name as the current branch, the special name HEAD is attached to the branch name:

...--G--H   <-- main (HEAD)

At this point, both git rev-parse main and git rev-parse HEAD will produce the same hash ID, namely that of commit H.

Should you add a new commit, Git constructs the new commit by writing out the snapshot and metadata for that commit and making the metadata include H's hash ID, so that new commit I points backwards to existing commit H:

...--G--H   ...
         \
          I

and then, as the last step of git commit, Git writes the new commit's hash ID into whichever name HEAD is attached to, giving:

...--G--H
         \
          I   <-- main (HEAD)

The name HEAD is still attached to the name main, but the name main now says that commit I is the last commit on the branch.
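The same sequence can be reproduced in a scratch repository. This is only a sketch: the temporary directory, the demo identity, and the empty commits named H and I are assumptions for illustration.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "H"

before_main=$(git rev-parse main)
before_head=$(git rev-parse HEAD)      # same hash: HEAD is attached to main

git commit -q --allow-empty -m "I"
after_main=$(git rev-parse main)       # main now holds the hash ID of I
parent_of_new=$(git rev-parse HEAD~1)  # I points back to H

attached_to=$(git symbolic-ref --short HEAD)
echo "HEAD is attached to: $attached_to"
```

Here before_main and before_head are identical, after_main differs from both, and parent_of_new equals the old tip.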

Git does, however, have a mode it calls detached HEAD mode. Here, we tell Git to select some commit by something other than a branch name. For instance, we might wish to look at commit G's snapshot, and hence run git checkout hash-of-G or similar. The result is:

...--G   <-- HEAD
      \
       H--I   <-- main

The git rev-parse HEAD command now shows the hash ID of commit G: not that of commit H, not that of commit I, but that of commit G. That's because HEAD is no longer attached (to a branch name), but rather detached (meaning HEAD contains the hash ID of the commit directly).

(To get back to commit I and on branch main, we'd use git checkout main or git switch main. These will re-attach HEAD.)
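A quick way to tell the two modes apart is git symbolic-ref, which fails when HEAD is detached. The sketch below assumes a throwaway repository with three empty commits named G, H and I, as in the drawings; the paths and identity are made up for the demo.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git commit -q --allow-empty -m "I"

g_hash=$(git rev-parse main~2)
git checkout -q "$g_hash"              # select G by hash ID, not by branch name

# symbolic-ref -q exits non-zero (quietly) when HEAD is not attached to a branch.
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
detached_hash=$(git rev-parse HEAD)    # the hash ID of G

git checkout -q main                   # re-attach HEAD
reattached=$(git symbolic-ref --short HEAD)
echo "while on G: $state, afterwards on: $reattached"
```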

¹Unless you have a damaged repository, there's no such thing as an existing but invalid commit. The idea here is to emphasize that, although hash IDs look like random garbage, you're not allowed to just make one up. They are in fact hexadecimal representations of large numbers produced by running cryptographic checksums, so they're not random at all.

²The definition of ordinary commit here is that it stores one parent hash ID. The definition of merge commit is a commit that stores two or more parent hash IDs, and Git has a third kind of commit, a root commit, that stores no parent hash ID. The root commit—there's always at least one in a non-empty, non-shallow repository—is normally the very first commit someone made for that repository. It's possible to make more root commits—by mistake or on purpose—but there's rarely a good reason to do that; it just falls out of the graph algorithms Git uses.

³To make it work like this, Git sticks something unique into each commit. In particular, each commit has a time-stamp, and in normal use it's too difficult to predict what the future time-stamp of some future commit would be. There are some theoretical means to cause problems here but they're impractical even for use as practical jokes, at least today.

Updating names, including with `git push`

Branch names are specific to each Git repository. Your Git repository holds your branch names. Some other Git repository has its own branch names.

When you create a branch name, you simply pick some existing commit for it:

...--G--H   <-- main (HEAD)

might become:

...--G--H   <-- develop (HEAD), main

You've created the new name develop and picked existing commit H for that name. If you now make a new commit I, the result includes changing the hash ID stored in develop, to produce:

...--G--H   <-- main
         \
          I   <-- develop (HEAD)

Note how, this time, it was the name develop that moved, because HEAD was attached to develop, not to main.
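As a small sketch of that last point, assuming a scratch repository with a single empty commit H on main (paths and identity invented for the demo):

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "H"

git checkout -q -b develop             # new name develop at H; HEAD attaches to it
git commit -q --allow-empty -m "I"     # only the attached name moves

main_tip=$(git rev-parse main)         # still H
develop_tip=$(git rev-parse develop)   # now I
echo "main:    $main_tip"
echo "develop: $develop_tip"
```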

Anyone with direct access to a repository can, at any time, create or destroy branch names whenever they like, using git branch (perhaps with -D to delete), or git checkout -b or git switch -c. They can also create new commits at any time.

Every commit gets a unique hash ID, though. Once you've created some set of commits, you can then use git push to send those commits to some other Git. They get the entire commits—the full snapshot and metadata for each commit—exactly as is, and they compute the same cryptographic checksum, so they assign these identical commits the same hash IDs that your Git assigns them.

By using this principle, the two Gits actually manage to figure out who has which commits by looking only at the hash IDs. This is what enables the distributed nature of a Git repository. The magic is really all in the hashing.
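You can check the "same hash ID on both sides" claim directly. In this sketch a bare repository in a temporary directory plays the role of the other Git; that setup, the demo identity, and the branch name are assumptions.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "H"
git push -q origin main

ours=$(git rev-parse main)
# Ask the other repository directly what its main points to.
theirs=$(git --git-dir="$work/remote.git" rev-parse main)
echo "ours:   $ours"
echo "theirs: $theirs"
```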

But there's a problem. Just as your own Git finds your latest commit using some branch name, their Git finds their latest commit using their branch name(s). So if you're going to send commit I to some other Git repository, over at origin, with:

git push origin develop

from your end, they are going to have to set some branch name(s) in their repository. By convention—because humans are so easily fooled—we tend to want to use the same branch names in their repository and in our repository. So the git push above asks them to set their develop.

That's fine if develop is a new name. It's also OK for us to ask them to set their main, if we're not going to lose any of their commits. That is, suppose they have:

...--G--H   <-- main

We can ask them to set their main to point to some new commit J, as long as J points back to H eventually (probably through I):

...--G--H   <-- main
         \
          I--J   <-- request: please make "main" go here

Git calls this kind of request a fast-forward operation and generally permits it. (Many add-on sites like GitHub add fancier branch protection systems that let you be pickier; this fast-forward check is all that's built into base Git though.) What base Git won't let you do is something like this:

...--G--H--I   <-- main
         \
          J   <-- request: please make "main" go here

because if they do that, they will lose access to their commit I.
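The rejected case can be reproduced locally. This is a sketch under stated assumptions: a bare repository stands in for origin, a second clone called theirs plays "someone else", and the commits are empty ones named after the drawing. The final check uses git merge-base --is-ancestor, which is one way to ask whether moving a name would be a fast-forward.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "H"
git push -q origin main

# Someone else adds commit I to the remote's main.
git clone -q "$work/remote.git" "$work/theirs"
git -C "$work/theirs" commit -q --allow-empty -m "I"
git -C "$work/theirs" push -q origin main

# We add J on top of H, unaware of I, and ask them to set their main to J.
git commit -q --allow-empty -m "J"
if git push -q origin main 2>/dev/null; then result=accepted; else result=rejected; fi

# After fetching, check whether their tip is an ancestor of ours.
git fetch -q origin
if git merge-base --is-ancestor origin/main main; then ff=yes; else ff=no; fi
echo "push: $result, fast-forward possible: $ff"
```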

Remote-tracking names and `git fetch`

To fix this kind of problem, we're supposed to use git fetch first, before we run git push. When we run git fetch, our Git calls up their Git—as if for git push, where we'd send them our new commits—but instead of sending commits to them, we have our Git ask their Git for any new-to-us commits. They send these over—along with the information about which of their branch names point to which commits—and our Git now has any new commits they have, that we don't.

Let's assume we both had ...--G--H on our main, and they have acquired some new commit I from somewhere. Meanwhile, though, we added a new commit J on our main. So we both started out the same:

...--G--H   <-- main

but since then, they added I:

          I   <-- (main in their Git)
         /
...--G--H

and we added J:

...--G--H
         \
          J   <-- main (in our Git)

When we run git fetch, we pick up their new commit:

          I   <-- (main in their Git)
         /
...--G--H
         \
          J   <-- main

Our Git can't update our main because if it did, we'd lose our own commit J. So what our Git does—regardless of whether they have added any new commits—is to take their branch name, main, and change it. Our Git turns their branch name into a remote-tracking name by sticking origin/ in front of it.⁴ So we end up with this:

          I   <-- origin/main
         /
...--G--H
         \
          J   <-- main

(note: add HEAD to our main if we have that as our current checked-out branch).

This git fetch step:

gets any new commits they have;
updates all of our remote-tracking names; and thus
prepares us to do anything necessary to join up new lines of commits (rebase or merge).

This means it's often sensible to follow up a git fetch with either a git rebase or a git merge. Git offers a convenience command, git pull, that combines the two operations. I dislike it, for various reasons, and encourage those new to Git to use separate fetch and second-Git-command sequences at least until they're quite familiar with the entire process.⁵

In any case, the summary of all of this is that a remote-tracking name is Git's way of remembering what some other Git repository had in its branch name(s), the last time our Git talked with their Git. The git fetch operation tends to update all of them, and the git push operation updates one when it succeeds in doing a push to one branch. Our Git gets confirmation from their Git that they accepted our request, so our Git now knows that their Git has that name set to that hash ID.⁶
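Here is a sketch of the fetch scenario drawn above, again using a bare repository in a temporary directory as origin and a second clone (theirs) as the other party. All names and paths are assumptions for the demo.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "H"
git push -q origin main                 # a successful push also updates our origin/main

git clone -q "$work/remote.git" "$work/theirs"
git -C "$work/theirs" commit -q --allow-empty -m "I"
git -C "$work/theirs" push -q origin main

git commit -q --allow-empty -m "J"      # our own new commit on main

before=$(git rev-parse origin/main)     # what our Git last saw there: H
git fetch -q origin
after=$(git rev-parse origin/main)      # updated memory of their main: I
ours=$(git rev-parse main)              # untouched by the fetch: J

git log --oneline --graph main origin/main
```

The log at the end should show the two lines of development, I and J, forking from H.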

⁴Technically, the remote-tracking names are in a separate namespace, so even if we accidentally call a (local) branch origin/xyz, Git will be able to keep straight our origin/xyz branch, vs our origin/xyz remote-tracking name based on their xyz branch. But this goes back to Stupid Human Tricks, which make Bender the robot laugh; don't do that.

⁵Not everyone is so chary of git pull. Some of my dislike for it is because it had some really bad bugs in it, early on, and I lost a lot of work to git pull more than once. But the main problem is that it does too much, in my opinion. There's a movement afoot to make git pull behave better by default, although I'm not sure how likely this is to happen any time soon. If and when it does happen, I'll still recommend separate steps, but won't be as quick to suggest that newbies avoid git pull: it will be a command that, if it works, it did the right thing, and if not, there was no single right thing.

⁶Some of the automatic fetch-time updates were new in Git 1.8.4, so if you have a really ancient Git, older than this, be sure to use git fetch origin with no constraints to update everything. The git fetch that git pull runs often fails to update anything at all (another reason to be wary of git pull), in these ancient Git versions.

Branches, upstreams, and `@{upstream}`

Each branch name is allowed, but not required, to have one (1) upstream setting. Typically, the upstream setting of a branch like main or develop is set to origin/main or origin/develop: the remote-tracking name in your own Git repository.

Having this set enables some convenience items. It's never actually required. And, when you create an entirely new branch name in your own repository, not using the remote-tracking name (which does not exist yet because the origin Git does not have this branch yet), there isn't an upstream for it yet and you will want to use git push -u origin HEAD or similar to create the branch there. That will create the appropriate remote-tracking name locally, and the -u will have your Git set the remote-tracking name as the upstream of the branch.

Once you have set an upstream, the @{upstream} suffix—it's technically a suffix that you can append to any branch name—tells Git to find that branch's upstream. That is, master@{upstream} is origin/master, assuming you have set master's upstream to the default origin/master. This repeats for each branch name.

The bare @{upstream} text, written just like that, "means" HEAD@{upstream}. So this uses HEAD to figure out which branch you're on, and then uses the branch's upstream setting to figure out which remote-tracking name to use in your own local Git repository.
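To watch an upstream come into existence, here is a sketch with a new branch that the origin Git has never seen. The branch name feature, the bare stand-in remote, and the demo identity are assumptions.

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "H"
git push -q -u origin main

git checkout -q -b feature              # brand-new local name, no upstream yet
if git rev-parse @{u} >/dev/null 2>&1; then had_upstream=yes; else had_upstream=no; fi

git push -q -u origin HEAD              # create the branch there and set the upstream

upstream_name=$(git rev-parse --abbrev-ref @{u})      # origin/feature
cfg_remote=$(git config branch.feature.remote)         # origin
cfg_merge=$(git config branch.feature.merge)           # refs/heads/feature
echo "before push: upstream set? $had_upstream"
echo "after push:  $upstream_name ($cfg_remote, $cfg_merge)"
```

The two git config lines read the same branch.<name>.remote and branch.<name>.merge settings that the documentation quoted in the other answer refers to.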

All of the above is why and how RomainValeri's answer is the short version of this one. :-)
