Apologies if this is obvious; I'm no Git expert.

This morning I went into my local repo and did the following:

$ git fetch
$ git checkout master

And got a message that I've seen plenty of times before:

Switched to branch 'master'
Your branch is ahead of 'origin/master' by 8 commits.
  (use "git push" to publish your local commits)

The problem is that I have never made any changes to this branch whatsoever. I tried git pull, but it just responds with 'Already up to date'.

Running git log origin/master..master shows several commits to the branch from other authors which do not appear in the remote branch, so the message seems to make sense. What I don't understand is how I can have commits from other authors in my local copy which are not in the remote repo.

How do I get to the bottom of this?

  • `git log origin/master..master` will show you the commits you have that the upstream doesn’t. – user3840170 May 07 '20 at 12:29
  • Perhaps someone force pushed to the remote and removed those commits from it, after you got them into your local repository? – Lasse V. Karlsen May 07 '20 at 13:58
  • Sorry - that was actually what I tried when I mentioned git log. I've edited my question to make it clearer. What perplexes me is that I haven't committed anything to this repo since I cloned it a week ago, though I have pulled updates several times. What I'm trying to understand is how I can, therefore, have commits in my local repo from another author which are not in the remote. Surely this is impossible unless someone has deleted them from the remote one? – india_tango May 07 '20 at 13:59
  • Lasse V. Karlsen - thanks, that's certainly the only reason I can think of. – india_tango May 07 '20 at 14:03

1 Answer

The TL;DR is that Lasse V. Karlsen's speculation in a comment:

Perhaps someone force pushed to the remote and removed those commits from it, after you got them into your local repository?

is probably the answer. Run:

git reflog origin/master

and look for the string forced-update in the output; if you see this, that's what happened.

Long: what all this means

Git is, at its heart, really all about commits. Commits are the unit of storage we work with. Each commit has a sort of "true name", universally the same in every Git, which is its raw hash ID—a big ugly string of letters and digits, such as b34789c0b0d3b137f0bb516b417bd8d75e0cb306. That hash ID means that commit, and if you have a clone of the Git repository for Git, you either have that commit (part of the upcoming Git 2.27 release) or you don't—well, not yet.

We do not, however, normally refer to commits by these hash IDs. They are too big and ugly and impossible to remember or retype. We like to use names: branch names like master, tag names like v2.26.0,[1] or remote-tracking names like origin/master.

To make tags work right in all cases, we have to promise never to change the hash ID to which some tag refers. Some modern software—specifically Go modules and Go's version cache—relies on this for smooth functioning (though you can clamp down on it more for security if/as necessary).

The thing about branch names, though—as compared to tag names—is that the commit hash ID we obtain by looking up a branch name normally changes. The branch name is in fact defined, in Git, as storing the hash ID of the last commit that is to be considered part of the branch.

Commits are immutable, because their true name hash ID is a cryptographic checksum of their content.[2] Meanwhile, each commit contains the hash ID of each of its immediate predecessor, or parent, commits: usually just one, with merge commits usually having two. These links form the commits into a directed acyclic graph: each commit supplies both a vertex, named by its hash ID, and the connecting one-way edges or arcs, given by its parent hash IDs.
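Both claims in the paragraph above can be checked in a throwaway repository. The following is a minimal sketch, assuming a POSIX shell with git on the PATH; the identity "demo", the commit messages, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch: a commit's name is the checksum of its content,
# and that content names the parent commit.
set -eu
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                      # throwaway repository
git init -q "$repo"
git -C "$repo" commit -q --allow-empty -m "first"
git -C "$repo" commit -q --allow-empty -m "second"

# The raw commit text: tree, parent, author, committer, message.
git -C "$repo" cat-file -p HEAD

# Re-hashing that text as a commit object reproduces the commit's name.
name=$(git -C "$repo" rev-parse HEAD)
rehash=$(git -C "$repo" cat-file commit HEAD |
         git -C "$repo" hash-object -t commit --stdin)

# The "parent" line stored in the second commit is the first commit's hash ID.
first=$(git -C "$repo" rev-parse HEAD~1)
parent=$(git -C "$repo" cat-file -p HEAD | sed -n 's/^parent //p')

echo "name:   $name"
echo "rehash: $rehash"
echo "first:  $first"
echo "parent: $parent"
```

Because the parent's hash ID is part of the hashed text, changing anything in an earlier commit would change the name of every commit after it.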

This graph can be, and is, added to, just by adding new commits to the set of all commits in the repository. That is, we use git checkout or git switch to select some branch name as the current branch and select its last commit (the hash ID stored in the branch name) as the current commit. Then we do our work as usual, and Git packages up a new snapshot and new set of metadata to make a new commit. The new commit's parent is the current commit.[3] Now that the new commit exists and is safely stored in the repository,[4] Git writes the new commit's new, unique hash ID into the current branch name, so that the branch automatically points to the (new) last commit.

This is how branches grow, one commit at a time, and it provides a "normal direction" for branch names to move:

... <-F <-G <-H   <-- master

becomes:

... <-F <-G <-H <-I   <-- master

when we add new commit I, then becomes:

... <-F <-G <-H <-I <-J  <-- master

when we add commit J, and so on.

We cannot actually remove commits from this graph, but we can force any branch name to move backwards, as it were. For instance, after adding commits I and J, we can "remove" them for most practical purposes by forcing the name master to hold the hash ID of commit H again:

             I--J   ???
            /
...--F--G--H   <-- master

Since Git normally finds commits by using a name like master and then working backwards through the commit-to-commit parent links, which point backwards, Git can no longer find commit J, at least not by starting with the name master. Eventually, our Git will toss commits I-J for real—not immediately, and there are hidden names by which we can find them, as we'll see, but eventually.
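Here is a minimal sketch of that "removal", assuming a POSIX shell with git on the PATH. The commit messages H, I, and J stand in for the hash IDs in the diagrams, and the identity "demo" is made up.

```shell
#!/bin/sh
# Sketch: force master back from J to H and see what becomes of I and J.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master   # match the diagrams
for msg in H I J; do
    git -C "$repo" commit -q --allow-empty -m "$msg"
done
J=$(git -C "$repo" rev-parse master)

# Force the name master to hold the hash ID of commit H again.
git -C "$repo" reset -q --hard master~2

# Walking back from master no longer finds I or J.
visible=$(git -C "$repo" log --format=%s master)

# Commit J still exists in the object store for now...
kind=$(git -C "$repo" cat-file -t "$J")
# ...and a hidden name, the reflog entry master@{1}, still finds it.
remembered=$(git -C "$repo" rev-parse 'master@{1}')

echo "reachable from master: $visible"
echo "object type of J:      $kind"
echo "master@{1}:            $remembered"
```

The commits only go away for real once nothing, including reflog entries, refers to them and Git's garbage collection eventually runs.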

Your own Git's remote-tracking names, like origin/master, are your Git's memory of some other Git's branch names. That is, when you first clone a repository, or use git fetch to update a clone, you have your Git call up some other Git, via a URL that Git saves under a name. We call the name—in this case origin—a remote and we supply the name as shorthand for the URL:

git fetch origin

Our Git calls up their Git, gets from them a list of their branch names and last-commit-hash-IDs, and gets from them any commits they have, that we don't, that we need. For instance, if we have, at this time:

...--F--G--H   <-- master, origin/master

we can call up their Git and perhaps their master now points to new commit J:

...--F--G--H--I--J   <-- master [in the Git at origin]

Our Git checks, notices we don't have a commit with hash ID J, and gets that commit from their Git. That also brings in commit I, because we always get everything we don't have.[5] Our Git then updates our origin/master, our memory of their master. So now we have, in our own local repository:

...--F--G--H   <-- master
            \
             I--J   <-- origin/master

At this point, their master is two commits ahead of our master (equivalently, our master is 2 behind). We can now take action to add these commits, I-J, to our own master. We can have Git do that by sliding the name master "forward" to point to commit J:

...--F--G--H
            \
             I--J   <-- master, origin/master

If we do this with git merge, Git calls this a fast-forward merge. There is no actual merging involved: Git really just checks out commit J directly and updates the name master to point to commit J. (There are other ways to fast-forward a branch name as well: if the branch is not our current branch, so that its commit isn't checked out, the name simply "slides forward".)
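The fetch-then-fast-forward sequence can be sketched as follows, assuming a POSIX shell with git on the PATH. The directory "upstream" is a hypothetical local stand-in for the Git at origin, not a real server.

```shell
#!/bin/sh
# Sketch: fetch I-J from "their" Git, then fast-forward our master.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m H
git clone -q "$work/upstream" "$work/me"        # we start out at H

# They add I and J; our fetch moves origin/master, but not master.
git -C "$work/upstream" commit -q --allow-empty -m I
git -C "$work/upstream" commit -q --allow-empty -m J
git -C "$work/me" fetch -q origin
behind=$(git -C "$work/me" rev-list --count master..origin/master)

# Slide master forward; --ff-only refuses anything but a fast-forward.
git -C "$work/me" merge -q --ff-only origin/master
merges=$(git -C "$work/me" rev-list --count --merges master)
ours=$(git -C "$work/me" rev-parse master)
theirs=$(git -C "$work/me" rev-parse origin/master)

echo "behind before merge: $behind"
echo "merge commits made:  $merges"
```

Using --ff-only is a design choice for the sketch: it makes Git fail loudly instead of creating a real merge commit if the two names have diverged.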

But as we noted above, we can force a branch name to "move backwards" so as to "remove" (well, sort-of-remove) a commit. Suppose someone does this on the other Git, after we've added I-J to our own master. Their repository now has their master pointing to commit H.[6] When we have our Git call up their Git, by running git fetch origin, we see that their master names commit H, not commit J. They have no new commits for us, so our Git just updates our origin/master now, giving us this:

...--F--G--H   <-- origin/master
            \
             I--J   <-- master

We are now 2 ahead of them, as if we made commits I-J.
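The whole sequence can be reproduced locally. This is a sketch under stated assumptions: a POSIX shell with git on the PATH, a bare repository on disk playing the role of origin, and a hypothetical colleague "alice" who is the only person ever to commit.

```shell
#!/bin/sh
# Sketch: end up "ahead by 2" without ever committing in our own clone.
set -eu
# Only alice commits in this sketch, so one identity is enough.
export GIT_AUTHOR_NAME=alice GIT_AUTHOR_EMAIL=alice@example.com
export GIT_COMMITTER_NAME=alice GIT_COMMITTER_EMAIL=alice@example.com

work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# alice publishes commit H.
git init -q "$work/alice"
git -C "$work/alice" symbolic-ref HEAD refs/heads/master
git -C "$work/alice" remote add origin "$work/origin.git"
git -C "$work/alice" commit -q --allow-empty -m H
git -C "$work/alice" push -q origin master

# We clone at H, then pull after alice publishes I and J.
git clone -q "$work/origin.git" "$work/me"
git -C "$work/alice" commit -q --allow-empty -m I
git -C "$work/alice" commit -q --allow-empty -m J
git -C "$work/alice" push -q origin master
git -C "$work/me" pull -q --ff-only

# alice moves her master back to H and force-pushes it.
git -C "$work/alice" reset -q --hard master~2
git -C "$work/alice" push -q --force origin master

# We fetch. Our master still names J; origin/master now names H.
git -C "$work/me" fetch -q origin
ahead=$(git -C "$work/me" rev-list --count origin/master..master)
authors=$(git -C "$work/me" log --format=%an origin/master..master | sort -u)

echo "ahead by $ahead commit(s), all authored by: $authors"
git -C "$work/me" status -sb | head -n 1
```

The clone in "me" never ran git commit, yet it reports being ahead, and every commit it is ahead by belongs to someone else.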

If we are allowed to push to origin, we can easily put these two commits back by running git push origin master now. If their Git has truly discarded commits I-J, well, now they're back. If not, they already have I-J and our git push just makes them fast-forward their name master to point to commit J.

Whoever "removed" commits I-J from the Git repository at origin had some reason they did it. Was it intentional? Was it a mistake? We have no way to know—well, we have one way: ask them! They know; we can only guess, so ask them.

We can, however, tell that this happened, if our Git had picked up I-J from them at some point and updated our own origin/master accordingly. This is where those hidden names I mentioned come in.


[1] In the Git repository for Git, v2.26.0 means commit 274b9cc25322d9ee79aa8e6d4e86f0ffe5ced925, although the name actually resolves to a tag object hash ID, adf6396efeb4e8c12fb07174b4074c4031b2c460. The whole point of the tag, though, is that we don't need to know any of this. The simple, human-readable, comprehensible name v2.26.0 suffices.

[2] This is true of all Git objects, actually; it's not specific to commit objects. See also How does the newly found SHA-1 collision affect Git?

[3] If the new commit is a merge commit, its first parent is the current commit. The remaining parents, which are what make it a merge commit (there is usually just one other parent here), can name any commits at all, but each one must be the hash ID of some existing commit.

[4] Note that the work-tree is not in the repository. It's a separate entity that lives next to the repository, in a sense, and your files are not safe until committed.

[5] Except, that is, for what Git calls shallow clones. Let's ignore them here.

[6] Their repository may or may not still actually hold commits I-J, depending on how quickly their Git gets around to discarding unreachable commits. Reachability is a key concept here; for (much) more about this, see Think Like (a) Git.
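The first footnote above distinguishes the tag object from the commit it leads to. A minimal sketch of that distinction, assuming a POSIX shell with git on the PATH; the tag name v1.0 and its message are hypothetical, created in a throwaway repository rather than taken from the Git repository for Git:

```shell
#!/bin/sh
# Sketch: an annotated tag name resolves to a tag object, which names a commit.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" commit -q --allow-empty -m "release commit"
git -C "$repo" tag -a -m "version 1.0" v1.0     # -a creates a tag object

tag_id=$(git -C "$repo" rev-parse v1.0)                 # the tag object
commit_id=$(git -C "$repo" rev-parse 'v1.0^{commit}')   # the tagged commit
tag_kind=$(git -C "$repo" cat-file -t "$tag_id")
head_id=$(git -C "$repo" rev-parse HEAD)

echo "v1.0          -> $tag_id ($tag_kind)"
echo "v1.0^{commit} -> $commit_id"
```

Commands that need a commit, such as git checkout or git log, perform this second step on their own, which is why the human-readable name is all we normally type.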


Reflogs keep the previous values of names

Whenever our Git updates one of our branch or remote-tracking names, such as master or origin/master, our Git will (optionally, but it is on by default in an ordinary, non-bare clone) save the old value of the name in a log. This log is the reflog for the given name.[7] This means that our origin/master, our memory of where their master was, updated each time we run git fetch, keeps track of where their master used to be on a previous git fetch.

Suppose someone deliberately, or somehow accidentally, forces origin's name master to move backwards, going from commit J to commit H in the example above. Suppose further that we ran a git fetch that captured that name pointing to commit J before, and now we run git fetch and the name points to H. Our Git will forcibly update our origin/master in a non-fast-forward manner.[8]

We can now, after the fact, look that up in the reflog for origin/master:

$ git reflog origin/master

In my case, I can do that with my Git repository for Git, because the pu branch moves like this all the time:

$ git reflog origin/pu
dfeb8fbf42 (origin/pu) refs/remotes/origin/pu@{0}: fetch: forced-update
fc307aa377 refs/remotes/origin/pu@{1}: fetch: forced-update
5525884f08 refs/remotes/origin/pu@{2}: fetch: forced-update
...

The "forced update" part here means that they had some commits on their branch, which we were able to obtain and remember with our own remote-tracking name, but then they decided to shove those commits aside. Our Git picked up their new pu name and updated our origin/pu accordingly.
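The same check can be scripted. The sketch below assumes a POSIX shell with git on the PATH; was_force_updated is a hypothetical helper name, and "upstream" is a local stand-in for origin whose master is rewound with git reset to imitate the result of a force push.

```shell
#!/bin/sh
# Sketch: detect a forced update of a remote-tracking name from its reflog.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: succeeds if the reflog of <ref> in <repo>
# contains a forced-update entry.
was_force_updated() {   # usage: was_force_updated <repo> <ref>
    git -C "$1" reflog show "$2" | grep -q 'forced-update'
}

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m H
git clone -q "$work/upstream" "$work/me"

# Ordinary growth: they add I and J, and our fetch is a fast-forward.
git -C "$work/upstream" commit -q --allow-empty -m I
git -C "$work/upstream" commit -q --allow-empty -m J
J=$(git -C "$work/upstream" rev-parse master)
git -C "$work/me" fetch -q origin
if was_force_updated "$work/me" origin/master; then before=yes; else before=no; fi

# They move master back to H; our next fetch is a forced update.
git -C "$work/upstream" reset -q --hard master~2
git -C "$work/me" fetch -q origin
if was_force_updated "$work/me" origin/master; then after=yes; else after=no; fi

# The previous reflog entry still names the commit they "removed".
old=$(git -C "$work/me" rev-parse 'origin/master@{1}')

echo "forced before rewind: $before, after rewind: $after"
git -C "$work/me" reflog show origin/master
```

In a real clone, running the helper's pipeline against origin/master is enough; the value of origin/master@{N} on the line just below a forced-update entry is where their branch used to point.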


[7] There is also a reflog for the special name HEAD, but it has no important function in this particular case because we cannot be "on" a remote-tracking name like origin/master.

[8] The git fetch command reports this in its output, though many people pay no attention to it. You'll see three indicators on each forced line: a leading plus sign, three dots instead of two, and the words "forced update" appended:

   b34789c0b0..07d8ea56f2  master     -> origin/master
   232c24e857..5cccb0e1a8  next       -> origin/next
 + fc307aa377...dfeb8fbf42 pu         -> origin/pu  (forced update)
   139372b246..6b33381979  todo       -> origin/todo

Note how the pu branch has moved in a non-fast-forward manner, and git fetch has told us so in three ways: the plus sign, the three dots, and the words in parentheses.

torek