I'm curious whether the following are equivalent:

Let's say latest master has head C and my branch is based off C and has one commit D.

origin/master | A -> B -> C
                           \
          foo |             D

Then let's say master changes.

origin/master | A -> B -> C -> E
                           \
          foo |             D

I'm curious whether all of

  • git pull origin/master
  • git fetch && git merge origin/master
  • git reset --soft HEAD~ && git stash save && git fetch && git reset --hard origin/master && git stash pop

are expected to be equivalent and whether the algorithms that git runs for each are logically equivalent.

Questionaire

1 Answer

All your arrows and branch labels are misleading, because they are all quite sensible. Git, however, works backwards. :-) Let's draw them the other way, the way Git does them:

A <-B <-C <-E   <-- origin/master
         \
          D   <-- foo

That is, the name origin/master contains the hash ID of commit E. Commit E contains the hash ID of commit C, which contains the hash ID of commit B, which contains the hash ID of commit A. Commit A has no other hash ID, because it's the first commit and can't have a parent, so it points nowhere.

All "interior" arrows point backwards. They have to, because commits, like all Git objects, are read-only once created. We know, when we create a new commit, what its parent hash ID is, but we don't know, when we create it, what its child or children will be, if and when they are ever created. As a result there's no need to draw in the parent arrows themselves; we can just connect the commits, remembering that they point backwards.

Branch names, on the other hand, move all the time. So it's a good idea to keep the branch-name arrows. Let's add in the name master and an arrow, and note that master is the current branch (HEAD) as well:

        E   <-- origin/master
       /
A--B--C   <-- master (HEAD)
       \
        D   <-- foo
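
To see the backwards-pointing links for yourself, here is a small sketch. It assumes a throwaway repository in a temporary directory, three empty commits standing in for A, B and C, and a made-up "demo" identity; none of those names come from the question.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                          # throwaway directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the branch is called master

# Three empty commits standing in for A, B and C.
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

c=$(git rev-parse master)      # the name master holds C's hash ID
b=$(git rev-parse master~1)    # C's parent
a=$(git rev-parse master~2)    # B's parent, the root commit

# A commit object records its parent(s), never its children.
parent_of_c=$(git cat-file -p "$c" | sed -n 's/^parent //p')
parents_of_a=$(git cat-file -p "$a" | grep -c '^parent ' || true)

echo "C is $c"
echo "parent recorded inside C: $parent_of_c"
echo "parent lines recorded inside A: $parents_of_a"
```

The parent recorded inside C is B's hash ID, and A, being the root commit, records no parent at all.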

git pull origin/master

This isn't quite a valid Git command. The actual command is the peculiarly-spelled git pull origin master.

If you are a newcomer to Git, I recommend avoiding git pull entirely for a while. I think it mostly confuses people. All it really does is run two other Git commands for you: git fetch (passing on the rest of the arguments you gave it, if any, or a remote-name and branch-name it extracts from the current branch if not), followed by (normally) git merge.

Once you are familiar with the other Git commands and know what to expect from them, you can start using git pull as a convenience, provided that you find it convenient (sometimes it is, sometimes it's not).

Let's look at git fetch first instead. What git fetch does is call up another Git and ask it about its branches and tags.

This second Git has its own, independent master. Your Git finds out which commit hash their Git is identifying by their master. Your Git then obtains that commit by its hash ID. The hash ID is the "true name" of the commit—a name like master is just a moveable pointer, containing a hash ID, and the hash ID that your master, or their master, has, changes over time.

If their master names commit E, and you already have commit E, your Git does not have to download commit E. Your Git simply changes your own origin/master to point to commit E (which is no change at all, if it already points there).

If you don't have commit E yet, your Git gets it from their Git. (Along with commit E, your Git gets anything they have, that you need, that you don't already have—such as commits C, B, and/or A and/or all the tree and blob objects any of those might need. You will usually have most of these already, but whatever you don't have, they will package up and ship to you, so that your Git can set your origin/master.)

If their master names some other commit (any of A through D, or some commit you don't have yet), your Git will download whatever it needs so that it has that commit and all its auxiliary data and other reachable commits, then make your origin/master point to that commit by its hash ID. I'll assume for now that their master still points to E, though.

That's the end of all the work for git fetch: it obtains the various objects, and then updates your remote-tracking names (your origin/* names). Well, there's one more thing it does, of historical interest: it writes every name it fetched to .git/FETCH_HEAD. If you run git fetch, it will default to fetching all the branch and tag names from origin; if you run git fetch origin master, you tell it to fetch only one name, the one matching master (hence branch master), from the other Git that you call origin.
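
A rough way to watch git fetch do this is sketched below. It assumes two throwaway repositories, which I've called theirs and yours (hypothetical names), with empty commits standing in for A, B, C and E and a made-up "demo" identity.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"

# "Their" repository, standing in for the Git you call origin.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
for name in A B C; do
    git -C theirs commit -q --allow-empty -m "$name"
done

# "Your" repository: cloning sets up origin and origin/master.
git clone -q theirs yours

# Their master moves on to a new commit E.
git -C theirs commit -q --allow-empty -m E

cd yours
before=$(git rev-parse origin/master)        # still C
git fetch -q origin
after=$(git rev-parse origin/master)         # now E
mine=$(git rev-parse master)                 # your own master: untouched
their_master=$(git -C ../theirs rev-parse master)
fetch_head=$(git rev-parse FETCH_HEAD)       # first entry in .git/FETCH_HEAD

echo "origin/master moved from $before to $after"
echo "local master is still $mine"
```

Only the remote-tracking name moves. Your own master stays on C until you run a second command such as git merge.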

git fetch && git merge origin/master

After running git fetch origin master, git pull origin master will, in effect, run git merge origin/master. It does so via the special FETCH_HEAD file, rather than by literally running git merge origin/master—but git pull origin master and git fetch && git merge origin/master will, in this case, do the same thing.

Note that git fetch is the unrestricted form: update all remote-tracking names. If you're not currently on your own master, or your master has a different upstream setting, git pull will run git fetch origin some-other-name, but git pull origin master will explicitly run git fetch origin master. Then it will run git merge with a hash ID extracted out of .git/FETCH_HEAD (and a -m argument as well). So there are a lot of differences here, but most are usually minor, assuming you're on master with its upstream set to origin/master.
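
As a sanity check on the "pull is fetch plus merge" claim, here is a sketch that does it both ways in two separate clones and compares the results. The repository names (theirs, via-pull, via-merge) and the "demo" identity are made up, and the commits are empty placeholders for A, B, C and E.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"

git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
for name in A B C; do
    git -C theirs commit -q --allow-empty -m "$name"
done

# Two identical clones, both on master at C.
git clone -q theirs via-pull
git clone -q theirs via-merge

# Their master gains E.
git -C theirs commit -q --allow-empty -m E

# Route 1: the one-step convenience command.
git -C via-pull pull -q origin master

# Route 2: the two steps spelled out; FETCH_HEAD is what git pull hands to git merge.
git -C via-merge fetch -q origin master
git -C via-merge merge -q FETCH_HEAD

pulled=$(git -C via-pull rev-parse master)
merged=$(git -C via-merge rev-parse master)
expected=$(git -C theirs rev-parse master)

echo "via pull:        $pulled"
echo "via fetch+merge: $merged"
```

In this fast-forward case both clones end up with master naming E.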

The git merge step is a fair bit more complicated. This:

  1. Checks whether the index and current (HEAD) commit match, or if not, whether the merge looks safe. Ideally they should match (you should have run git commit if not). It's tricky to back out of a failed merge if the index and the HEAD commit don't match (although git merge --abort will do its best).

  2. Uses the current commit's hash ID and the merge target commit's hash ID to locate two specific commits. Since HEAD names master and master points to C, the current commit is C and the target is E. Git doesn't have a single consistent name for the target commit; I like to call the HEAD commit L for left/local/--ours and the other one R for right/remote/--theirs. It won't matter much here, though, as we'll see in a moment.

  3. Computes the merge base of the L and R commits. The merge base is, simply put (somewhat too simply in hard cases), the first place the two branches come together when we start at both L and R and work backwards.

    In this case, that's commit L (aka C) itself!

  4. If there is no merge base at all (the two commits share no common ancestor), fail (in modern Git, unless you pass --allow-unrelated-histories). If the merge base is R, do nothing: there is nothing to merge. If the merge base is L / HEAD, do a fast-forward operation if allowed; if not allowed, fall back to a true merge. Otherwise (the merge base is neither L nor R), do a true merge.

Since the merge base is L, and you did not say --no-ff, Git will use the fast-forward operation for this particular merge. The result will be to check out commit E and move the name master to point to E:

        E   <-- master (HEAD), origin/master
       /
A--B--C
       \
        D   <-- foo
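
The decision in step 4 can be approximated from the command line with git merge-base. What follows is only a sketch: classify is a helper name I made up, the commits are empty placeholders, and origin/master is faked with git update-ref rather than fetched from a real remote.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

git branch foo                       # foo starts at C ...
git checkout -q foo
git commit -q --allow-empty -m D     # ... and gains D

# Stand-in for a fetched origin/master: build E on a detached HEAD and
# point a remote-tracking name at it by hand (no real remote involved).
git checkout -q --detach master
git commit -q --allow-empty -m E
git update-ref refs/remotes/origin/master HEAD
git checkout -q master

# classify L R: report what merging R into L would amount to.
classify() {
    left=$(git rev-parse "$1")
    right=$(git rev-parse "$2")
    base=$(git merge-base "$left" "$right" || true)
    if [ -z "$base" ]; then echo "no merge base"
    elif [ "$base" = "$right" ]; then echo "nothing to merge"
    elif [ "$base" = "$left" ]; then echo "fast-forward"
    else echo "true merge"
    fi
}

on_master=$(classify master origin/master)   # base is C = L
on_foo=$(classify foo origin/master)         # base is C, neither D nor E
reversed=$(classify origin/master master)    # base is C = R

echo "master <- origin/master: $on_master"
echo "foo    <- origin/master: $on_foo"
echo "origin/master <- master: $reversed"

# With HEAD on master, the real merge is a fast-forward.
git merge -q --ff-only origin/master
```

Run from master, the merge is a fast-forward. The same merge run from foo would be a true merge, since the merge base C is neither D nor E.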

Finally:

git reset --soft HEAD~ && git stash save && git fetch && git reset --hard origin/master && git stash pop

This one is much more complex.

A soft reset using HEAD~1 tells Git to:

  1. Find the current hash ID by reading .git/HEAD. This will normally contain a string like ref: refs/heads/master, which tells Git that the current branch is master. If you're in "detached HEAD" mode, .git/HEAD will have a raw hash ID in it, rather than a branch name; this affects step 4 below. Otherwise, read the branch name itself to find the hash ID.
  2. Read that commit's parent hash ID (HEAD~ means HEAD~1 which means "one parent back along the first-parent line of ancestry").
  3. Don't touch the index (--soft), and don't touch the work-tree (--soft or --mixed).
  4. Write the new hash back into the current branch. Or, if HEAD is detached, write the new hash directly into .git/HEAD.

Since we have not touched the index and work-tree, they remain unchanged, regardless of whether we had a branch name to rewrite in step 4. Assuming that HEAD names master, and that the index and work-tree match commit C (to which master points), this soft reset will change the name master to point to commit B, leaving the index and work-tree matching the contents of commit C.
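
Here is a sketch of that soft reset in a throwaway repository. The single file file.txt, its contents, and the "demo" identity are made up for illustration; commits A, B and C each rewrite the file.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for name in A B C; do
    echo "contents as of $name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
b=$(git rev-parse HEAD~1)
c=$(git rev-parse HEAD)

git reset -q --soft HEAD~

current_branch=$(git symbolic-ref HEAD)     # the branch that .git/HEAD names
master_now=$(git rev-parse master)          # moved back to B
index_blob=$(git rev-parse :file.txt)       # blob recorded in the index
c_blob=$(git rev-parse "$c:file.txt")       # blob recorded in commit C
staged=$(git diff --cached --name-only)     # index compared with the new HEAD (B)
unstaged=$(git diff --name-only)            # work-tree compared with the index

echo "HEAD still names $current_branch"
echo "staged relative to B: $staged"
```

The name master now points to B, while the index and work-tree still hold the contents of C, so the difference between B and C shows up as a staged change.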

Next, git stash save writes two commits, not on any branch. One contains the contents of the index, and one contains the contents of the work-tree. (It doesn't matter that these two match each other, or that they match commit C for that matter—that just means that the two new commits use the existing top level tree object from commit C, which saves space.) The resulting diagram now looks like this:

       E   <-- origin/master
      /
     C--D   <-- foo
    /
A--B   <-- master (HEAD)
   |\
   i-w   <-- refs/stash

(I call the i-w commit clump, to which refs/stash points, a stash bag, because it hangs off the commit that was current when you ran git stash save.)
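
You can poke at the stash bag directly, since refs/stash is just a name for the w commit. This sketch assumes a throwaway repository with one made-up file, file.txt, and a "demo" identity; it uses git stash save to match the command above, though newer Git versions prefer the spelling git stash push.

```shell
set -e
# Hypothetical identity so that git commit (and git stash) work in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for name in A B C; do
    echo "contents as of $name" > file.txt
    git add file.txt
    git commit -q -m "$name"
done
b=$(git rev-parse HEAD~1)
c=$(git rev-parse HEAD)

git reset -q --soft HEAD~     # master -> B, index and work-tree still match C
git stash save -q             # writes the i and w commits

w=$(git rev-parse refs/stash)            # the work-tree commit
w_first_parent=$(git rev-parse "$w^1")   # the commit that was current: B
i=$(git rev-parse "$w^2")                # the index commit
i_parent=$(git rev-parse "$i^1")         # also B
w_tree=$(git rev-parse "$w^{tree}")
i_tree=$(git rev-parse "$i^{tree}")
c_tree=$(git rev-parse "$c^{tree}")
branches_with_w=$(git branch --contains "$w")   # empty: w is on no branch

echo "w=$w hangs off $w_first_parent"
echo "i=$i"
```

Both i and w reuse the top-level tree of commit C, and after the stash the work-tree is back to the contents of B.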

The git fetch step now does whatever it does, possibly adding more commits and/or moving origin/master to point somewhere. We'll assume here that it leaves origin/master pointing to commit E.

The git reset --hard origin/master now turns origin/master into a hash ID. This was step 1 above in our earlier git reset, but this time we don't read .git/HEAD, we just read the value of origin/master:

git rev-parse origin/master

Note that we can do the same to compute HEAD~1:

git rev-parse HEAD~1

At any time, git rev-parse can turn a name into a raw hash ID, whenever that's what we need. For git reset, that's what we need: what commit are we resetting to?
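
As a quick illustration, the sketch below resolves several spellings of "one commit before the tip of master" and they all come out as the same hash ID. It assumes a throwaway repository with three empty placeholder commits and a made-up "demo" identity.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for name in A B C; do
    git commit -q --allow-empty -m "$name"
done

tilde_one=$(git rev-parse HEAD~1)
tilde=$(git rev-parse HEAD~)                    # HEAD~ is short for HEAD~1
caret=$(git rev-parse HEAD^)                    # first parent, same commit here
via_branch=$(git rev-parse master~1)            # HEAD names master
via_ref=$(git rev-parse refs/heads/master~1)    # the branch's full name

echo "HEAD~1 = $tilde_one"
echo "subject: $(git log -1 --format=%s "$tilde_one")"
```

The last line prints the subject of the resolved commit, which is B in this setup.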

The git reset now writes that hash ID into master, and this time, because we used --hard, writes that commit's tree into the index and updates the work-tree to match. While the index and work-tree are not in the diagram, we now have this:

       E   <-- master (HEAD), origin/master
      /
     C--D   <-- foo
    /
A--B
   |\
   i-w   <-- refs/stash

(We could draw the A-B-C-D line horizontally here, or go back to having D down one row, except that refs/stash is in the way.)

Last, git stash pop takes whatever is in the w commit and tries to merge it, using git merge-recursive. The merge base is commit B, the L tree is the current index turned into a tree (since we just ran git reset --hard to commit E, that's E), and the R tree is the saved w commit. Depending on what has happened since commit B, this merge may see that there is no work to be done, and do nothing.

If it does nothing, or does something and thinks the merge succeeded, it drops the stash:

       E   <-- master (HEAD), origin/master
      /
     C--D   <-- foo
    /
A--B

It does not make any new commit, so the index and/or work-tree may now differ from the snapshot in commit E, if the merge did some work.
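
To see a case where the merge inside git stash pop does do some work, here is a sketch that runs the whole sequence on foo, with commit D on top of C, as in the question's diagram. That is an assumption that differs from the walkthrough above, which had HEAD on master. The repository names, the one-file-per-commit layout, and the "demo" identity are made up.

```shell
set -e
# Hypothetical identity so that git commit (and git stash) work in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
cd "$top"

# "Their" repository: commits A, B, C, each adding one file.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
for name in A B C; do
    echo "$name" > "theirs/$name.txt"
    git -C theirs add "$name.txt"
    git -C theirs commit -q -m "$name"
done

git clone -q theirs yours

# Their master gains E after the clone.
echo E > theirs/E.txt
git -C theirs add E.txt
git -C theirs commit -q -m E

# Your branch foo has one commit D on top of C.
cd yours
git checkout -q -b foo
echo D > D.txt
git add D.txt
git commit -q -m D

# The sequence from the question (with -q added to keep it quiet).
git reset -q --soft HEAD~ && git stash save -q && git fetch -q &&
    git reset -q --hard origin/master && git stash pop -q

foo_now=$(git rev-parse foo)
e=$(git rev-parse origin/master)
pending=$(git status --porcelain)     # uncommitted leftovers, if any
stashes=$(git stash list)

echo "foo now names: $(git log -1 --format=%s foo)"
echo "uncommitted after the pop:"
echo "$pending"
```

In this setup foo ends up naming E, commit D is no longer on any branch, and D's file comes back only as an uncommitted change. That is quite different from what git merge origin/master would produce from the same starting point.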


There are a number of important things to note here:

  • git pull really is git fetch followed by a second Git command. The syntax for git pull is odd, and either of the two sub-commands it runs can fail, although a failure of git fetch is unlikely (and generally pretty harmless except for stopping the pull). A failure during git merge is common and requires manual intervention to complete or abort the merge. It's a good idea to know what you are doing here, including whether you're in a git merge that needs help; and to know that, it's good to run git merge yourself the first however-many times.

  • git merge itself is quite complicated. It can do a fast-forward, which is not a merge at all (and never encounters merge conflicts). It can do nothing at all. Or, it can do a real merge, which can fail with merge conflicts. To find out what it will do, you must find the merge base, which requires looking at the commit graph (git log --graph). Some of the clicky web interfaces, such as those on GitHub, hide the graph from you, and make it difficult or impossible to tell what will happen.

  • git stash is also quite complicated internally. When all goes well, it seems simple, but when it fails, it fails rather spectacularly.

  • git reset has too many modes of operation to make it easy to use. With --soft, --mixed, and --hard, it works one way, and the three options just tell it when to stop working: after moving the current branch, or after resetting the index, or after resetting both index and work-tree. With other options, it works another (different) way.

  • Using git stash for anything complicated is tricky. All it does is make commits anyway, so if you are doing something complicated, just make a commit that you can see and work with. You can remove it later with git reset with --soft or --mixed.
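
The last point, a visible temporary commit instead of a stash, might look like the sketch below. It assumes a throwaway repository with one made-up file, file.txt, a "demo" identity, and a commit message of my own choosing.

```shell
set -e
# Hypothetical identity so that git commit works in a bare environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "finished work" > file.txt
git add file.txt
git commit -q -m C
c=$(git rev-parse HEAD)

# Save half-finished work as an ordinary commit that git log can show.
echo "half-finished work" >> file.txt
git commit -q -am "WIP (temporary, to be removed)"
wip=$(git rev-parse HEAD)

# ... do whatever else needed doing, then come back ...

# Remove the temporary commit but keep its changes in the work-tree.
# (--soft would keep them staged in the index as well.)
git reset -q --mixed HEAD~

master_now=$(git rev-parse master)
unstaged=$(git diff --name-only)          # the work is back as an unstaged edit
staged=$(git diff --cached --name-only)   # nothing staged after --mixed

echo "master is back at C: $master_now"
echo "unstaged: $unstaged"
```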

torek