
A git pull is a git fetch then a git merge. But, specifically, why would one ever git merge without doing a fetch first? That is, why is it possible to break one seemingly unitary operation into two sub-operations?

Markus
  • To merge something they already have locally and don't need to fetch from the remote? – jonrsharpe Sep 23 '20 at 11:00
  • where would the local something come from? – Markus Sep 23 '20 at 12:09
  • For example if you created a branch to work on a specific feature, finished it and wanted to merge that work back into the trunk. You can do that all without touching a remote. – jonrsharpe Sep 23 '20 at 12:14
  • would you not still have to fetch from your feature branch and merge from it? – Markus Sep 23 '20 at 12:44
  • https://stackoverflow.com/questions/292357/what-is-the-difference-between-git-pull-and-git-fetch just in case anyone thinks a git pull is not fetch + merge. I don't understand why the downvotes. Nobody can answer it's too hard lol. – Markus Sep 23 '20 at 13:03
  • Not if the branch is already in your local repository, you wouldn't; fetch is for getting any changes you don't have locally from the remote: https://git-scm.com/docs/git-fetch – jonrsharpe Sep 23 '20 at 13:04
  • Does this answer your question? [Git merge branch into master](https://stackoverflow.com/questions/14605231/git-merge-branch-into-master) – Joe Sep 23 '20 at 23:34

2 Answers


(I see this is rather downvoted already, but as a sort of philosophical question, I think it makes sense. It is a hard question though!)

A git pull is a git fetch then a git merge.

Technically, it's a git fetch followed by a second Git command. You can choose which second command to run. The usual two are git merge or git rebase. (Sometimes—rarely—git pull will end up running git checkout as its second command, but that is not one you can choose with an option. That's just what happens if you run git pull in a totally-empty repository. I have not checked to see what happens on an orphan branch, but that's also a reasonable place for git pull to run git checkout, since that's the same situation as in a new, totally-empty repository.)

But why would one ever git merge without doing a fetch first?

Why would you write code? Why do anything at all? In one sense, the answer is just because I want to. So why merge at all? Because you want to. But that just pushes the question down another level: why do you want to? Why do you want to run git merge at all?

To answer that question, we should look at what git merge can do. From a high level overview—as in, what are the outcomes of running git merge—there are four possibilities:

  • Nothing at all: git merge says there is nothing to merge, or the argument(s) we give it are invalid, or are in error, or whatever. That's probably not why we wanted to run git merge, but it's a possibility, so we have to keep it in the list.

  • We get what Git calls a fast-forward merge, which is not actually a merge at all. That might be what we want! Or it might not. I won't go into any detail here other than to say that a "fast-forward merge" is really more of a "check out the commit we named in the merge command, and drag the branch name forward" operation.

  • We get one or more merge conflicts, and the merge stops in the middle. That's probably not what we want, but it might be a stumbling block on the way to getting what we do want.

  • We get a merge commit, which is otherwise like any ordinary commit—a new commit that adds to the current branch—except that it has an extra parent commit. This means that when we view the history in the repository—remember, "the history in the repository" consists of the commits that are in the repository—we'll have a fork in the view, so that we can continue traveling into the past down the "main line" of this branch, or we can travel down the other leg of the merge, into that past.

That last one—getting the final merge commit—is probably why we ran git merge. In fact, if we want to make sure we get a merge commit, even if git merge could do a fast-forward instead, we should run git merge --no-ff.

Sometimes, though, getting that fast-forward is what we want. In that case, we should use git merge --ff-only. If a fast-forward is not possible, this will produce an error and fail the merge, rather than producing a merge commit.
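As a rough illustration of the two flags, here is a minimal sketch in a throwaway repository. The repository location, the branch name feature, the identity settings, and the commit messages are all assumptions made up for this example:

```shell
# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

git -C "$repo" commit -q --allow-empty -m "H"
git -C "$repo" checkout -q -b feature
git -C "$repo" commit -q --allow-empty -m "I"
git -C "$repo" checkout -q master

# master has nothing of its own, so a plain merge could fast-forward.
# --no-ff asks for a real merge commit with two parents instead.
git -C "$repo" merge -q --no-ff -m "M" feature
parent_count=$(git -C "$repo" rev-list --parents -n 1 HEAD | awk '{print NF - 1}')

# Now feature gains a commit that master lacks, while master has M.
git -C "$repo" checkout -q feature
git -C "$repo" commit -q --allow-empty -m "J"
git -C "$repo" checkout -q master

# The histories have diverged, so --ff-only refuses rather than merging.
if git -C "$repo" merge -q --ff-only feature 2>/dev/null; then
  ff_only_result=merged
else
  ff_only_result=refused
fi
echo "parents of M: $parent_count, --ff-only: $ff_only_result"
```

The empty commits are only there to keep the sketch short; real commits with file changes behave the same way as far as these two flags are concerned.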

That still doesn't answer the question

As before, this just pushes the question down another level: why do we want, or not want, a merge commit? We did, at least, see what a merge commit is: a fork in history. Remember that while we work forwards, making new commits, Git works backwards. So when we combine two histories, the backwards view splits one history into two.

To see why we might want to split history like this, we have to think backwards like Git does. Remember that each commit holds a snapshot, and therefore, to get changes, we must compare that snapshot to some earlier snapshot. With an ordinary non-merge commit, this is easy:

...--P--C--...

Here C is the child commit of parent commit P. There is only one way to go, from C back to P. The change in C is whatever is different between the snapshot in P and the snapshot in C. That's what git show hash-of-C will show us, or what git log -p will show us.

At a merge commit, however, there are two parents:

...--P1
       \
        C--...
       /
...--P2

To see what happened between P1 and C, we have Git compare these two snapshots. To see what happened between P2 and C, we have Git compare these two snapshots. So if we want to be able to see both sets of changes, we need both parents recorded.
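One way to look at both comparisons is to diff the merge commit against each parent in turn, using the ^1 and ^2 suffixes. The following is a sketch only; the file names, the branch name side, and the identity settings are invented for the example:

```shell
# Throwaway repository; file and branch names are illustrative.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

echo base > "$repo/base.txt"
git -C "$repo" add base.txt
git -C "$repo" commit -q -m "base"

# P2: a commit on a side branch.
git -C "$repo" checkout -q -b side
echo side > "$repo/side.txt"
git -C "$repo" add side.txt
git -C "$repo" commit -q -m "P2"

# P1: a commit on master.
git -C "$repo" checkout -q master
echo main > "$repo/main.txt"
git -C "$repo" add main.txt
git -C "$repo" commit -q -m "P1"

# C: the merge commit, with first parent P1 and second parent P2.
git -C "$repo" merge -q -m "C" side

# Compared with P1, C brings in what the side branch did.
changed_vs_p1=$(git -C "$repo" diff --name-only HEAD^1 HEAD)
# Compared with P2, C brings in what master did.
changed_vs_p2=$(git -C "$repo" diff --name-only HEAD^2 HEAD)
echo "vs P1: $changed_vs_p1; vs P2: $changed_vs_p2"
```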

That's one possible motive for getting a merge (a merge commit), which would in turn motivate us to run git merge. Now let's look at another.

How Git finds commits

Commits in Git are numbered, but the numbers look random. They're not actually random at all: each number is a cryptographic checksum of the contents of the commit. That way, each one will be unique.[1] Git then stores these commits (and other internal Git objects, which are similarly numbered[2]) in a key-value database. So the way Git finds a commit is by its hash ID. If you memorize each hash ID in your entire repository, you can just supply the hash ID to get the commit back.

The thing is, nobody[3] wants to memorize hash IDs. Fortunately, there is no need to do so. Each commit already stores the hash ID of its parent or parents, so all we (and Git) need is the ability to store the hash ID of the last commit in some chain of commits. We do this with a branch name:

... <-F <-G <-H   <-- branch

The branch name stores the hash ID of the last commit in the chain, which in this case is represented by the letter H. We say that the branch name points to the commit. Meanwhile commit H stores the hash ID of its parent G, so we say H points to G. Similarly, G points to F, and so on.

This gives us the entire history of the branch, starting from the end and working backwards. With just one name—a branch name—we get the entire chain of commits. We need only remember the name of the commit we'd like to start with. And, as we add new commits to a chain, using git commit, Git automatically updates the name, so that:

...--G--H   <-- branch

becomes:

...--G--H--I   <-- branch

and the fact that there's now a new last commit never needs to bother us at all.
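To watch the name move, one can ask Git which hash ID the branch name holds before and after a commit. This is a small sketch with made-up commit messages in a scratch repository:

```shell
# Throwaway repository; the branch name and messages are illustrative.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

git -C "$repo" commit -q --allow-empty -m "G"
git -C "$repo" commit -q --allow-empty -m "H"

# The branch name currently holds the hash ID of H.
tip_before=$(git -C "$repo" rev-parse master)

# Making commit I moves the name forward automatically.
git -C "$repo" commit -q --allow-empty -m "I"
tip_after=$(git -C "$repo" rev-parse master)

# I stores the hash ID of its parent, which is the old tip H.
parent_of_tip=$(git -C "$repo" rev-parse master^)
echo "before: $tip_before"
echo "after:  $tip_after (parent $parent_of_tip)"
```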

That's a motive for branch names, and not—yet anyway—a motive for using git merge, but let's keep that in mind, and move on.


[1] This assumes that each commit's content is unique to that commit. Fortunately (and/or through careful design) it is. It also assumes that the pigeonhole principle does not apply. Unfortunately, it does. Fortunately, it doesn't ever accidentally happen in practice. Unfortunately, it can happen deliberately. Fortunately, this known collision doesn't affect Git. Meanwhile, Git is moving to a new hash algorithm.

[2] These objects need not be unique: for instance, two commits that, for whatever reason, store the same snapshot, can literally just share the snapshot. File contents are stored as internal objects as well, and this automatically de-duplicates the files.

[3] Well, nobody I know, anyway. I don't even want to memorize an important hash ID like that for the empty tree.


If merges split history when going backwards, what about forwards?

Suppose we have a chain ending at H like this:

...--G--H   <-- branch1

and we add a second branch name, branch2, that also points to commit H. Then we make two new commits on branch1, so that we have:

          I--J   <-- branch1
         /
...--G--H   <-- branch2

Now we make two new commits on branch2, so that we have:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- branch2

When viewed backwards by starting at the two "last" commits J and L, these two histories join at commit H. If that seems significant or propitious, well, now you're starting to understand Git.

In order to come up with the snapshot for a new merge commit, a git merge of these two commits, J and L, will find commit H. It does so using these backwards-pointing arrows, by following the history from J to I to H, and from L to K to H. Now that git merge has found the best common starting point, it can compare what's in H's snapshot to what is in J's, to see what we did, and to what is in L's, to see what they did.

This action, of comparing the merge base commit to two other commits, is the "to merge", or merge-as-a-verb, part of a git merge. The fact that Git can do this on its own, and often pretty successfully, is both amazing[4] and useful. It is this merge-as-a-verb action that makes git merge truly useful.
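The common starting point that Git locates can be inspected directly with git merge-base. The sketch below rebuilds the H, I-J, K-L picture with empty commits; the scratch directory and identity settings are assumptions for the example:

```shell
# Throwaway repository mirroring the branch1/branch2 drawing.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/branch1
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

git -C "$repo" commit -q --allow-empty -m "H"
commit_h=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" branch branch2

# Two commits on branch1.
git -C "$repo" commit -q --allow-empty -m "I"
git -C "$repo" commit -q --allow-empty -m "J"

# Two commits on branch2.
git -C "$repo" checkout -q branch2
git -C "$repo" commit -q --allow-empty -m "K"
git -C "$repo" commit -q --allow-empty -m "L"

# Walking backwards from J and from L, the histories join at H.
merge_base=$(git -C "$repo" merge-base branch1 branch2)
echo "merge base: $merge_base"
```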


[4] Git is just applying some simple diff-combining rules, which are purely textual. Git has no concept of the semantics of the text that it is combining. Once you realize this, it starts to make sense why Git can combine many kinds of source code, but not (for instance) a lot of XML text.


We're not quite done with true merges

The ability for git merge to leave behind a merge commit is useful for history purposes, but in some ways this is just an adjunct to the merge-as-a-verb action, because it leaves the graph set up such that the next git merge operation has a reasonable merge base. To see how that works, consider what you get if you're working on two separate branches, and occasionally merge one into the other. Let's start with an "after" picture, and then get the "during" pictures that led to this:

            o--o--o--o--o--o   <-- feature/short
           /       \
...--A----F----M----N---P   <-- develop
      \       /        /
       o--o--B--o--o--C--o--o   <-- feature/tall

Here, we have a branch named develop on which we might write an occasional hotfix commit (such as commit F), but we don't write new features that mess with the actual use of the system. So here's our first "during" picture:

            o--o   <-- feature/short
           /
...--A----F   <-- develop
      \
       o--o--B   <-- feature/tall

As we work on the new features, though, we realize that some of what we're doing is just preparation work that really should be in the development line, or is ready, or whatever. We're at commit B on the feature/tall branch now, and we decide that this stuff—everything we did on o-o-B—should go into develop right now. So we run:

git checkout develop
git merge feature/tall

If this merge goes well, Git makes new merge commit M on its own:

            o--o   <-- feature/short
           /
...--A----F----M   <-- develop
      \       /
       o--o--B   <-- feature/tall

The merge base commit that Git used to make M was commit A; the input commits were F—the hotfix we kept—and B. Now that commit M is done, we keep working on feature/tall:

            o--o   <-- feature/short
           /
...--A----F----M   <-- develop
      \       /
       o--o--B--o--o--C   <-- feature/tall

Meanwhile, someone working on feature/short (perhaps us) has found that they should make merge commit N, which gives us:

            o--o--o   <-- feature/short
           /       \
...--A----F----M----N   <-- develop
      \       /
       o--o--B--o--o--C   <-- feature/tall

When we go to merge feature/tall—or more precisely, commit C—into the tip commit of develop (commit N), Git will work backwards from N to M, then to F and B both. In other words, commit B is on the develop branch, through the merge commit. Git will also work backwards from C, through the two o commits, to B, which is therefore the best shared commit. So the next merge-as-a-verb process only has to get our o-o-C changes into develop, while keeping the M-N changes (with M's "changes" being reflected directly through the B-vs-N comparison: they are basically just keeping the hotfix).
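The way an earlier merge commit shifts the merge base can be checked with git merge-base before and after making M. This is a simplified sketch: it leaves out feature/short and uses single empty commits for A, F, B and C, all of which are assumptions of the example rather than part of the drawings above:

```shell
# Throwaway repository with a cut-down develop / feature/tall history.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/develop
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

git -C "$repo" commit -q --allow-empty -m "A"
commit_a=$(git -C "$repo" rev-parse HEAD)
git -C "$repo" branch feature/tall

git -C "$repo" commit -q --allow-empty -m "F"      # hotfix on develop

git -C "$repo" checkout -q feature/tall
git -C "$repo" commit -q --allow-empty -m "B"
commit_b=$(git -C "$repo" rev-parse HEAD)

# Before any merge, the best shared commit is A.
git -C "$repo" checkout -q develop
base_before=$(git -C "$repo" merge-base develop feature/tall)

# Merge commit M records B as its second parent.
git -C "$repo" merge -q -m "M" feature/tall

git -C "$repo" checkout -q feature/tall
git -C "$repo" commit -q --allow-empty -m "C"

# Now B is reachable from develop through M, so B is the new merge base.
base_after=$(git -C "$repo" merge-base develop feature/tall)
echo "before M: $base_before"
echo "after M:  $base_after"
```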

Once we're done with a feature, we merge it in one last time, and then, because the final merge R has D as a parent, we can drop the name feature/tall entirely. If we need to find commit D, we'll do that by looking at the second parent of commit R:

            o--o--o--o--o--o   <-- feature/short
           /       \
...--A----F----M----N---P--------R   <-- develop
      \       /        /        /
       o--o--B--o--o--C--o--o--D

That this all works so well (or as well as it does) is why we use git merge. The diff-combining plus some basic graph theory get us pretty far.
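Finding the old feature tip through the merge's second parent can be sketched as follows. The history here is shortened to a single commit P on develop and a single commit D on feature/tall, which is an assumption made to keep the example small:

```shell
# Throwaway repository; P, D and R stand in for the commits in the drawing.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/develop
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.com"

git -C "$repo" commit -q --allow-empty -m "P"
git -C "$repo" checkout -q -b feature/tall
git -C "$repo" commit -q --allow-empty -m "D"
commit_d=$(git -C "$repo" rev-parse HEAD)

# Final merge R on develop; --no-ff keeps it a true merge commit.
git -C "$repo" checkout -q develop
git -C "$repo" merge -q --no-ff -m "R" feature/tall

# The name is no longer needed: D stays reachable through R.
git -C "$repo" branch -q -d feature/tall

# The ^2 suffix selects the second parent of the merge.
second_parent=$(git -C "$repo" rev-parse develop^2)
echo "second parent of R: $second_parent"
```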

I may have convinced you about merging, but what about git fetch?

If we all agree that git merge is useful, why wouldn't we always run git fetch first? Well, before we answer that, we need to ask why we would ever run git fetch, much less just before running git merge. To understand that, we need to consider the fact that Git is a distributed version control system.

We get a copy of some repository, and work in our copy and make new commits. Someone else may control the original repository—the one we copied—and may be making new commits there. Or, perhaps that original repository is one hosted on a centralized server type site, such as GitHub or Bitbucket or whatever, and one or more people are sending new commits to it.

If we're dealing with this situation, then it makes sense to coordinate with other users. That is, if we're using Git collaboratively and we want to get someone else's new commits, git fetch is a good way to do that.

As soon as we introduce this extra repository, though, we throw in a lot of complications. In particular, commits get shared: they have universe-wide unique hash IDs, so any two Gits, at any time, can join up with each other temporarily, and just show each other their hash IDs. If one Git has a hash ID and the other does not, that one Git has a commit or other internal Git object that the other lacks. The git fetch and git push commands give us ways to connect a pair of Gits and have them transfer commits to each other. So commits, which always have a unique number, are easy to share this way. But here's the problem: branch names aren't shared this way, at all.

To see why branch names aren't shared, just imagine that you and a friend or co-worker are both doing work and plan to collaborate. Let's give the two people involved here names. We'll use the standard names, Alice and Bob, and we'll talk about a setup with three repositories:

  • Alice has Alice's repository;
  • Bob has Bob's repository; and
  • both of them share with each other using a central-site third repository like GitHub or Bitbucket.

Both Alice and Bob start with git clone url. This gets them a repository with some commits in it. Now, there's a secret[5] here: when their git clone is not quite finished yet, neither Alice nor Bob has any branches at all. Instead, they have what Git calls remote-tracking branch names, which I call remote-tracking names because they're not actually branch names.

If you (or Alice or Bob) run git clone -b branch url, what you are doing is directing your Git to run git checkout branch as the last step of its clone operation. If you omit the -b option, your Git asks the other Git (the one at url) what branch name to use. The usual default right now is master.[6]

It's this final checkout step that actually creates your own branch in your repository. So Alice gets a master and Bob gets a master, and both of their masters are based on their origin/master, which itself is based on the master that's in the third Git, at the central server.
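The distinction between the clone's own branch and its remote-tracking name shows up in where the two names are stored. Here is a sketch using a local directory as a stand-in for the central server; the directory names central and alice are assumptions for the example:

```shell
# A local directory plays the part of the central server here.
work=$(mktemp -d)
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m "H"

# Alice clones; no -b, so her Git uses the branch the other Git recommends.
git clone -q "$work/central" "$work/alice"

# Her remote-tracking name lives under refs/remotes/origin/ ...
remote_tracking=$(git -C "$work/alice" rev-parse refs/remotes/origin/master)
# ... and her own branch, created by the final checkout, under refs/heads/.
own_branch=$(git -C "$work/alice" rev-parse refs/heads/master)
server_branch=$(git -C "$work/central" rev-parse refs/heads/master)

local_branches=$(git -C "$work/alice" for-each-ref --format='%(refname:short)' refs/heads)
echo "local branches: $local_branches"
```

All three names identify commit H at this point; they only start to differ once someone makes a new commit.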


[5] It's not really a secret, but it's typically overlooked in introductions to Git.

[6] GitHub plan to change this to main soon, for new repositories; existing repositories will still recommend whatever they recommend. GitHub provide a web interface to let you adjust any repository you control.


Let's draw the Alice-and-Bob situation

We now have three repositories, which we, as some sort of omniscient deity, can somehow know all about all the time, so we'll draw a picture of all three:

central-server:  ...--G--H   <-- master

alice: ...--G--H   <-- master, origin/master

bob:   ...--G--H   <-- master, origin/master

Now Alice, being faster than Bob, makes a new commit:

                 I   <-- master
                /
alice: ...--G--H   <-- origin/master

Bob makes his commit second; since commits have unique IDs, we'll call his J:

bob:   ...--G--H   <-- origin/master
                \
                 J   <-- master

Since we have this omniscient overview, let's draw what would happen if we combined all the commits into one repository:

                 I
                /
[all]  ...--G--H
                \
                 J

What names should we use to find commits I and J? One name, like master, is only allowed to remember one commit. So there's no right name to use here.

For Alice and Bob to co-operate, one of them must send his or her commit back to the central server, using git push. The git push command is not as clever as git fetch.[7] It works by sending a commit to some server,[8] and then asking the server's Git to set its own branch name (the server's, not the sender's) to remember that commit. So if we assume Alice gets there first again, we have this global view:

central-server:  ...--G--H--I   <-- master

alice: ...--G--H--I   <-- master, origin/master

bob:   ...--G--H   <-- origin/master
                \
                 J

(I've linearized the drawing a bit to save space: put I to the right of H).

Bob simply doesn't have commit I yet. He must run git fetch to pick up I from the server. After that, he gets:

bob:   ...--G--H--I   <-- origin/master
                \
                 J   <-- master

That is, his Git now knows that origin/master should identify commit I, and his commit J is only on his master.

If Bob tries to git push his commit J, he'll ask the server to set their master to point to J. They will refuse because if they do that, they will lose their copy of commit I. This happens regardless of whether Bob knows that commit I even exists: the central server knows, and the server's Git is the one doing the check.
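The whole Alice-and-Bob sequence can be replayed locally. In this sketch a bare repository in a temporary directory stands in for the central server; the directory names and the identity settings are assumptions for the example:

```shell
# seed exists only to give the bare "central" repository its first commit H.
work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" -c user.name=Seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/central.git"

git clone -q "$work/central.git" "$work/alice"
git clone -q "$work/central.git" "$work/bob"

# Alice and Bob each make one commit on their own master.
git -C "$work/alice" -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m "I"
git -C "$work/bob" -c user.name=Bob -c user.email=bob@example.com \
    commit -q --allow-empty -m "J"

# Alice gets there first.
git -C "$work/alice" push -q origin master

# Bob's push would make the server lose I, so the server refuses it.
if git -C "$work/bob" push -q origin master 2>/dev/null; then
  bob_push=accepted
else
  bob_push=rejected
fi

# Fetching brings I into Bob's repository and updates his origin/master.
git -C "$work/bob" fetch -q origin
bob_origin_master=$(git -C "$work/bob" log -1 --format=%s origin/master)
bob_master=$(git -C "$work/bob" log -1 --format=%s master)
echo "push: $bob_push, origin/master: $bob_origin_master, master: $bob_master"
```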

Since Alice beat Bob to the punch, it's now Bob's job to decide what to do about the fork in history between Alice's commit I and Bob's commit J.


[7] The concept of remote-tracking names didn't exist in early Git, and the extra cleverness was only added into git fetch, because it doesn't really make sense in git push: server repositories often have no person attending them all the time, so there's no one to take advantage of it.

[8] To receive a git push, a site has to provide some sort of authentication and access control, because Git doesn't. Providing authentication and access control is enough to call the system a server. You can git fetch from a client that doesn't have all this stuff in it, and Alice and Bob could do peer-to-peer Git without bothering with a central server at all, if they like, by using git fetch to communicate with each other. But that requires that they put up a read-only service that lets the other grab commits without first authenticating. Or, if they have a nice enough and secure enough system, they can just offer ssh or web service on their own system directly. It's kind of a pain though, which is why services like GitHub are so popular.


Now we can see why Bob wants to git fetch

Bob wants to run git fetch now because:

  • Bob wants to cooperate with Alice, not run roughshod over her;
  • Bob has seen a git push failure, or Bob has preemptively run git fetch to avoid seeing a git push failure. Either way, Bob now knows that commit I exists.

Bob can now run git merge, or git rebase, or any other Git command, to arrange to do something about his commit J so that it fits better with Alice's commit I.

But that gives us a motive to fetch-and-merge!

Just so: Bob's situation shows us why Bob would run git fetch and then git merge, and your question is more along the lines of why Bob might run git merge without first running git fetch. Why would he do that?

Ultimately, if he did do that, we would have to ask him why. But here are some possibilities:

  • Alice doesn't exist. There is no central-server repository. Bob is working on his own.

  • Or, whether Alice exists or not, Bob created several branches on his own, making some commits that he has never given to anyone else. No one else has those commits, so no one else could possibly be using those commits for any purpose. Bob can safely merge those commits without regard to what others might have done because no one could have done anything.

  • Or, Bob got Alice's commit, and it's wrong. Bob does not want Alice's commit.

  • Or, Bob got Alice's commit, and it's right, and Bob has realized that he should have created a feature/tall branch. He can do that now instead of merging.

That last one is one of my motives for avoiding git pull

That last possibility is one reason I don't use git pull. I like to run git fetch as step 1, then look at what happened.

Depending on what git fetch fetched, I might decide to run:

  1. nothing, because there's nothing to do;
  2. git merge --ff-only, to make sure that I can do a fast-forward instead of a merge;
  3. git log (perhaps with -p), to look at new stuff;
  4. git rebase, because the new stuff I looked at in step 3 is good and mine adds on nicely;
  5. git checkout -b to make a new branch;

or something else I've forgotten to list (other than git merge, which I almost never want). So I do often run git fetch first, and then git merge --ff-only, but I often stick some other command in between the two operations.
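A fetch-then-look workflow along these lines might be sketched as below. The local bare repository standing in for the server, the directory names, and the choice to count commits with git rev-list are all assumptions of the example, not a prescribed procedure:

```shell
# seed exists only to give the bare "central" repository its first commit H.
work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" -c user.name=Seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "H"
git clone -q --bare "$work/seed" "$work/central.git"
git clone -q "$work/central.git" "$work/alice"
git clone -q "$work/central.git" "$work/bob"

# Alice publishes a new commit; Bob has made none of his own.
git -C "$work/alice" -c user.name=Alice -c user.email=alice@example.com \
    commit -q --allow-empty -m "I"
git -C "$work/alice" push -q origin master

# Step 1: fetch only. Bob's own master does not move.
git -C "$work/bob" fetch -q origin

# Step 2: look at what arrived and at what Bob has that the server lacks.
incoming=$(git -C "$work/bob" rev-list --count master..origin/master)
outgoing=$(git -C "$work/bob" rev-list --count origin/master..master)

# Step 3: with nothing of his own to combine, a fast-forward is enough.
if [ "$outgoing" -eq 0 ]; then
  git -C "$work/bob" merge -q --ff-only origin/master
fi
bob_tip=$(git -C "$work/bob" log -1 --format=%s master)
echo "incoming: $incoming, outgoing: $outgoing, master now at: $bob_tip"
```

If outgoing were not zero, this is the point at which one would stop and choose between git log, git rebase, a new branch, or a real merge.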

Other people have different motivations. The point here is that while the fetch-and-merge, or fetch-and-rebase, sequence is very common and perhaps deserves to have a convenience command like git pull that does it for you, it's not the only way to work.

torek
  • I made myself an alias, `git mff`, that just runs `git merge --ff-only`. If I'm doing active development on something, I tend to run `git fetch` and then `git mff` and if that *fails* I know I should check to see if I should rebase. It's a shortcut (actually looking is better), but depending on how much of a rush I'm in, I find it a useful shortcut sometimes. – torek Sep 23 '20 at 23:32
  • The `git pull` command does allow you to pass the `--ff-only` flag, or even configure it in, but for some reason, I just don't *like* to do that. I'm not 100% sure why, but I can point to one fairly solid reason: sometimes I'm running as some other user (helping someone else out, or logged in to the server, or whatever) and don't have my personal setup. In that case, running `git mff` just fails and I remember that I have to spell it all out. – torek Sep 23 '20 at 23:33
  • Thanks for answering so comprehensively. I can't believe it was closed with "This question needs to be more focused". There is no simpler way to state it, and the general reasoning you provided is exactly what was called for. – Markus Sep 25 '20 at 13:48
  • Well ... this question and answer is, in a sense, an abuse of StackOverflow, which is supposed to be about *small* answers to a *specific technical question*, rather than about larger philosophical-type answers to "meaning of life" type questions (which for computer software, perhaps belong on the softwareengineering site). – torek Sep 25 '20 at 20:39

Perhaps if you do a fetch without a merge for some reason, and later you just merge because you know there are problems where you fetched it from?

Markus
  • Why would this not be a valid reason to do a merge without a fetch? You have already fetched some dev from main, you know there was a problem with the code on main that happened after your fetch, so you merge without doing another fetch. – Markus Sep 23 '20 at 13:14