437

I have a colleague who claims that git pull is harmful, and gets upset whenever someone uses it.

The git pull command seems to be the canonical way to update your local repository. Does using git pull create problems? What problems does it create? Is there a better way to update a git repository?

Max
Richard Hansen
  • 8
    Or you can just `git pull --rebase` and set this strategy as default for new branches `git config branch.autosetuprebase` – knoopx Mar 12 '14 at 13:06
  • 4
    knoopx has it right, adding `--rebase` flag to `git pull` synchronizes local with remote then replays your local changes on top of updated local. Then when you push all you are doing is appending your new commits to the end of remote. Pretty simple. – Heath Lilley Mar 12 '14 at 15:37
  • 4
    Thanks @BenMcCormick. I'd already done that, but the discussion regarding the validity of the question seems to be taking place in these comments below the question. And I think asking a question to create a platform to present your personal opinion as fact is not what SO's Q&A structure is really for. – mcv Mar 12 '14 at 16:19
  • 4
    @RichardHansen, it just seems like a way to cheat the point system, especially with your answer having such a drastic difference in tone and such a short time gap. Using your model of Q&A, we could all just ask questions and answer them ourselves using our previous knowledge. At that point, you should just consider writing a blog post as that is many times more appropriate. A Q&A specifically seeks other people's knowledge. A blog post exhibits your own. – Josh Brown Mar 12 '14 at 16:44
  • 1
    @JoshBrown sure, go ahead. If they are good questions (this includes being useful, actual questions, on-topic and not duplicate) and answers, what you are doing is good. Asking good questions is hard, however. SO collects knowledge, not people to answer others' questions. – John Dvorak Mar 12 '14 at 17:17
  • 2
    There is nothing against asking a question and immediately answering it, as others have pointed out, it's explicitly encouraged. However, I did edit the tone of the question since it's a bit argumentative and opinionated. Keep in mind this question has been around for a year. It's a little bit late to start complaining the OP created the question and then answered their own question. – George Stocker Mar 12 '14 at 17:49
  • 2
    @GeorgeStocker: Except this question, and more importantly, the following answer (posted at *the exact same time*) look very much like flamebaity opinion, however well explained it might be. I think this Q&A might be the best example of "begging the question" I've ever seen. – Nathan Paul Simons Mar 12 '14 at 20:50
  • @GeorgeStocker: Thanks for the edit, you improved it a lot. – Richard Hansen Mar 13 '14 at 07:06
  • I agree. Looks a lot better this way. – mcv Mar 13 '14 at 08:11

5 Answers

575

Summary

By default, git pull creates merge commits which add noise and complexity to the code history. In addition, pull makes it easy to not think about how your changes might be affected by incoming changes.

The git pull command is safe so long as it only performs fast-forward merges. If git pull is configured to only do fast-forward merges, Git exits with an error whenever a fast-forward merge isn't possible. This gives you an opportunity to study the incoming commits, think about how they might affect your local commits, and decide the best course of action (merge, rebase, reset, etc.).

With Git 2.0 and newer, you can run:

git config --global pull.ff only

to alter the default behavior to only fast-forward. With Git versions between 1.6.6 and 1.9.x you'll have to get into the habit of typing:

git pull --ff-only

However, with all versions of Git, I recommend configuring a git up alias like this:

git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'

and using git up instead of git pull. I prefer this alias over git pull --ff-only because:

  • it works with all (non-ancient) versions of Git,
  • it fetches all upstream branches (not just the branch you're currently working on), and
  • it cleans out old origin/* branches that no longer exist upstream.

Problems with git pull

git pull isn't bad if it is used properly. Several recent changes to Git have made it easier to use git pull properly, but unfortunately the default behavior of a plain git pull has several problems:

  • it introduces unnecessary nonlinearities in the history
  • it makes it easy to accidentally reintroduce commits that were intentionally rebased out upstream
  • it modifies your working directory in unpredictable ways
  • pausing what you are doing to review someone else's work is annoying with git pull
  • it makes it hard to correctly rebase onto the remote branch
  • it doesn't clean up branches that were deleted in the remote repo

These problems are described in greater detail below.

Nonlinear History

By default, the git pull command is equivalent to running git fetch followed by git merge @{u}. If there are unpushed commits in the local repository, the merge part of git pull creates a merge commit.
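As a rough illustration of that behavior, the sketch below builds a throwaway sandbox and runs a merge-style pull against diverged history. The names alice, bob and remote.git are made up for the demo, and --no-rebase is passed explicitly because newer Git versions may refuse a bare git pull on divergent branches until a strategy is chosen.

```shell
# Throwaway sandbox; "alice", "bob" and "remote.git" are hypothetical names.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
git -C alice commit -q --allow-empty -m "root"
git -C alice push -q -u origin HEAD
git clone -q remote.git bob

# Alice publishes a commit while Bob commits locally without pushing.
git -C alice commit -q --allow-empty -m "alice: upstream work"
git -C alice push -q
git -C bob commit -q --allow-empty -m "bob: local work"

# Merge-style pull: fetch, then merge the upstream branch.
# --no-edit keeps Git from opening an editor for the merge message.
git -C bob pull -q --no-rebase --no-edit

# Count the merge commits that are now part of Bob's history.
merges=$(git -C bob rev-list --merges --count HEAD)
echo "merge commits in bob's history: $merges"
git -C bob --no-pager log --graph --oneline
```

The graph printed at the end should show the two one-commit lines of work joined by a merge commit that neither developer asked for.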

There is nothing inherently bad about merge commits, but they can be dangerous and should be treated with respect:

  • Merge commits are inherently difficult to examine. To understand what a merge is doing, you have to understand the differences to all parents. A conventional diff doesn't convey this multi-dimensional information well. In contrast, a series of normal commits is easy to review.
  • Merge conflict resolution is tricky, and mistakes often go undetected for a long time because merge commits are difficult to review.
  • Merges can quietly supersede the effects of regular commits. The code is no longer the sum of incremental commits, leading to misunderstandings about what actually changed.
  • Merge commits may disrupt some continuous integration schemes (e.g., auto-build only the first-parent path under the assumed convention that second parents point to incomplete works in progress).

Of course there is a time and a place for merges, but understanding when merges should and should not be used can improve the usefulness of your repository.

Note that the purpose of Git is to make it easy to share and consume the evolution of a codebase, not to precisely record history exactly as it unfolded. (If you disagree, consider the rebase command and why it was created.) The merge commits created by git pull do not convey useful semantics to others—they just say that someone else happened to push to the repository before you were done with your changes. Why have those merge commits if they aren't meaningful to others and could be dangerous?

It is possible to configure git pull to rebase instead of merge, but this also has problems (discussed later). Instead, git pull should be configured to only do fast-forward merges.

Reintroduction of Rebased-out Commits

Suppose someone rebases a branch and force pushes it. This generally shouldn't happen, but it's sometimes necessary (e.g., to remove a 50GiB log file that was accidentally committed and pushed). The merge done by git pull will merge the new version of the upstream branch into the old version that still exists in your local repository. If you push the result, pitchforks and torches will start coming your way.

Some may argue that the real problem is force updates. Yes, it's generally advisable to avoid force pushes whenever possible, but they are sometimes unavoidable. Developers must be prepared to deal with force updates, because they will happen sometimes. This means not blindly merging in the old commits via an ordinary git pull.
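The scenario can be reproduced in a small sandbox. This is only a sketch: alice, bob, remote.git and huge.log are invented names, and a tiny text file stands in for the oversized log.

```shell
# Throwaway sandbox; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
git -C alice commit -q --allow-empty -m "root"
git -C alice push -q -u origin HEAD

# Alice publishes a commit by mistake; Bob clones while it is upstream.
echo "oops" > alice/huge.log
git -C alice add huge.log
git -C alice commit -q -m "add huge.log by mistake"
git -C alice push -q
bad=$(git -C alice rev-parse HEAD)
git clone -q remote.git bob

# Alice rewrites history to drop the commit and force pushes.
git -C alice reset -q --hard HEAD~1
git -C alice commit -q --allow-empty -m "real work"
git -C alice push -q --force

# Bob's merge-style pull merges the rewritten branch into his stale copy.
git -C bob pull -q --no-rebase --no-edit

# The dropped commit is reachable from Bob's HEAD again.
if git -C bob merge-base --is-ancestor "$bad" HEAD; then
  reintroduced=yes
else
  reintroduced=no
fi
echo "dropped commit reintroduced: $reintroduced"
```

If Bob now pushes, the commit Alice removed is back upstream. Running the same pull with --ff-only should stop with an error instead, because Bob's branch and the rewritten upstream have diverged.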

Surprise Working Directory Modifications

There's no way to predict what the working directory or index will look like until git pull is done. There might be merge conflicts that you have to resolve before you can do anything else, it might introduce a 50GiB log file in your working directory because someone accidentally pushed it, it might rename a directory you are working in, etc.

git remote update -p (or git fetch --all -p) allows you to look at other people's commits before you decide to merge or rebase, allowing you to form a plan before taking action.
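A minimal sketch of that look-before-you-merge step, again in a sandbox with made-up names (alice, bob, remote.git, notes.txt). It fetches, then lists what is waiting upstream without moving the local branch.

```shell
# Throwaway sandbox; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
git -C alice commit -q --allow-empty -m "root"
git -C alice push -q -u origin HEAD
git clone -q remote.git bob

# Alice pushes a new file.
echo "notes" > alice/notes.txt
git -C alice add notes.txt
git -C alice commit -q -m "alice: add notes.txt"
git -C alice push -q

head_before=$(git -C bob rev-parse HEAD)

# Download everything and prune dead remote-tracking branches.
git -C bob remote update -p >/dev/null

# Commits that are upstream but not yet in the local branch.
incoming=$(git -C bob rev-list --count 'HEAD..@{u}')
git -C bob --no-pager log --oneline 'HEAD..@{u}'
# Files those commits touch, relative to the common ancestor.
git -C bob --no-pager diff --stat 'HEAD...@{u}'

head_after=$(git -C bob rev-parse HEAD)
echo "incoming commits: $incoming"
```

The local branch and working directory are the same before and after; only the remote-tracking refs moved.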

Difficulty Reviewing Other People's Commits

Suppose you are in the middle of making some changes and someone else wants you to review some commits they just pushed. git pull's merge (or rebase) operation modifies the working directory and index, which means your working directory and index must be clean.

You could use git stash and then git pull, but what do you do when you're done reviewing? To get back to where you were you have to undo the merge created by git pull and apply the stash.

git remote update -p (or git fetch --all -p) doesn't modify the working directory or index, so it's safe to run at any time—even if you have staged and/or unstaged changes. You can pause what you're doing and review someone else's commit without worrying about stashing or finishing up the commit you're working on. git pull doesn't give you that flexibility.
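To make the point concrete, here is a sketch in which Bob has both staged and unstaged changes when a review request arrives. The repository and file names (alice, bob, remote.git, app.txt, draft.txt) are invented for the demo.

```shell
# Throwaway sandbox; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
echo "v1" > alice/app.txt
git -C alice add app.txt
git -C alice commit -q -m "add app.txt"
git -C alice push -q -u origin HEAD
git clone -q remote.git bob

# Bob is mid-change: one unstaged edit and one staged new file.
echo "half-finished edit" >> bob/app.txt
echo "draft" > bob/draft.txt
git -C bob add draft.txt
status_before=$(git -C bob status --porcelain)

# Alice pushes something for Bob to review.
echo "v2" > alice/app.txt
git -C alice commit -q -am "alice: change app.txt"
git -C alice push -q

# Fetching only updates remote-tracking refs, so no stash is needed.
git -C bob fetch -q --all -p
status_after=$(git -C bob status --porcelain)

# Review the incoming commit straight from the remote-tracking branch.
git -C bob --no-pager log -p 'HEAD..@{u}'
```

Bob's staged and unstaged changes are exactly as he left them, and he can go back to work as soon as the review is done.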

Rebasing onto a Remote Branch

A common Git usage pattern is to do a git pull to bring in the latest changes followed by a git rebase @{u} to eliminate the merge commit that git pull introduced. It's common enough that Git has some configuration options to reduce these two steps to a single step by telling git pull to perform a rebase instead of a merge (see the branch.<branch>.rebase, branch.autosetuprebase, and pull.rebase options).

Unfortunately, if you have an unpushed merge commit that you want to preserve (e.g., a commit merging a pushed feature branch into master), neither a rebase-pull (git pull with branch.<branch>.rebase set to true) nor a merge-pull (the default git pull behavior) followed by a rebase will work. This is because git rebase eliminates merges (it linearizes the DAG) without the --preserve-merges option. The rebase-pull operation can't be configured to preserve merges, and a merge-pull followed by a git rebase -p @{u} won't eliminate the merge caused by the merge-pull.

Update: Git v1.8.5 added git pull --rebase=preserve and git config pull.rebase preserve. These cause git pull to do git rebase --preserve-merges after fetching the upstream commits. (Thanks to funkaster for the heads-up!)

Cleaning Up Deleted Branches

git pull doesn't prune remote tracking branches corresponding to branches that were deleted from the remote repository. For example, if someone deletes branch foo from the remote repo, you'll still see origin/foo.

This leads to users accidentally resurrecting killed branches because they think they're still active.
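The stale reference is easy to see in a sandbox. In this sketch the names alice, bob and remote.git are made up, and fetch.prune is forced off for the pull so that a personal configuration that already prunes does not hide the effect.

```shell
# Throwaway sandbox; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
git -C alice commit -q --allow-empty -m "root"
git -C alice push -q -u origin HEAD
git -C alice push -q origin HEAD:refs/heads/foo   # publish branch foo
git clone -q remote.git bob                       # bob now has origin/foo

# Someone deletes foo from the remote repository.
git -C alice push -q origin --delete foo

# A plain pull leaves the remote-tracking branch behind.
git -C bob -c fetch.prune=false pull -q --no-rebase
if git -C bob rev-parse -q --verify refs/remotes/origin/foo >/dev/null; then
  after_pull=present
else
  after_pull=gone
fi

# Fetching with -p (--prune) removes it.
git -C bob fetch -q -p
if git -C bob rev-parse -q --verify refs/remotes/origin/foo >/dev/null; then
  after_prune=present
else
  after_prune=gone
fi
echo "origin/foo after pull: $after_pull, after fetch -p: $after_prune"
```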

A Better Alternative: Use git up instead of git pull

Instead of git pull, I recommend creating and using the following git up alias:

git config --global alias.up '!git remote update -p; git merge --ff-only @{u}'

This alias downloads all of the latest commits from all upstream branches (pruning the dead branches) and tries to fast-forward the local branch to the latest commit on the upstream branch. If successful, then there were no local commits, so there was no risk of merge conflict. The fast-forward will fail if there are local (unpushed) commits, giving you an opportunity to review the upstream commits before taking action.

This still modifies your working directory in unpredictable ways, but only if you don't have any local changes. Unlike git pull, git up will never drop you to a prompt expecting you to fix a merge conflict.
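Both outcomes can be tried out safely. The sketch below defines the alias in a sandbox repository only (no --global), using invented names alice, bob and remote.git, and runs it once with an unpushed local commit and once without.

```shell
# Throwaway sandbox; every name below is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d) && cd "$sandbox" || exit 1

git init -q --bare remote.git
git clone -q remote.git alice 2>/dev/null
git -C alice commit -q --allow-empty -m "root"
git -C alice push -q -u origin HEAD
git clone -q remote.git bob

# Upstream moves on while Bob has an unpushed commit.
git -C alice commit -q --allow-empty -m "alice: upstream work"
git -C alice push -q
cd bob || exit 1
git commit -q --allow-empty -m "bob: local work"

# Repository-local copy of the alias from the answer.
git config alias.up '!git remote update -p; git merge --ff-only @{u}'

# Case 1: diverged history, so the fast-forward is refused.
before=$(git rev-parse HEAD)
if git up >/dev/null 2>&1; then diverged=fast-forwarded; else diverged=refused; fi
after=$(git rev-parse HEAD)

# Case 2: drop the local commit; now the branch simply fast-forwards.
git reset -q --hard HEAD~1
if git up >/dev/null 2>&1; then clean=fast-forwarded; else clean=refused; fi
echo "diverged: $diverged, clean: $clean"
```

In the refused case the branch is left exactly where it was, so the next step (merge, rebase, or something else) remains a deliberate choice.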

Another Option: git pull --ff-only --all -p

The following is an alternative to the above git up alias:

git config --global alias.up 'pull --ff-only --all -p'

This version of git up has the same behavior as the previous git up alias, except:

  • the error message is a bit more cryptic if your local branch isn't configured with an upstream branch
  • it relies on an undocumented feature (the -p argument, which is passed to fetch) that may change in future versions of Git

If you are running Git 2.0 or newer

With Git 2.0 and newer you can configure git pull to only do fast-forward merges by default:

git config --global pull.ff only

This causes git pull to act like git pull --ff-only, but it still doesn't fetch all upstream commits or clean out old origin/* branches so I still prefer git up.

mloskot
Richard Hansen
  • 3
    This is a great answer for an oft-overlooked topic. I've seen the vanilla `git pull` cause many problems in practice for Git newbies. Questions: Why `remote update` over `fetch`? What does the leading ! do in the alias command? Couldn't you also alias: `git fetch --all -p; git merge --ff-only @{u}` – brianz Mar 11 '13 at 17:33
  • 7
    @brianz: `git remote update -p` is equivalent to `git fetch --all -p`. I'm in the habit of typing `git remote update -p` because once upon a time `fetch` didn't have the `-p` option. Regarding the leading `!`, see the description of `alias.*` in `git help config`. It says, "If the alias expansion is prefixed with an exclamation point, it will be treated as a shell command." – Richard Hansen Mar 11 '13 at 19:55
  • 1
    So why is this default harmful behaviour tolerated and is there an outstanding bug report about fixing it? – pjc50 Mar 07 '14 at 18:49
  • 1
    @pjc50: It's tolerated because the Git devs don't want to break backward compatibility with existing scripts. Also, there isn't a mature substitute, so they can't deprecate the current behavior just yet. The Git devs don't use a bug tracker; they use the mailing list instead. See discussion [here](http://thread.gmane.org/gmane.comp.version-control.git/233554) and [here](http://thread.gmane.org/gmane.comp.version-control.git/235948/focus=235949). Also note that [these changes](http://thread.gmane.org/gmane.comp.version-control.git/240488) will be in the next version of Git. – Richard Hansen Mar 08 '14 at 01:19
  • 1
    I wrote a slightly more elaborate command than `git up` to do this, [`git get`](https://gist.github.com/Boldewyn/8454951). Maybe it's useful. – Boldewyn Mar 12 '14 at 13:27
  • 14
    Git 2.0 adds a `pull.ff` configuration that appears to achieve the same thing, without aliases. – Danny Thomas Mar 12 '14 at 13:28
  • 52
    Some of the reasons sound like "pull can cause problems when others do crazy stuff". No, it's crazy stuff like rebasing a commit out of an upstream repo that causes problems. IMO rebase is only safe when you do it locally on a commit that hasn't been pushed yet. Like, for example, when you pull before you push, rebasing local commits helps keep your history linear (though linear history isn't that big of a deal). Still, `git up` sounds like an interesting alternative. – mcv Mar 12 '14 at 13:42
  • If you only want to update the branch you're on, isn't the `git up` alias similar to just doing `git pull --ff-only`? – Matt Mar 12 '14 at 13:52
  • In general I find "git fetch" followed by "git rebase" to be the cleanest, safest way. The fetch shows you all the branches with changes, and the rebase preserves the commit history. – MattC Mar 12 '14 at 14:29
  • also see http://stackoverflow.com/questions/5519007/how-do-i-make-git-merges-default-be-no-ff-no-commit - you can make pull only ever do fastforward/noncommit. – Justin Mar 12 '14 at 14:31
  • Another option is `git pull --rebase`. – Markus Unterwaditzer Mar 12 '14 at 15:40
  • 17
    Most of your points are because you are doing something wrong: you are trying to review code **in your own working branch**. That's not a good idea, just create a new branch, pull --rebase=preserve and then toss that branch (or merge it if you want). – funkaster Mar 12 '14 at 17:08
  • 6
    @funkaster's point here makes a lot of sense, especially re: "Difficulty Reviewing Other People's Commits". This is not the review flow most Git users use, it's something I've never seen recommended anywhere and it is the cause of all of the unnecessary extra work described below the heading, not `git pull`. – Ben Regenspan Mar 12 '14 at 17:30
  • 2
    Could someone explain what does @{u} mean? – Vladislavs Burakovs Mar 13 '14 at 14:32
  • Rewriting a repository's history after that history has been shared with other developers sounds like a bad idea. If the remote's commit history is in a state where `git pull` doesn't work then you're doing it wrong. – awhie29urh2 Mar 14 '14 at 15:54
  • Good summary, but it sounds like it's written from the perspective of someone with a very particular (and odd) git workflow in mind. Most of my git experience has been in cases where history linearity is irrelevant and rebasing is rarely - if ever - done. In such situations this answer is mostly irrelevant. – Max Mar 14 '14 at 16:24
  • Make sure you don't just blindly follow this advice, it can cause as many problems as it tries to solve. The original poster of the question also posted this answer at the same time, so I have to question the validity of what they say or what they are trying to achieve. – practicalli-john Mar 16 '14 at 19:14
  • personally I do `git rebase origin/master` after a `git fetch`, if it can fast forward it will, and if not a rebase will be done. You just have to know when to use `git pull/git merge/git rebase`. – Azr Jun 02 '14 at 10:51
  • @countfloortiles Nowhere does this answer tell you to use `git push --force`. Rebase does not always rewrite the public history. – alexia Mar 27 '15 at 19:25
  • 1
    FYI, @VladislavsBurakovs, I found that @{u} is short for @{upstream}, and it refers to the current upstream remote for your branch. – m0j0 Jun 16 '16 at 00:10
  • Auto-pruning all branches that no longer exist upstream may be dangerous. If someone accidentally removed a valuable branch, your `origin/feature` is a backup (probably the only backup), and you don't want to lose it. – Nick Volynkin Dec 26 '16 at 07:26
200

My answer, pulled from the discussion that arose on HackerNews:

I feel tempted to just answer the question using Betteridge's Law of Headlines: Why is git pull considered harmful? It isn't.

  • Nonlinearities aren't intrinsically bad. If they represent the actual history they are ok.
  • Accidental reintroduction of commits rebased upstream is the result of wrongly rewriting history upstream. You can't rewrite history when history is replicated along several repos.
  • Modifying the working directory is an expected result; of debatable usefulness, namely in the face of the behaviour of hg/monotone/darcs/other_dvcs_predating_git, but again not intrinsically bad.
  • Pausing to review others' work is needed for a merge, and is again an expected behaviour on git pull. If you do not want to merge, you should use git fetch. Again, this is an idiosyncrasy of git in comparison with previous popular dvcs, but it is expected behaviour and not intrinsically bad.
  • Making it hard to rebase against a remote branch is good. Don't rewrite history unless you absolutely need to. I can't for the life of me understand this pursuit of a (fake) linear history.
  • Not cleaning up branches is good. Each repo knows what it wants to hold. Git has no notion of master-slave relationships.
Ward Muylaert
Sérgio Carvalho
  • 14
    I agree. There's nothing inherently harmful about `git pull`. However, it might conflict with some harmful practices, like wanting to rewrite history more than is strictly necessary. But git is flexible, so if you want to use it in a different way, by all means do so. But that's because *you* (well, @Richard Hansen) want to do something unusual in git, and not because `git pull` is harmful. – mcv Mar 12 '14 at 15:20
  • 30
    Couldn't agree more. People are advocating for `git rebase` and considering `git pull` harmful? Really? – Victor Moroz Mar 12 '14 at 15:22
  • 10
    It would be nice to see someone create a graph, with morality as the axis, and classify git commands as good, bad, or somewhere in between. This chart would differ between developers, though it would say a lot about how one uses git. – michaelt Mar 12 '14 at 16:21
  • 5
    My issue with `git pull` without the `--rebase` option is the direction of merge it creates. When you look at the diff, all of the changes in that merge now belong to the person who pulled, rather than the person who made the changes. I like a workflow where merging is reserved for two separate branches (A -> B) so the merge commit is clear what was introduced, and rebasing is reserved to getting up-to-date on the same branch (remote A -> local A) – Craig Kochis Mar 12 '14 at 19:16
  • 1
    So, perhaps the real question is - how does one protect oneself from bad practices? It would be awesome to have the default "get my repo in sync with the other one" list conflicts (history and source) and provide straightforward ways to resolve them. – ash Mar 12 '14 at 19:26
  • 4
    So what does it gain you to know if someone made a pull just a few seconds before someone else or the other way around? I think this is just noise and is just obfuscating the really relevant history. This even lessens the value of the history. A good history should be a) clean and b) actually have the important history. – David Ongaro May 23 '14 at 18:14
26

It's not considered harmful if you are using Git correctly. I see how it affects you negatively given your use case, but you can avoid problems simply by not modifying shared history.

awhie29urh2
Hunt Burdick
  • 12
    To elaborate on this: If everyone works on their own branch (which in my opinion is the proper way to use git), `git pull` isn't any kind of issue. Branching in git is cheap. – AlexQueue Mar 12 '14 at 18:44
18

The accepted answer claims

The rebase-pull operation can't be configured to preserve merges

but as of Git 1.8.5, which postdates that answer, you can do

git pull --rebase=preserve

or

git config --global pull.rebase preserve

or

git config branch.<name>.rebase preserve

The docs say

When preserve, also pass --preserve-merges along to 'git rebase' so that locally committed merge commits will not be flattened by running 'git pull'.

This previous discussion has more detailed information and diagrams: git pull --rebase --preserve-merges. It also explains why git pull --rebase=preserve is not the same as git pull --rebase --preserve-merges, which doesn't do the right thing.

This other previous discussion explains what the preserve-merges variant of rebase actually does, and how it is a lot more complex than a regular rebase: What exactly does git's "rebase --preserve-merges" do (and why?)

Community
Marc Liyanage
-1

If you go to the old git-up repository, the alias they suggest is different: https://github.com/aanand/git-up

git config --global alias.up 'pull --rebase --autostash'

This works perfectly for me.

Nathan Redblur