
My git partner, with whom I share a private repo, has to pull a lot because he hasn't pulled for a while. I want to keep working during his ~3h download session, but I also want to avoid trouble.

Therefore I'd like to know:

  1. Is it safe to push while he pulls?
  2. Is it safe to amend my last commit while he pulls?
MERose

3 Answers


For what it's worth, here's a more detailed, technical-ish (but still high level-ish) description of push vs fetch / pull, incorporating the case of a very slow (multi-hour) fetch.

Before you read the rest of this answer, you may want to run git ls-remote origin. This is a read-only operation so it's safe to do at any time. On a busy server, the output here can be quite large, so you might want to view it in an editor, but you'll see a long list of SHA-1 and name pairs. Each SHA-1 identifies a specific git object in their (the remote's) repository, and each name is a name for that object.

The nearest opposite of push is fetch, not pull

The git pull command is simply [1] a convenience command that runs git fetch followed by git merge (or, if directed, git rebase). So you really should compare push to fetch.

Remember that fetch and push involve two gits

In general, [2] you will fetch or push from/to a "remote", on another host entirely. The other host runs another instance of git, which looks in another repository. Generally the two repositories are related somehow, e.g., both were originally cloned from a third location or one was cloned from the other. The details of the relationship matter less than that there is some relationship, so that the two repositories have something in common. (This is not actually required; it's just the typical case.)

Your git calls up their git over the Internet-phone, or some similar communications channel. Thus, we can speak of "your git" saying something to "their git", and "their git" saying something back to "your git". The two git instances converse with each other in order to figure out what to do.

Refspecs are key

Both fetch and push use a key concept called a "refspec". At its core, a refspec is essentially a pair of reference names, such as a pair of branch names or a pair of tag names. The pair is separated by a colon, e.g., master:master. A "fully qualified" name, like refs/heads/master or refs/tags/v1.2, avoids ambiguity here, but short names usually work well (and are what most people use when explicitly specifying something) as git will automatically figure out whether the name is a branch or a tag. [3]

You can use more than one refspec at a time, and refspecs allow for multiple matches with a * syntax as shown here.

The default refspec for a fetch from a remote named origin is:

+refs/heads/*:refs/remotes/origin/*

We'll ignore the plus sign here. [4] This particular refspec says: "take all branches" (in the refs/heads/ name-space) "and copy them from their repo to my repo, but in my repo, change the name to refs/remotes/origin/".

This is how "remote branches" AKA "remote tracking branches" work, and how they arise in the first place: your git copies their git's branches, but places them in a different name-space, so that (your copies of) their branches will not affect your branches.
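A minimal sketch of this behavior, assuming throwaway repositories in a temporary directory. The names seed, remote.git, alice and bob are made up for illustration, and the default refspec is spelled out on the command line only to make it visible. It is meant to show that a fetch moves refs/remotes/origin/master while the local master stays where it was.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# Bob publishes a new commit.
git -C "$work/bob" commit -q --allow-empty -m "bob's work"
git -C "$work/bob" push -q origin master
bob_tip=$(git -C "$work/bob" rev-parse refs/heads/master)

# Alice fetches, with the default refspec written out explicitly.
local_before=$(git -C "$work/alice" rev-parse refs/heads/master)
git -C "$work/alice" fetch -q origin '+refs/heads/*:refs/remotes/origin/*'
local_after=$(git -C "$work/alice" rev-parse refs/heads/master)
tracking=$(git -C "$work/alice" rev-parse refs/remotes/origin/master)

echo "local master:  $local_before -> $local_after"
echo "origin/master: $tracking"
```

Only the remote-tracking name changes; Alice's own master is left for her to merge or rebase when she chooses.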

(When you use git pull some of this is hidden from you. This "hiding" is rather imperfect though, so you should know about it. In modern gits (1.8.4 and later), remote branches are updated on all fetch operations, including those run from git pull. Earlier versions of git don't update the remote branches on these kinds of fetch operations, but do still update them on "regular" fetches and on all push operations, which is a bit weird, and is why newer gits just always update.)

Refspecs for push are complicated

The usual default refspec for fetch is easy to describe—"get me all their branches, but make them remote-tracking branches in my repo". This is clean, simple, and effective, and it has been the default ever since "remotes" were invented.

The default for push is complicated and configurable, and the default changed in git 2.0.

The old default push is called matching and the new default is called simple (although it's still a bit complicated). The new simple rule is:

  • find the "upstream" branch, and
  • push to the same branch on the remote, but only if it has the same name.

That is, suppose you're on branch master, and the "upstream" for that is "branch master on remote origin". Then git push means git push origin refs/heads/master:refs/heads/master.

For the old default (matching), your git and their git talk a bit to see which branches you have, and which branches they have. Then all the branches that you both have, that have the same name, are put into your refspec. So if you have a branch named betty and they have a branch named betty, that adds refs/heads/betty:refs/heads/betty to your push refspecs. Then if you both have branches named fred, those are added, and so on. Your git then attempts to push all the matching branches to their git.

It's worth noting two more things here:

  • On fetch, "their" ref-names go on the left of the refspec, and yours go on the right. Their master becomes your origin/master because the refspec has refs/heads/master on the left, and refs/remotes/origin/master on the right. But on push these are reversed: your branch or tag name goes on the left, and theirs goes on the right.

  • When pushing, you can omit "your" side of the refspec to tell the remote to delete a reference. To ask their git to delete branch develop, you can use git push origin :refs/heads/develop.
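The push-side refspec forms described above can be tried out in a sandbox. This is only a sketch under assumptions: the temporary repositories (seed, remote.git, alice) are hypothetical, and the remote is a local bare repository rather than a real server.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" and one clone of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/alice"

# On push, "my" name is on the left and "their" name is on the right.
git -C "$work/alice" branch develop
git -C "$work/alice" push -q origin refs/heads/develop:refs/heads/develop
created=$(git -C "$work/alice" ls-remote origin refs/heads/develop | wc -l | tr -d ' ')

# An empty left-hand side asks the remote to delete its develop branch.
git -C "$work/alice" push -q origin :refs/heads/develop
remaining=$(git -C "$work/alice" ls-remote origin refs/heads/develop | wc -l | tr -d ' ')

echo "develop on remote after push: $created, after delete: $remaining"
```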

Crucial: fetch and push are not symmetric

Besides the somewhat obvious syntactic differences (fetch doesn't have a way to delete, and the left vs right side bits), there's one more thing that's absolutely crucial here. When you use git fetch, you copy their branches to your "remote branches", but when you push, you ask to send your branches to their branches, not to any sort of "remote branch".

What this means is that if you don't already have all the commits they have on your local branch, and you ask them to take that branch via git push, you will ask them to lose some commit(s)—specifically, whichever commits they have that you don't.

Normally a remote will refuse ("reject") a push that would lose commits ("non-fast-forward"). You can override this with --force but that's usually the wrong thing to do.

Unlike push, git fetch puts "their" branches in your remote-tracking branches. Thus, fetch just takes what they have every time. This can't disrupt your branches since it does not even touch your branches. [5]

This is why you merge or rebase

Suppose you've done some work, and someone else has also done some work. Suppose also that both of you use git push to update a third repository (perhaps on github, for instance), and you're both working on branch develop. Consider it as a sort of race: you've both made some changes and committed them locally, and now you're in a race to see who can push first.

Let's say the other guy wins the race. He does his git push, which calls up his remote—this is also your remote—and he asks them (the remote) to take his develop branch and make it their develop branch. Since what he's done is simply adding to what they had before, they accept his push.

Now you come along, having lost the race, and ask your remote to take your develop branch and make it their develop branch. They check as usual, but this time, they find a commit, or several commits, that they have, that you don't. These are precisely the commits that the other guy pushed, when he won the race.

The remote will reject your push as a non-fast-forward.
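The race can be reproduced locally. The following is a sketch, assuming two hypothetical clones (alice and bob) of a throwaway bare repository; it is not a transcript from a real server, but the rejection logic is the same.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# Bob wins the race.
git -C "$work/bob" commit -q --allow-empty -m "bob's work"
git -C "$work/bob" push -q origin master
bob_tip=$(git -C "$work/bob" rev-parse HEAD)

# Alice commits without fetching first, then tries to push.
git -C "$work/alice" commit -q --allow-empty -m "alice's work"
if git -C "$work/alice" push -q origin master 2>/dev/null; then
  result=accepted
else
  result=rejected   # non-fast-forward: the remote has a commit Alice lacks
fi

remote_tip=$(git --git-dir="$work/remote.git" rev-parse refs/heads/master)
echo "alice's push was $result; remote master is still $remote_tip"
```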

You can now use git fetch to obtain those commits from the remote. They will go into your origin/develop remote-tracking branch.

You now have all of their commits, plus your own commits. If we draw a part of your commit graph, it may look something like this:

              o   <-- HEAD=develop
            /
... - o - o
            \
              *   <-- origin/develop

Here, the commit marked * is the one the other guy pushed when he won the race. It is now your job to coordinate your changes and his changes.

There are plenty of articles (here on stackoverflow, and elsewhere) about how to deal with this (merge or rebase, when to use which one, and so on).

Whether you prefer merging or rebasing, you can do it manually:

git fetch
git rebase    # or: git merge

or you can use the git pull script, perhaps with --rebase or a configuration item, to combine the two steps. (I think it's better to start by doing them separately, as you'll get a better mental model of the work-flow, and eventually you'll know intuitively when it's safe to use git pull as a convenient method of doing both operations all at once. Also, when you do them separately, you can easily look and see what's happened since you last re-synchronized, as git log ..@{u} [6] will show you what's in the upstream that is not in your branch.)
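Here is a sketch of the separate fetch, inspect and rebase steps, again assuming hypothetical throwaway clones (alice and bob) and made-up file names. Alice has lost the push race, so she fetches, looks at what came in, rebases onto her upstream, and then pushes.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# Bob pushes first; Alice commits locally without having fetched.
echo "bob" > "$work/bob/bob.txt"
git -C "$work/bob" add bob.txt
git -C "$work/bob" commit -q -m "bob's work"
git -C "$work/bob" push -q origin master
echo "alice" > "$work/alice/alice.txt"
git -C "$work/alice" add alice.txt
git -C "$work/alice" commit -q -m "alice's work"

# Step 1: fetch. Step 2: see what is upstream but not in the local branch.
git -C "$work/alice" fetch -q
incoming=$(git -C "$work/alice" log --format=%s '..@{u}')

# Step 3: replay local work on top of the upstream, then push.
git -C "$work/alice" rebase -q
git -C "$work/alice" push -q origin master
history=$(git -C "$work/alice" log --format=%s -2 | tr '\n' '|')

echo "incoming: $incoming"
echo "history:  $history"
```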

What about a very slow fetch? For that matter, what about a very slow push?

The key item here is that when doing either fetch or push, git first figures out what repository objects to transfer, then it starts doing the transfers. Then, for a push, git atomically does its checking, and permits or rejects the label-update part of the operation.

Let's consider fetch first, and assume that git is able to use a "thin pack". Your git calls up their git, the two gits figure out that you need many megabytes or even gigabytes of commits, trees, blobs, and/or tags—these are the four object types in the repository—and their git packages them all up as a "thin pack".

At this point, your git begins the slow process of transferring all this data over the Internet-phone. If someone else comes along and does a (successful) push, that push—along with its objects—goes into the remote repository, but your git and their git have already decided what's coming over, and those new objects and/or label-settings are not included.

When your fetch finishes, your git expands the thin pack (this is where you see the "resolving deltas" message) and updates your labels, based on what you brought over. It's as if the push has not yet happened: you get an atomic snapshot of what they had when you started your fetch.

(This is why you might want to run a second fetch immediately after a long, multi-hour fetch on a busy repository: you can pick up any changes that occurred during that period. With any luck, this time you will just have a few small items to bring over, which will take only a few seconds, probably not enough time for even more changes to sneak in.)

If you have a very slow push, the situation is similar: you send the remote a "thin pack" that's not very thin, and once the remote has the whole thing (and has resolved deltas), it checks to see if the update is a fast-forward, or is otherwise permitted. This checking and (if permitted) updating is all done atomically (the remote uses a lock file to achieve this atomicity, and in fact, a fetch you run uses a lock file on your side, for the same reason). For each label (branch or tag) update, the push either succeeds or fails. [7]

(If you're using a "dumb" protocol, the details change, but the overall strategy remains the same. Updates are, or should generally appear, atomic.)


[1] Or not that simply, since it has a lot of special corner cases it tries to handle, plus the logic to do rebase instead of merge.

[2] You can do git operations on your local machine that don't involve a second git instance. However, the principle is the same; it's just that now your local git plays both the "local" and "remote" roles, speaking to itself.

[3] Put very simply, it's a branch name if it's in refs/heads/, and it's a tag if it's in refs/tags/. Git gets a chance to see which one it's in (on both sides) during the "phone call" between the two git instances. If the name could be ambiguous (if there are both branches and tags named bob, for instance), you can spell out explicitly which one(s) you want.

[4] The plus sign simply sets the "force" flag for that particular refspec. This is the same force flag you can set with --force, except that it's per-refspec rather than global.

[5] This glosses over the fact that git's tags use a single global name-space. That is, when you git fetch from a remote, even without adding --tags, git may update your local refs/tags/ entries. In particular, unless you specify --no-tags, your git will see if any of the new object SHA-1 IDs you bring over correspond to any tags on the remote (see the output of git ls-remote: all the tag SHA-1 IDs are available at all times). If so, your git will create a corresponding tag. Since there is no "remote tags" name-space (unless you reinvent it yet again), it's not completely safe to git fetch, as this may add a "surprise" tag (one you did not expect). In practice, however, since tags never [8] move, this is not a problem.

[6] The @{u} syntax means @{upstream}, which means "find the upstream branch I'm tracking", which in this example would be origin/develop. Once you've done the git fetch, origin/develop points to the latest commit present on the remote (since you just picked it up by fetching), and the .. syntax means "find commits reachable from the right-hand-side specifier that are not reachable from the left-hand-side specifier". The empty left-hand-side means HEAD, which means the tip of your current branch develop, so this asks git to log commits that are on origin/develop that are not on develop.

[7] For regular pushes, this is pretty straightforward: you're expecting to push a fast-forward, where the remote will, for instance, have branch refs/heads/B pointing to commit 1234567.... You have commit fedcba9... whose ancestor is 1234567... and you ask to push this commit to their refs/heads/B. Once they have the pack, they check to see if their current refs/heads/B is an ancestor of what you're asking to set it to. Either it is (you've asked for a fast-forward operation on the label), or it isn't and the push is rejected.

For force-pushes, though, or when deleting a branch, you might want to make sure that the remote's refs/heads/B points to some specific commit, i.e., that no one else has won a "push race" against your force-push or delete operation. This was, at one point, not possible in git, but since 1.8.5, git has acquired the --force-with-lease option for push. Here, you specify the SHA-1 you believe the remote will have its label pointing-to, by the time your push has gotten all the way across and is being executed atomically. If you are correct, the update is allowed. If it turns out that the label has some other value, your force-update is rejected instead.

This is not something most people normally need, but it does allow for atomic updates that are not fast-forwards.
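A sketch of such a lease check, assuming hypothetical throwaway clones (alice and bob) of a local bare repository. Alice records the commit she believes the remote's master points to, Bob pushes in the meantime, and Alice's force-push is refused because her expectation is stale.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical setup: a bare "remote" plus two clones of it.
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/remote.git"
git clone -q "$work/remote.git" "$work/alice"
git clone -q "$work/remote.git" "$work/bob"

# The commit Alice believes the remote's master points to.
expected=$(git -C "$work/alice" rev-parse refs/remotes/origin/master)

# Bob wins the push race in the meantime.
git -C "$work/bob" commit -q --allow-empty -m "bob's work"
git -C "$work/bob" push -q origin master
bob_tip=$(git -C "$work/bob" rev-parse HEAD)

# Alice rewrites her tip and force-pushes, but only if the lease still holds.
git -C "$work/alice" commit -q --amend --allow-empty -m "initial (reworded)"
if git -C "$work/alice" push -q --force-with-lease=master:"$expected" origin master 2>/dev/null; then
  lease=accepted
else
  lease=rejected   # the remote's master no longer matches $expected
fi

remote_tip=$(git --git-dir="$work/remote.git" rev-parse refs/heads/master)
echo "force-with-lease push was $lease"
```

A plain --force in the same situation would have replaced Bob's commit on the remote.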

[8] What, never? Well, hardly ever!

torek
  • Don't you think a 2507 words essay is a bit long to answer this question? Less would be more. – cmaster - reinstate monica Dec 27 '14 at 20:58
  • @cmaster I think this answer was very good and informative. What's the harm? – Adrian Schmidt Feb 25 '15 at 12:29
  • @AdrianSchmidt Any text has both a benefit and a cost for the reader: the benefit is the information that it provides, the cost is the time readers need to spend reading it. While this text is certainly high on the benefit side, it is also extremely high on the cost side. Most readers won't be willing to pay so much for the information it provides. A more concise presentation of the essential points would significantly increase the benefit/cost ratio for readers. That's what I meant by "less would be more". – cmaster - reinstate monica Feb 25 '15 at 17:16
  • @cmaster Why not leave that decision to the reader? There are other answers to the question, and you could provide a more concise answer yourself, instead of beating down on the effort of others. – Adrian Schmidt Feb 26 '15 at 14:25
  • @AdrianSchmidt If I had tried to beat down torek's effort, my comment would have sounded a whole lot different. I merely pointed out how he could have improved the impact of his answer. That's what I would call constructive criticism. As to the readers, I have no doubt they will make their own decisions, but I, for one, don't always realize how readers will react to my postings. So I'm usually thankful for any feedback, positive and negative. – cmaster - reinstate monica Feb 26 '15 at 16:37
  • @cmaster Then I misunderstood your intent. It's very easy to read a passive-aggressive tone into the question in your first comment. Sorry for the misunderstanding. – Adrian Schmidt Mar 03 '15 at 15:48

TL;DR - yes to both questions, with the caveat that it's generally bad practice to amend commits which have already been published.

  1. Is it safe to push while he pulls?

To start, 'pull' is two operations - 'fetch' and then 'merge'. First, git will 'fetch' all commits in the remote repository (ancestors to the specified HEAD), and store them in your local repository. Second, it will 'merge' the specified remote HEAD into the local HEAD (this may be a fast forward, or a new commit with multiple parents).

The 'merge' part of this is a local operation, and not really relevant. Your partner will 'merge' based on whatever was 'fetch'ed from the repository.

So the question boils down to: is it safe to 'fetch' while another person is doing a 'push'? And the answer is: yes, of course! At the end of the 'fetch', your partner will either have all of your commits or not. If they don't, the leftover commits will be discovered on the next 'push' or 'pull', per the usual git workflow. It's logically equivalent to them completing the 'pull', and then you 'push'ing new commits.

  2. Is it safe to amend my last commit while he pulls?

Eh, as a matter of process, you probably don't want to amend commits which you have already published to a remote repository (google it). But if you really want to do that, yes, you can. And your partner will, again, either get those changes (the ones which you published during their 'fetch'), or they won't. If not, it doesn't matter, as those changes will be discovered later.

Jeff
  • Why is it bad practice? I think there's enough reasons to kind of update the last commit: https://www.reviewboard.org/docs/codebase/dev/git/clean-commits/ I definitely think, the `--amend` option was designed for something. – MERose Dec 27 '14 at 23:41
  • It's bad practice to `--amend` commits which have been _published_ (pushed to a remote). Check out "When To Push", near the end of the link you shared. Otherwise it's great! But since `--amend` creates a new commit, and the old commit is removed from history, anyone working off your old commit will have a different history. http://stackoverflow.com/questions/253055/how-do-i-push-amended-commit-to-the-remote-git-repo discusses this, too. – Jeff Dec 28 '14 at 03:49

You should remember that a branch is only a reference to a commit in git. That is, updating a branch is actually a very quick and easy operation, which is also very easy to implement atomically. Yes, you also need to transmit a number of objects (commits, trees, blobs) in a push/fetch operation, but these are inconsequential to the state of the repository: it does not matter if a repository contains some objects that are not (indirectly) connected to branches; what matters is which objects are reachable via the branches.

That means you can expect all push and fetch/pull operations to be atomic, which gives the push/fetch operations that affect a given repository a total order. Either the push happens before the fetch, or the push happens after the fetch; there is no third option.

Concerning your question about amending a commit: That is a purely local operation. You create a new commit with the same parent(s) as the one that you are amending, and you point your branch to that new commit. That's it, there is no other repository involved.
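To see this for yourself, here is a small sketch in a throwaway repository (the path and commit messages are made up for illustration). It amends the tip commit and compares hashes before and after; no remote is configured at all.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical repository with two commits and no remote.
git init -q "$work/repo"
git -C "$work/repo" commit -q --allow-empty -m "first"
git -C "$work/repo" commit -q --allow-empty -m "second"

old=$(git -C "$work/repo" rev-parse HEAD)
parent_before=$(git -C "$work/repo" rev-parse 'HEAD^')

# Amending writes a brand-new commit and moves the branch to it.
git -C "$work/repo" commit -q --amend --allow-empty -m "second, reworded"

new=$(git -C "$work/repo" rev-parse HEAD)
parent_after=$(git -C "$work/repo" rev-parse 'HEAD^')
old_type=$(git -C "$work/repo" cat-file -t "$old")

echo "old tip: $old"
echo "new tip: $new"
echo "parent unchanged: $parent_before"
```

The old commit object is still in the repository; it is simply no longer reachable from the branch.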

However, you should never amend a commit that you have pushed already: That is rewriting history, and the consequences of rewriting history are severe. Never attempt to do this without understanding precisely what you are doing first.

cmaster - reinstate monica
  • "Never amend to a commit"? Wow, that's strong words. But I do think, the `--amend` option was made for that. – MERose Dec 27 '14 at 23:39
  • @MERose You didn't quote the relevant part: "... that you have pushed already." You can modify commits only seen by you all you want, but once you have pushed them to a remote for others to see, you should not change them. – chepner Dec 28 '14 at 03:24