Does git lock a remote for writing when a user pushes?

Question

If two or more users push their local repository states to the same remote at the same time, does git:

lock the remote commit/branch/repo for writing and finish one user's entire batch of commits before committing another user's?
Or does it release the lock it holds on a commit/branch/repo after writing a single commit out of one user's batch of N commits?

The first would make sense but I thought I would ask anyway.

I don't have a reference for this, but it is my understanding that Git operations are basically guaranteed to be atomic, meaning that the entire operation completes/fails, or nothing happens. Assuming this, the first version above is what should happen. — Tim Biegeleisen, Oct 05 '18 at 09:06
@TimBiegeleisen: unfortunately, that's wrong in an important way. To get an atomic push you need to use `git push --atomic` (new in Git 2.4). However, it's probably right in terms of what the OP is intending. Am working on answer... — torek, Oct 05 '18 at 14:29

score 7 · Answer 1 · answered Oct 05 '18 at 15:02

TL;DR: mu

The question contains an incorrect assumption, so neither option is correct.

There are atomicity issues but they are not on a per-commit basis. They are on a per-reference basis.

If you push only one reference—e.g., git push origin master—there's only one reference to update. The update either succeeds or fails, and for the sender, that's pretty much it (although there are a lot of receiver-side details that still matter).

If you push more than one reference (e.g., git push origin develop master), there are multiple references to update. If your Git supports it (v2.4 or later on both sides), use git push --atomic to make sure that either all of the updates succeed or none of them does.

If you don't write pre-push, pre-receive, update, and/or post-receive hooks, you can stop here. If you do write them, read on.

Long

Locking happens in the receiver, not the sender (for what I hope are obvious reasons :-) ). The documentation never calls out the internal details explicitly, even though it should, but there are a number of separate locks and locking steps. In particular:

There is one lock per pack file.
There is one lock for shallow graft points, in case of a shallow repository.
There is one lock for the packed reference back-end data store (covering all packed references).
There is one lock for each reference name.¹
There is one lock for the index (not that this matters here in most cases).

Reading a reference does not require locking; only updating one does. This implies that a pure reader may see the old value during a transition. Internally, however, it's possible to lock a series of references; see the atomicity notes below.

Taking a lock consists of creating the lock file with an atomic "create, or fail if the file already exists" operation, which the underlying operating system must provide. Unlocking consists of deleting or renaming the lock file. The lock file typically holds the new content for the file it locks, so to drop the lock without changing that file, Git simply removes the lock file; to drop the lock and change the file's content as a single atomic operation, Git renames the lock file over the original. The atomic rename must also be provided by the underlying OS.
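The lock-file protocol just described can be sketched in Python as follows. This is a minimal illustration, assuming a POSIX-like filesystem where os.open with O_CREAT | O_EXCL is an atomic create-or-fail and os.replace is an atomic rename. The function names (take_lock, commit_lock, rollback_lock) are made up for the example and are not Git's internal API, which is written in C; only the ".lock" suffix mirrors what Git uses on disk.

```python
import os
import tempfile


class LockError(Exception):
    """Raised when another process already holds the lock."""


def take_lock(path: str) -> int:
    """Lock `path` by atomically creating `path + ".lock"`.

    O_CREAT | O_EXCL asks the OS to fail if the file already exists:
    the "create or fail" primitive described above.
    """
    try:
        return os.open(path + ".lock", os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    except FileExistsError:
        raise LockError(f"{path} is already locked") from None


def commit_lock(path: str, fd: int, new_content: bytes) -> None:
    """Write new content into the lock file, then rename it over `path`.

    The single rename both installs the new content and drops the lock.
    """
    with os.fdopen(fd, "wb") as f:
        f.write(new_content)
    os.replace(path + ".lock", path)


def rollback_lock(path: str, fd: int) -> None:
    """Drop the lock without touching `path` by deleting the lock file."""
    os.close(fd)
    os.unlink(path + ".lock")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ref = os.path.join(d, "master")  # stand-in for a loose ref file
        with open(ref, "wb") as f:
            f.write(b"old\n")
        fd = take_lock(ref)
        try:
            take_lock(ref)  # a second writer must fail while we hold the lock
        except LockError as e:
            print("second writer:", e)
        commit_lock(ref, fd, b"new\n")
        with open(ref, "rb") as f:
            print(f.read())
```

A reader that opens `path` at any moment sees either the complete old content or the complete new content, never a half-written file, which is exactly why the rename is the commit step.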

Updating a packed reference converts it to an unpacked ("loose") reference, which requires obtaining that reference's own lock. Packing references obviously requires obtaining the packed-refs lock. Deleting a reference is a special case in two ways, though:

An unpacked reference may also have a copy in the packed-refs file (the packed copy is ignored while the loose copy exists). In this case, Git must also rewrite the packed-refs file, under the packed-refs lock, so that both copies are deleted.
Deleting a reference deletes its reference-log, if the log exists. This is mostly invisible, but it does mean that the reference update code wants to know in advance that this is a delete operation.
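To make the loose/packed interplay concrete, here is a toy in-memory model. It is a hypothetical sketch, not Git's ref back end: two dicts stand in for the loose ref files and the packed-refs file, and the locks appear only in comments. It shows why a loose copy shadows a packed one and why a delete has to remove both.

```python
from typing import Dict, Optional


class RefStore:
    """Toy model of loose vs packed references (not Git's real code).

    `loose` stands in for files under refs/, `packed` for the
    packed-refs file. Both map ref name -> object hash.
    """

    def __init__(self) -> None:
        self.loose: Dict[str, str] = {}
        self.packed: Dict[str, str] = {}

    def read(self, name: str) -> Optional[str]:
        # A loose copy shadows any packed copy of the same name.
        if name in self.loose:
            return self.loose[name]
        return self.packed.get(name)

    def update(self, name: str, new_hash: str) -> None:
        # Updating writes a loose ref (under that ref's own lock in Git);
        # any stale packed copy stays behind but is now ignored.
        self.loose[name] = new_hash

    def delete(self, name: str) -> None:
        # Deleting must remove both copies; otherwise the packed copy
        # would "reappear" once the loose one is gone. In Git this step
        # also needs the packed-refs lock.
        self.loose.pop(name, None)
        self.packed.pop(name, None)

    def pack(self) -> None:
        # Packing (under the packed-refs lock) folds loose refs into packed.
        self.packed.update(self.loose)
        self.loose.clear()
```

If delete only removed the loose entry, read would fall back to the old packed hash, which is the situation the packed-refs rewrite exists to prevent.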

¹Worth noting: some references are per-worktree. Originally this was just HEAD, but as git worktree bugs have surfaced, the set has grown to include all refs/bisect/ and refs/rewritten/ references. The refs/rewritten/ references themselves are new, introduced with the fancier interactive rebase that can recreate arbitrary merges. Splitting bisect references per worktree was a fix in Git 2.7.0; see commit ce414b33ec038.

Also, some references are considered "pseudorefs": things like ORIG_HEAD, MERGE_HEAD, and so on. These are never packed. This is mainly an internal detail, but it affects which locks might apply: a regular reference, refs/heads/master for instance, could either be packed, in which case the packed-refs lock applies, or unpacked, in which case its own per-reference lock applies.

The push sequence

Since you're interested in atomicity during push, we have to look at how the process works.

The first step depends on the transport protocol version, but in general, the sender collects a list of reference names and values from the receiver. No locks are held here. These reference names and values will show up in the sender's pre-push hook.

Next, the sender gathers the objects the receiver needs, packs them, and sends them (or sends individual objects, but this is pretty rare today). No locks are held here either, and this may take a lot of time. During this process, the receiver's reference values may change. Implication: any checking you do on the sender, in a pre-push hook, cannot guarantee that the receiver's references are still the same by the time the pack file arrives intact and the receiver begins processing it. The pack file itself, however, is locked once it's complete.
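For reference, a pre-push hook receives one line per ref on standard input, in the form <local ref> SP <local object name> SP <remote ref> SP <remote object name>. The sketch below parses those lines; the class and function names are hypothetical. It illustrates the implication above: remote_oid is the receiver's value as advertised at the start of the push, a snapshot that may already be stale by the time the pack arrives.

```python
from typing import List, NamedTuple


class PushLine(NamedTuple):
    local_ref: str   # "(delete)" when the push deletes the remote ref
    local_oid: str
    remote_ref: str
    remote_oid: str  # receiver's value *as advertised*; all zeros if absent


def parse_pre_push(text: str) -> List[PushLine]:
    """Parse the lines Git writes to a pre-push hook's standard input."""
    return [PushLine(*line.split()) for line in text.splitlines() if line.strip()]


def remote_ref_was_absent(p: PushLine) -> bool:
    """True if the ref did not exist on the receiver at advertisement time.

    This is only a snapshot: another user may create the ref while our
    pack is still in transit, so a check here cannot be a guarantee.
    """
    return set(p.remote_oid) == {"0"}
```

A hook built on this can refuse a push early, as a courtesy, but it cannot promise anything about the receiver's state when the updates are actually applied.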

At this point, if necessary, the shallow graft file is locked (I think—this is not entirely obvious; it might happen later).

Next, the sender sends a series of update requests (with optional force flags). The receiver now has a chance to look up, and optionally lock, each reference-to-update, but in fact no locking occurs here either. The receiver runs the pre-receive hook with no locks in place. If the pre-receive hook declines the push, the entire push is aborted at this point, so nothing has changed. Once the pre-receive hook has vetted the update as a whole, the incoming pack file (or individual objects) is also moved out of quarantine, if you have Git 2.11 or later (where quarantine was introduced).
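A pre-receive hook reads lines of the form <old-value> SP <new-value> SP <ref-name> on standard input, and any nonzero exit status rejects the whole push. The sketch below assumes a hypothetical policy (never delete a protected branch) purely to show the all-or-nothing shape. It returns the exit status instead of calling sys.exit, so that the logic can be exercised in isolation; keep in mind that no locks are held while it runs.

```python
from typing import List, NamedTuple, Set


class RefUpdate(NamedTuple):
    old: str  # value the ref had ("0" * N if the ref is being created)
    new: str  # requested value ("0" * N if the ref is being deleted)
    ref: str  # full ref name, e.g. refs/heads/master


def is_null(oid: str) -> bool:
    """All-zero object names mean "no object" (creation or deletion)."""
    return set(oid) == {"0"}


def parse_pre_receive(text: str) -> List[RefUpdate]:
    """Parse '<old-value> SP <new-value> SP <ref-name>' lines from stdin."""
    return [RefUpdate(*line.split()) for line in text.splitlines() if line.strip()]


def pre_receive_status(updates: List[RefUpdate], protected: Set[str]) -> int:
    """Hypothetical policy: reject the *entire* push if it deletes a protected ref.

    A nonzero return value (used as the hook's exit status) aborts every
    update in the push, not just the offending one.
    """
    for u in updates:
        if u.ref in protected and is_null(u.new):
            return 1
    return 0
```

An actual hook script would finish with something like sys.exit(pre_receive_status(parse_pre_receive(sys.stdin.read()), {"refs/heads/master"})).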

Next, the receiver runs all the updates. This is where atomicity becomes particularly interesting. Since version 2.4.0, git push has a flag, --atomic, which relies on the receiver advertising the atomic update capability. On the receiver, you can set the configuration value receive.advertiseAtomic to false to stop advertising it, which disables atomic updates. If:

the receiver advertises the atomic update capability (default true), and
the sender (whoever runs git push) understands the atomic update capability, and
the sender chooses --atomic

then the receiver will lock all the references-to-be-updated now, before updating any of them. If any of these locks fail, the entire push is aborted here. If they all succeed, the receiver will run each update hook, one at a time, to verify each update, before applying any updates. If any update hook fails, the entire push is aborted. If all update hooks accept each update, then the entire series of reference updates is committed atomically, by releasing each lock through a rename.²

On the other hand, if the sender did not choose --atomic,³ the receiver will update each reference one at a time. It runs the update hook, and if the update hook says to proceed, updates the one reference with a lock-update-unlock sequence. So each individual update can succeed or fail.
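The difference between the two modes can be modelled with a small single-threaded simulation. Everything here is a simplification under stated assumptions: refs is a dict standing in for the receiver's references, locked is a set standing in for lock files already held by other processes, and update_hook is any callable that returns True to accept an update. This is not Git's receive-pack code.

```python
from typing import Callable, Dict, List, Set, Tuple

Update = Tuple[str, str]  # (ref name, new value)


def receive_updates(refs: Dict[str, str], locked: Set[str],
                    updates: List[Update],
                    update_hook: Callable[[str, str], bool],
                    atomic: bool) -> Dict[str, str]:
    """Apply `updates` to `refs`; return a per-ref status string."""
    names = [name for name, _ in updates]
    if atomic:
        # Step 1: lock every ref up front; one failure aborts the whole push.
        if any(name in locked for name in names):
            return {name: "aborted: lock failed" for name in names}
        held = set(names)
        locked |= held
        try:
            # Step 2: run every update hook before changing anything.
            if not all(update_hook(name, new) for name, new in updates):
                return {name: "aborted: hook declined" for name in names}
            # Step 3: commit everything (in Git, each lock file is renamed into place).
            for name, new in updates:
                refs[name] = new
            return {name: "ok" for name in names}
        finally:
            locked -= held  # our locks are released on every path

    # Non-atomic: hook, then lock-update-unlock, one ref at a time.
    results: Dict[str, str] = {}
    for name, new in updates:
        if not update_hook(name, new):
            results[name] = "declined"
        elif name in locked:
            results[name] = "lock failed"
        else:
            locked.add(name)      # lock
            refs[name] = new      # update
            locked.discard(name)  # unlock
            results[name] = "ok"
    return results
```

With atomic=True, a single declined update leaves every reference unchanged; with atomic=False, the accepted updates land and only the declined one is skipped.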

Implication: with or without --atomic, update hooks should not dilly-dally, because other operations are being held up while they run. Since the push may be made without --atomic (and even with it, you cannot know for certain which references will be updated), you cannot assume that any other references are stable here, either.

In any case, after updating all updatable references, Git drops all the locks. The reference locks are dropped by the act of updating them, as noted at the top, but Git also drops the shallow and pack locks now, after updating shallow graft points if needed. Then, with no locks held, Git runs the post-receive hook. Implication: post-receive hooks cannot assume that the current value of any reference matches the values on their standard input. To see what was updated, you must read stdin; to see the current value, you must re-read the reference; and these two may not be in sync.
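A post-receive hook therefore has two distinct sources of truth. The sketch below assumes it runs inside the receiving repository with git on PATH, and compares what stdin says was written against what git rev-parse --verify --quiet reports now. The helper names are hypothetical, and the read parameter exists only so the comparison can be checked without a repository.

```python
import subprocess
from typing import Callable, List, Optional, Tuple


def parse_post_receive(text: str) -> List[Tuple[str, ...]]:
    """Post-receive stdin: one '<old-value> <new-value> <ref-name>' line per updated ref."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]


def git_current_value(ref: str) -> Optional[str]:
    """Re-read `ref` now via git; None if it does not (or no longer) exist."""
    result = subprocess.run(["git", "rev-parse", "--verify", "--quiet", ref],
                            capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def drifted_refs(updates: List[Tuple[str, ...]],
                 read: Callable[[str], Optional[str]] = git_current_value) -> List[str]:
    """Return refs whose current value differs from what this push wrote.

    No locks are held during post-receive, so another push may already
    have moved (or deleted) any of these refs.
    """
    drifted = []
    for _old, new, ref in updates:
        expected = None if set(new) == {"0"} else new  # all zeros: ref was deleted
        if read(ref) != expected:
            drifted.append(ref)
    return drifted
```

Whether a drifted ref matters is a policy question for the hook; the point is that "what this push did" and "what the repository looks like now" have to be obtained separately.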

²While individual renames are atomic, it's possible that some rename(s) will fail when other earlier renames succeeded. It's not completely clear what happens in this case.

³If the receiver configuration says not to advertise atomics, and the sender uses --atomic, the sender himself cancels his transaction. That is, if you run git push --atomic and the receiver has not advertised atomic support—either because the receiver is too old to have it, or because the receiver is configured that way—your Git stops at this point. In effect, you can't choose atomic push in this case.

Conclusion

From the sender's side, it looks fairly simple, provided you don't make assumptions in a pre-push hook (or have no pre-push hook in the first place). Either you use git push --atomic to make all your reference updates atomic, so that the whole push either succeeds or fails, or you don't, in which case each reference update either succeeds or fails on its own. Each reference update is one of these:

Please set ref to hash (regular, non-force push)
Set ref to hash! (git push --force or git push ... +master:master)
If ref = old-hash, set it to hash! (git push --force-with-lease)

and each may be rejected individually, but --atomic means that if any one is rejected, none will happen.
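The three request kinds can be written as a single acceptance check. This is a toy model only: is_ancestor is a hypothetical callable standing in for a fast-forward test, the kind strings are invented labels, and the sketch deliberately says nothing about which side (sender or receiver) enforces each rule in real Git.

```python
from typing import Callable, Optional


def accept(current: Optional[str], new: str, kind: str,
           is_ancestor: Callable[[str, str], bool],
           expected: Optional[str] = None) -> bool:
    """Toy acceptance rule for the three request kinds listed above.

    current     -- the ref's value right now (None if it does not exist)
    is_ancestor -- hypothetical fast-forward test: is a an ancestor of b?
    expected    -- the leased value, used only for kind == "lease"
    """
    if kind == "please":
        # Regular push: create a new ref, or fast-forward an existing one.
        return current is None or is_ancestor(current, new)
    if kind == "force":
        # --force / +src:dst: set it unconditionally.
        return True
    if kind == "lease":
        # --force-with-lease: compare-and-swap against the expected value.
        return current == expected
    raise ValueError(f"unknown request kind: {kind!r}")
```

With --atomic, the push as a whole goes through only if this kind of check passes for every reference; without it, each reference is judged on its own.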

From the receiver side, where you can write three kinds of hooks, it's complicated.
