How to clone a bare git repo with no commits and get the correct HEAD ref during the clone?

Question

This answer claims that the issue was fixed in version 1.8.4.3, but I still encounter it in version 2.25.1. It appears to work as expected in version 2.32.0, so I'm not sure when it was actually fixed.

Is there a way to get the expected behavior in git version 2.25.1 using the clone subcommand (without having to checkout/switch branches after cloning)?

Here are reproduction steps:

Initialize a bare repo:

```shell
BARE_DIR="$PWD/bare"
WORKING_DIR="$PWD/working"

mkdir -p "$BARE_DIR/repo"
cd "$BARE_DIR/repo"
git init --bare
```

Change the HEAD ref:

```shell
git symbolic-ref HEAD refs/heads/very-unlikely-to-be-your-configured-default-branch
```

Clone a working copy of the repo:

```shell
mkdir "$WORKING_DIR"
cd "$WORKING_DIR"
git clone "$BARE_DIR/repo"
cd repo
```

Check the HEAD ref:

```shell
cat .git/HEAD
```

I expect the output to be:

```
ref: refs/heads/very-unlikely-to-be-your-configured-default-branch
```

but instead it's:

```
ref: refs/heads/master
```
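To rerun these steps on any Git version without touching existing directories, they can be bundled into one throwaway script. This is a sketch that assumes a POSIX shell and `mktemp -d`; it uses `git symbolic-ref HEAD`, which reports the same ref as `cat .git/HEAD` without the `ref: ` prefix.

```shell
#!/bin/sh
# Rerun the reproduction steps in a throwaway directory and report
# where the fresh clone's HEAD points.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
bare="$tmp/bare/repo"
expected=refs/heads/very-unlikely-to-be-your-configured-default-branch

# Empty bare repository whose HEAD names an unborn branch.
git init -q --bare "$bare"
git -C "$bare" symbolic-ref HEAD "$expected"

# Cloning an empty repository prints a warning on stderr; hide it.
git clone -q "$bare" "$tmp/working/repo" 2>/dev/null

# Same information as `cat .git/HEAD`, without the "ref: " prefix.
actual=$(git -C "$tmp/working/repo" symbolic-ref HEAD)

git --version
if [ "$actual" = "$expected" ]; then
    echo "HEAD propagated: $actual"
else
    echo "HEAD not propagated: got $actual"
fi
```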

torek · Accepted Answer · 2021-07-10T01:43:40.757


When there are no commits, there are no branches.

Modern Git can read the default branch name from the other Git, and could therefore create the new clone with the correct `ref: refs/heads/name` entry, but Git 2.25 does not. If it did, the unborn branch in the new clone would match the unborn branch in the bare repository, which is what you'd like. Git did not do that until Git 2.31.0.
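A quick way to tell which side of the 2.31.0 boundary a machine is on is to compare the output of `git --version` against it. The `version_ge` helper below is hypothetical, and the sketch assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# Is the installed Git at least 2.31.0, the first release that copies an
# unborn remote HEAD into a clone of an empty repository?
set -eu

# Hypothetical helper: succeed if dotted version $1 >= $2.
# Assumes `sort -V` (GNU coreutils) for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# "git version 2.25.1" -> "2.25.1"; suffixes such as ".windows.1" or
# " (Apple Git-...)" are dropped by keeping the leading numeric part.
git_version=$(git --version | sed 's/^git version \([0-9][0-9.]*[0-9]\).*/\1/')

if version_ge "$git_version" 2.31.0; then
    echo "Git $git_version: clone should propagate the unborn HEAD"
else
    echo "Git $git_version: clone falls back to the local default branch"
fi
```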

If you can't upgrade, the solution is to make the repository you're cloning contain at least one commit, so that it can have any number of branch names. Then create at least one branch name in that repository (though creating that first commit will actually create one branch name for you), and make its HEAD refer to the one desired branch name.
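A sketch of that workaround, following the suggestion in the comments to commit in a scratch non-bare repository and `git push` to the bare one. The temporary paths, the `seed` repository name, and the placeholder author identity are assumptions for illustration only.

```shell
#!/bin/sh
# Workaround for Git older than 2.31: give the bare repository one commit
# on the desired branch before cloning it.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
bare="$tmp/bare/repo"
branch=very-unlikely-to-be-your-configured-default-branch

# Placeholder identity so the commit works without user.name/user.email.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q --bare "$bare"
git -C "$bare" symbolic-ref HEAD "refs/heads/$branch"

# Make the first commit in a scratch non-bare repository ("seed")...
git init -q "$tmp/seed"
echo "# repo" > "$tmp/seed/README.md"
git -C "$tmp/seed" add README.md
git -C "$tmp/seed" commit -q -m "Initial commit"

# ...and push it to the branch the bare repository's HEAD names.
git -C "$tmp/seed" push -q "$bare" "HEAD:refs/heads/$branch"

# HEAD now resolves to a real commit, so clone checks that branch out.
git clone -q "$bare" "$tmp/working/repo"
cloned_head=$(git -C "$tmp/working/repo" symbolic-ref HEAD)
echo "clone HEAD: $cloned_head"
```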

answered Jul 09 '21 at 22:21 by torek, edited Jul 10 '21 at 01:43


Thanks for responding. You said "But Git doesn't do that" — however, Git does do this in `2.32.0`. I didn't state this in the question, because it's not really part of the question, but if I could figure out at which version it started doing it, I could inspect the source revision history to find the diff. – jsejcksn Jul 10 '21 at 01:15
@jsejcksn: Oh! That's nice that someone finally fixed it. [Here it is](https://github.com/git/git/commit/4f37d45706514a4b3d0259d26f719678a0cf3521), and the first release that has it is 2.31.0. – torek Jul 10 '21 at 01:43
Is there a way to create a commit (just an empty one will do) to a bare repository? – jsejcksn Jul 10 '21 at 20:52

Use `git write-tree`, `git commit-tree`, and `git update-ref` (in that order). Or use `git mktree` rather than `git write-tree` but that's even harder. In any case it's a minor pain in the butt; you'll probably want a little mini-script to do that. It's usually easier to start with a non-bare repository somewhere and `git push` to the initial bare clone, or `git clone --bare` to create the initial bare clone. I never start with a truly empty commit: I put in a README.md if I don't have anything else yet. – torek Jul 10 '21 at 23:53
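The plumbing route can be sketched as a small script. For an empty commit, the empty tree can be written with `git mktree` reading no input (the `git write-tree` step mentioned above is the other option). Paths, branch name, and the placeholder identity are assumptions; the empty-string old value passed to `git update-ref` makes it refuse to overwrite a branch that already exists.

```shell
#!/bin/sh
# Create a first (empty) commit directly inside a bare repository,
# using plumbing only, then clone it.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
bare="$tmp/bare/repo"
branch=very-unlikely-to-be-your-configured-default-branch

# Placeholder identity so commit-tree works without user.name/user.email.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q --bare "$bare"
git -C "$bare" symbolic-ref HEAD "refs/heads/$branch"

# 1. Write the empty tree object: mktree with no input lines.
tree=$(git -C "$bare" mktree </dev/null)

# 2. Wrap it in a commit with no parents.
commit=$(git -C "$bare" commit-tree -m "Initial empty commit" "$tree")

# 3. Create the branch HEAD already names; the empty old value makes
#    update-ref fail if the branch unexpectedly exists.
git -C "$bare" update-ref "refs/heads/$branch" "$commit" ""

git clone -q "$bare" "$tmp/working/repo"
cloned_head=$(git -C "$tmp/working/repo" symbolic-ref HEAD)
echo "clone HEAD: $cloned_head"
```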

VonC · Answer 2 · 2023-05-20T08:23:52.870

Is there a way to get the expected behavior in git version 2.25.1 using the clone sub-command (without having to checkout/switch branches after cloning)?

Even if there were a way, that would not always be enough:

`git clone` from a repository with some ref whose HEAD is unborn did not set the HEAD in the resulting repository correctly; this has been corrected with Git 2.38 (Q3 2022).

See commit daf7898 (11 Jul 2022), and commit cc8fcd1, commit 3d8314f, commit f77710c (07 Jul 2022) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit cf92cb2, 19 Jul 2022)

clone: propagate empty remote HEAD even with other branches

Signed-off-by: Jeff King

Unless "--branch" was given, clone generally tries to match the local HEAD to the remote one.
For most repositories, this is easy: the remote tells us which branch HEAD was pointing to, and we call our local checkout() function on that branch.

When cloning an empty repository, it's a little more tricky: we have special code that checks the transport's "unborn" extension, or falls back to our local idea of what the default branch should be.
In either case, we point the new HEAD to that, and set up the branch.* config.

But that leaves one case unhandled: when the remote repository isn't empty, but its HEAD is unborn.
The checkout() function is smart enough to realize we didn't fetch the remote HEAD and it bails with a warning.
But we'll have ignored any information the remote gave us via the unborn extension.
This leads to nonsense outcomes:

If the remote has its HEAD pointing to an unborn "foo" and contains another branch "bar", cloning will get branch "bar" but leave the local HEAD pointing at "master" (or whatever our local default is), which is useless.
The project does not use "master" as a branch.

Worse, if the other branch "bar" is instead called "master" (but again, the remote HEAD is not pointing to it), then we end up with a local unborn branch "master", which is not connected to the remote "master" (it shares no history, and there's no branch.* config).

Instead, we should try to use the remote's HEAD, even if it's unborn, to be consistent with the other cases.

The reason this case was missed is that cmd_clone() handles empty and non-empty repositories on two different sides of a conditional:

```
if (we have any refs) {
    fetch refs;
    check for --branch;
    otherwise, try to point our head at remote head;
    otherwise, our head is NULL;
} else {
    check for --branch;
    otherwise, try to use "unborn" extension;
    otherwise, fall back to our default name;
}
```
So the smallest change would be to repeat the "unborn" logic at the end of the first block.
But we can note some other overlaps and inconsistencies:

both sides have to handle --branch (though note that it's always an error for the empty repo case, since an empty repo by definition does not have a matching branch)

the fallback to the default name is much more explicit in the empty-repo case.
The non-empty case eventually ends up bailing from checkout() with a warning, which produces a similar result, but fails to set up the branch config we do in the empty case.

So let's pull the HEAD setup out of this conditional entirely.
This de-duplicates some of the code and the result is easy to follow, because helper functions like find_ref_by_name() do the right thing even in the empty-repo case (i.e., by returning NULL).

There are two subtleties:

for a remote with a detached HEAD, it will advertise an oid for HEAD (which we store in our "remote_head" variable), but we won't find a matching refname (so our "remote_head_points_at" is NULL).
In this case we make a local detached HEAD to match.
Right now this happens implicitly by reaching update_head() with a non-NULL remote_head (since we skip all of the unborn-fallback).
We'll now need to account for it explicitly before doing the fallback.

for an empty repo, we issue a warning to the user that they've cloned an empty repo.
The text of that warning doesn't make sense for a non-empty repo with an unborn HEAD, so we'll have to differentiate the two cases there.
We could just use different text, but instead let's allow the code to continue down to checkout(), which will issue an appropriate warning, like:

```
remote HEAD refers to nonexistent ref, unable to checkout
```

Continuing down to checkout() will make it easier to do more fixes on top (see below).

Note that this patch fixes the case where the other side reports an unborn head to us using the protocol extension.
It doesn't fix the case where the other side doesn't tell us, we locally guess "master", and the other side happens to have a "master" to which its HEAD doesn't point.
But it doesn't make anything worse there, and it should actually make it easier to fix that problem on top.
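The "nonsense outcome" described in that commit message can be reproduced with a remote that has a branch `bar` while its HEAD names an unborn `foo`. Going by the commit message, Git 2.38 or later should leave the clone's HEAD at `refs/heads/foo`, while older versions leave it at the local default; this sketch only prints what it gets. Paths and the placeholder identity are assumptions.

```shell
#!/bin/sh
# Remote with a real branch "bar" while its HEAD names an unborn "foo".
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
remote="$tmp/remote.git"

# Placeholder identity for the seed commit.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git init -q --bare "$remote"
git init -q "$tmp/seed"
git -C "$tmp/seed" commit -q --allow-empty -m "Commit on bar"
git -C "$tmp/seed" push -q "$remote" HEAD:refs/heads/bar

# Point the remote's HEAD at a branch that has no commits.
git -C "$remote" symbolic-ref HEAD refs/heads/foo

# checkout() warns "remote HEAD refers to nonexistent ref, unable to
# checkout"; the clone itself still succeeds.
git clone -q "$remote" "$tmp/clone" 2>/dev/null

cloned_head=$(git -C "$tmp/clone" symbolic-ref HEAD)
git --version
echo "clone HEAD: $cloned_head"
```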

Plus, the clone also needs the correct hash format (which could be SHA-256 instead of SHA-1), not just the correct HEAD.

With Git 2.41 (Q2 2023), `git clone` from an empty repository learned to propagate the choice of the hash algorithm from the source repository to the newly created repository.

See commit 8b214c2 (05 Apr 2023) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 96f4113, 11 Apr 2023)

clone: propagate object-format when cloning from void

A user could prepare an empty repository and set it to use SHA256 as the object format.
The new repository created by `git clone` from such a repository however would not record that it is expecting objects in the same SHA256 format.
This works as expected if the source repository is not empty.

Just like we started copying the name of the primary branch from the remote repository even if it is unborn in 3d8314f ("clone: propagate empty remote HEAD even with other branches", 2022-07-07, Git v2.38.0-rc0 -- merge listed in batch #5), lift the code that records the object format out of the block executed only when cloning from an instantiated repository, so that it works also when cloning from an empty repository.
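To check this propagation on a given Git, a sketch like the following creates an empty SHA-256 bare repository, clones it, and compares the two formats with `git rev-parse --show-object-format`. It assumes Git 2.29 or later (for `--object-format` and `--show-object-format`); per the commit above, only 2.41 or later is expected to report `sha256` for the clone.

```shell
#!/bin/sh
# Does a clone of an empty SHA-256 repository record SHA-256 too?
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

git init -q --bare --object-format=sha256 "$tmp/empty.git"
# Cloning an empty repository prints a warning on stderr; hide it.
git clone -q "$tmp/empty.git" "$tmp/clone" 2>/dev/null

src_format=$(git -C "$tmp/empty.git" rev-parse --show-object-format)
clone_format=$(git -C "$tmp/clone" rev-parse --show-object-format)

git --version
echo "source: $src_format, clone: $clone_format"
```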

With Git 2.41 (Q2 2023), the server side of `git clone` now advertises the necessary hints to clients to help them to clone from an empty repository and learn object hash algorithm and the (unborn) branch pointed at by HEAD, even over the older v0/v1 protocol.

See commit 933e3a4 (17 May 2023) by brian m. carlson (bk2204).
(Merged by Junio C Hamano -- gitster -- in commit 633390b, 19 May 2023)

upload-pack: advertise capabilities when cloning empty repos

Signed-off-by: brian m. carlson

When cloning an empty repository, protocol versions 0 and 1 currently offer nothing but the header and flush packets for the /info/refs endpoint.
This means that no capabilities are provided, so the client side doesn't know what capabilities are present.

However, this does pose a problem when working with SHA-256 repositories, since we use the capabilities to know the remote side's object format (hash algorithm).
As of 8b214c2 ("clone: propagate object-format when cloning from void", 2023-04-05, Git v2.41.0-rc0 -- merge listed in batch #9), this has been fixed for protocol v2, since there we always read the hash algorithm from the remote.

Fortunately, the push version of the protocol already indicates a clue for how to solve this.
When the /info/refs endpoint is accessed for a push and the remote is empty, we include a dummy "capabilities^{}" ref pointing to the all-zeros object ID.
The protocol documentation already indicates this should always be sent, even for fetches and clones, so let's just do that, which means we'll properly announce the hash algorithm as part of the capabilities.
This just works with the existing code because we share the same ref code for fetches and clones, and libgit2, JGit, and dulwich do as well.
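One way to look at what upload-pack advertises for an empty repository over protocol v0 is to trace the pkt-lines with `GIT_TRACE_PACKET`. This sketch assumes that a local-path `git ls-remote`, forced to v0 with `-c protocol.version=0`, goes through the same upload-pack advertisement; with Git 2.41 or later on the serving side, the trace should then contain the dummy `capabilities^{}` line.

```shell
#!/bin/sh
# Trace what upload-pack advertises for an empty repository over protocol v0.
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
git init -q --bare "$tmp/empty.git"

# GIT_TRACE_PACKET=1 writes the traced pkt-lines to stderr; capture them.
trace=$(GIT_TRACE_PACKET=1 git -c protocol.version=0 ls-remote "$tmp/empty.git" 2>&1)

# -F: match the literal text, including ^ and braces.
if printf '%s\n' "$trace" | grep -qF 'capabilities^{}'; then
    echo "capabilities advertised for the empty repository"
else
    echo "no capabilities line (serving upload-pack predates 2.41?)"
fi
```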

Thank you for sharing these commit updates here. Just curious: what led you to this question after so long? — jsejcksn, Aug 07 '22 at 12:39
@jsejcksn Google mostly. I have been updating Stack Overflow questions like yours with new Git features for a decade now. — VonC, Aug 07 '22 at 13:52
