I want a bare, shallow clone of a git repo with no file contents, as all I'm interested in are the file paths themselves. This works great:

$ git clone --bare --depth=1 --filter=blob:none --branch="118.0.5977.1" "https://github.com/chromium/chromium.git"
Cloning into bare repository 'chromium.git'...
remote: Enumerating objects: 34624, done.
remote: Counting objects: 100% (34624/34624), done.
remote: Compressing objects: 100% (25673/25673), done.
remote: Total 34624 (delta 1647), reused 21869 (delta 1304), pack-reused 0
Receiving objects: 100% (34624/34624), 13.72 MiB | 16.20 MiB/s, done.
Resolving deltas: 100% (1647/1647), done.

It completes in about 3 seconds, and takes up only 15 MiB on disk. I can get the paths with git ls-tree -r HEAD.
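As a minimal sketch, the path listing can be wrapped in a small helper. The function name `list_paths` and its default arguments are hypothetical, chosen only for illustration; `--name-only` makes `git ls-tree` print just the paths, without modes and object IDs.

```shell
# Hypothetical helper: print every file path recorded in a revision.
# $1 = repository directory (default: current directory)
# $2 = revision to inspect (default: HEAD)
list_paths() {
    repo="${1:-.}"
    rev="${2:-HEAD}"
    # -r recurses into subtrees; --name-only drops mode, type and object ID.
    git -C "$repo" ls-tree -r --name-only "$rev"
}
```

For the clone above this would be called as `list_paths chromium.git`.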

However, various git commands seem to want to fetch additional data from the remote repo. For example:

$ cd chromium.git
$ git log
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (1/1), 372 bytes | 372.00 KiB/s, done.
commit 58a2c380702a84b362d0ee74ffc1e53e937770dd (grafted, HEAD, tag: 118.0.5977.1)
...

Can I tell git not to do this? I would prefer the command to fail rather than fetch any additional data from the remote.

Tavian Barnes
  • I do not have a lot of experience with bare/shallow repos _but_ it sounds like a contradiction that you do a shallow clone and then you try to run a `git log`. Also... if it is a bare repo, I would think that providing the branch you want to log makes sense. – eftshift0 Aug 29 '23 at 19:10
  • Bare repos still have a HEAD, but anyway `git log 118.0.5977.1` does the same thing. Also `git log` was just an example, it seems many commands (`git cat-file`, `git rev-parse`, ...) end up talking to the remote. – Tavian Barnes Aug 29 '23 at 19:21
  • If I do `git remote remove origin` then `git log` says `error: unable to read mailmap object at HEAD:.mailmap`, which at least explains what it's looking for on the remote – Tavian Barnes Aug 29 '23 at 19:24
  • It makes sense.... you did _a shallow_ and pulled no blobs.... you are asking to run a full race but you only got gas _vapors_ in the tank. – eftshift0 Aug 29 '23 at 19:27
  • @eftshift0 Sure, but I would be very surprised if my car drove itself to the gas station... in this case, I'd rather it just stop once it runs out of gas – Tavian Barnes Aug 29 '23 at 20:18
  • Is this what you want? https://stackoverflow.com/questions/9224754/how-to-remove-origin-from-git-repository – Joshua Aug 29 '23 at 20:31
  • @Joshua Not exactly, I'd like to keep the remote around so that an explicit `git fetch` works if I need it. Anyway I found an explanation here: https://www.git-scm.com/docs/partial-clone/. `git config remote.origin.promisor false` does what I want. – Tavian Barnes Aug 29 '23 at 20:52
  • Feel free to answer your own question if you've solved it. – Joshua Aug 29 '23 at 21:00

2 Answers

I found the relevant documentation at https://www.git-scm.com/docs/partial-clone/. In particular,

  • Since almost all Git code currently expects any referenced object to be present locally and because we do not want to force every command to do a dry-run first, a fallback mechanism is added to allow Git to attempt to dynamically fetch missing objects from promisor remotes.

    When the normal object lookup fails to find an object, Git invokes promisor_remote_get_direct() to try to get the object from a promisor remote and then retry the object lookup. This allows objects to be "faulted in" without complicated prediction algorithms.

    For efficiency reasons, no check as to whether the missing object is actually a promisor object is performed.

    Dynamic object fetching tends to be slow as objects are fetched one at a time.

...

Remotes that are considered "promisor" remotes are those specified by the following configuration variables:

  • extensions.partialClone = <name>
  • remote.<name>.promisor = true
  • remote.<name>.partialCloneFilter = ...
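To see which of these variables a given clone actually has set, something like the following sketch could be used. The helper name `show_promisor_config` is hypothetical; the pattern is lowercase because `git config --get-regexp` matches against the lowercased section and variable names.

```shell
# Hypothetical helper: list promisor-related settings visible to a repository.
# $1 = repository directory (default: current directory)
show_promisor_config() {
    repo="${1:-.}"
    # --get-regexp exits non-zero when nothing matches; treat that as
    # "no promisor settings" rather than as an error.
    git -C "$repo" config --get-regexp \
        '^(extensions\.partialclone|remote\..*\.(promisor|partialclonefilter))$' || true
}
```

Running `show_promisor_config chromium.git` prints one `key value` line for each of these variables that is set, and nothing if none are.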

So if there is a "promisor" remote, git will automatically fetch missing objects from it. To stop origin from being a promisor, all I have to do is:

$ git config --unset remote.origin.promisor
$ git config --unset remote.origin.partialclonefilter

and it now gives errors like I want it to:

$ git log
error: unable to read mailmap object at HEAD:.mailmap
commit 58a2c380702a84b362d0ee74ffc1e53e937770dd (grafted, HEAD, tag: 118.0.5977.1)
...
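The two `git config --unset` calls can be wrapped so they are safe to rerun, since `--unset` exits non-zero when the key is already gone. This is only a sketch: the function names `disable_promisor` and `is_promisor` and their default arguments are hypothetical, and the remote's URL is left untouched.

```shell
# Hypothetical helper: stop Git from treating a remote as a promisor.
# $1 = repository directory (default: current directory)
# $2 = remote name (default: origin)
disable_promisor() {
    repo="${1:-.}"
    remote="${2:-origin}"
    for key in promisor partialclonefilter; do
        # Ignore the non-zero exit status for keys that are already unset.
        git -C "$repo" config --unset "remote.$remote.$key" || true
    done
}

# Hypothetical helper: succeed only if the remote is still marked as a promisor.
is_promisor() {
    repo="${1:-.}"
    remote="${2:-origin}"
    [ "$(git -C "$repo" config --bool --get "remote.$remote.promisor")" = "true" ]
}
```

For the clone in the question this would be `disable_promisor chromium.git`, after which `is_promisor chromium.git` should return a non-zero status.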
Tavian Barnes

You've done a clone which is not only shallow (--depth=1), but also partial (--filter=blob:none). According to the documentation, “[u]se of partial clone requires that the user be online and the origin remote or other promisor remotes be available for on-demand fetching of missing objects.”

If you don't want to depend on being online, keep the shallow option but re-clone without any --filter option. That downloads just a single revision, including its file contents, so nothing needs to be fetched on demand later.

bk2204
  • This is explicitly not what I want. I do want a partial clone. I don't ever want to "download objects on demand" -- I want git to fail if that would be necessary. – Tavian Barnes Aug 29 '23 at 21:46
  • A partial clone necessitates being online constantly to fetch objects on demand; that's built-in functionality. If you want a partial clone that doesn't automatically try to fetch objects on demand, you want a feature Git doesn't provide. – bk2204 Aug 29 '23 at 22:53
  • It seems like I can get git to work that way after all, see my own answer – Tavian Barnes Aug 29 '23 at 23:10