As explained at "How do I clone a subdirectory only of a Git repository?", the best way I've found so far to download only the files in a single subdirectory of a Git repository is:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small
cd test-git-partial-clone-big-small
git sparse-checkout set small

This is meant to download only the small/ directory.

However, as soon as I run:

git clone --depth 1 --filter=blob:none --sparse \
  https://github.com/cirosantilli/test-git-partial-clone-big-small

any files (but not directories) in the root directory are downloaded and appear in the working tree. In that test repo, I get the unwanted file:

generate.sh
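The behaviour can be reproduced offline with a throwaway repository, which makes it easier to experiment. This is only a sketch: the directory layout, the file names other than generate.sh, the `repro` variable and the demo identity are assumptions made for illustration, and `uploadpack.allowFilter` is enabled so that the local file:// "server" accepts `--filter`.

```shell
set -eu
# Hypothetical throwaway repository standing in for the test repo.
repro=$(mktemp -d)
git init -q "$repro/src"
git -C "$repro/src" config user.email demo@example.com
git -C "$repro/src" config user.name demo
# Let the local "server" honour --filter requests.
git -C "$repro/src" config uploadpack.allowFilter true
mkdir "$repro/src/small" "$repro/src/big"
echo small > "$repro/src/small/a.txt"
echo big > "$repro/src/big/b.txt"
echo root > "$repro/src/generate.sh"
git -C "$repro/src" add .
git -C "$repro/src" commit -q -m init
# Same flags as above, pointed at the local repository via file://.
git clone -q --depth 1 --filter=blob:none --sparse "file://$repro/src" "$repro/dst"
# List what ended up in the working tree right after the clone.
ls -A "$repro/dst"
```

Right after the clone, the listing should contain generate.sh but neither small nor big.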

How can I prevent that from happening, so that I obtain only the subdirectories I'm interested in, without any root directory files?

I've checked other repositories, e.g. https://github.com/torvalds/linux , and having a large number of small files at the top level does not slow down the download significantly, even though they are downloaded one by one. So this would only be a problem if there are large files at the top level.

Tested on Git 2.37.2, Ubuntu 22.10, February 2023.

Ciro Santilli OurBigBook.com
  • `git clone --sparse` [does exactly that](https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---sparse): *Employ a sparse-checkout, with only **files in the toplevel directory** initially being present.* (Emphasis mine, *phd*.) That is, you cannot do what you want with `git clone`. You can set up sparse checkout and then use `git fetch`, but fetch doesn't allow filters AFAIK. So you have to choose between one way or the other. – phd Feb 01 '23 at 15:14

1 Answer

Do your clone with --no-checkout (aka -n), then set up your sparsity rules exactly as you want. To get really minimal clone traffic, use tree:0 instead of blob:none. Smoke test:

git clone -n --depth=1 --filter=tree:0 \
        https://github.com/cirosantilli/test-git-partial-clone-big-small
cd !$:t:r
git sparse-checkout set --no-cone '*/'
git checkout
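To try the same sequence without network access, here is a minimal sketch against a throwaway local repository. The layout, the `fix` variable, the demo identity and the anchored pattern `/small/` (gitignore-style, matching only the top-level small directory) are assumptions chosen for illustration, and `uploadpack.allowFilter` is enabled so that the file:// remote accepts the filter.

```shell
set -eu
# Hypothetical throwaway repository with two directories and one root file.
fix=$(mktemp -d)
git init -q "$fix/src"
git -C "$fix/src" config user.email demo@example.com
git -C "$fix/src" config user.name demo
# Let the local "server" honour --filter requests.
git -C "$fix/src" config uploadpack.allowFilter true
mkdir "$fix/src/small" "$fix/src/big"
echo small > "$fix/src/small/a.txt"
echo big > "$fix/src/big/b.txt"
echo root > "$fix/src/generate.sh"
git -C "$fix/src" add .
git -C "$fix/src" commit -q -m init
# Clone without checking anything out, so no root files are written yet.
git clone -q -n --depth=1 --filter=tree:0 "file://$fix/src" "$fix/dst"
cd "$fix/dst"
# Non-cone pattern: only the top-level small/ directory (assumed pattern).
git sparse-checkout set --no-cone '/small/'
# Populate the working tree according to the sparsity rules.
git checkout -q
ls -A
```

With this layout, the working tree should end up with small/ but neither generate.sh nor big/. Note that this only demonstrates what gets checked out, not how many objects are transferred.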
jthill
  • OK, doing `git sparse-checkout set --no-cone small` achieves my use case of downloading only the `small` directory, thanks. I wonder if there's a way without `--no-cone`, which `man git` says should be avoided. – Ciro Santilli OurBigBook.com Apr 08 '23 at 19:28
  • It's the definition of cone mode that you can't. Get a directory, get all its toplevel contents. That simplification makes working with really large checkouts *much* more efficient, but you have to give up some selectivity. I think the doc using "deprecate" overstates the case, though there are downsides that might bite you. See if any of the listed downsides are painful in your use; if not, you can painlessly use `--no-cone`. – jthill Apr 08 '23 at 19:39
  • I've also added a big tree to the test now: https://github.com/cirosantilli/test-git-partial-clone-big-small/blob/286ccf7a5086d36b4ada8c86847f1c6d8a72335c/generate.sh#L52 and this approach appears to fetch them unfortunately. E.g. they show on `git ls-files`, and `ncdu` says `.git` is several megs. – Ciro Santilli OurBigBook.com Apr 21 '23 at 07:42
  • Seems you're right. To get that level of selectivity in the fetch you're going to have to [go full manual transmission on it](https://stackoverflow.com/questions/69258526/is-there-a-file-or-config-or-settings-for-ignoring-files-folders-in-the-remote-r/69263759#69263759), or switch back to a `blob:none` filter and read all the trees, since GitHub (like all others afaik) doesn't support combining filters. – jthill Apr 21 '23 at 16:12