There is a git repo that I wanted to clone, but it has many large files in it that I am not interested in. I looked at "sparse checkout" to get only one folder, but that still downloads many gigabytes so it's pointless.

I tried cloning using the --filter option; my first attempt again downloaded the whole thing, but adding --no-checkout downloaded much less:

git clone --depth=1 --no-checkout --filter=blob:limit=2m <repository-url>

But there were no files in the repo (besides the .git folder which was now a more manageable 1GB).

So how do I get the files, without the large ones? Doing git checkout master gets the full repo again. Adding the filter to that command gives me an "unknown option" error. Same for git pull with the filter.

DisgruntledGoat

3 Answers


But there were no files in the repo

Yes, there are. There are no files visible in the working tree, but that doesn't matter; the working tree is not the repo.

(besides the .git folder which was now a more manageable 1GB).

Yup. And that's the repo. And that's where the files are.
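
You can convince yourself of that without checking anything out. A blob:limit filter only omits blob contents, not commits or trees, so listing the tree of the cloned commit should work locally; a rough sketch, run inside the clone:

# list every path recorded in the cloned commit (reads only commits and
# trees, which a blob:limit filter does not exclude)
git ls-tree -r --name-only HEAD

# show how much data is actually sitting in the object store
git count-objects -vH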

If you want to see the files, you need to check them out of the repo.

Doing git checkout master gets the full repo again.

Not the full repo. It gets just the filtered master branch. So it will not be "many gigabytes". But yes, it will take up more space because the files in the repo are being copied into the work tree.

If you still want to make the repo smaller, you could now run filter-branch or filter-repo.
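
As an illustration of the filter-repo route, something along these lines would drop the oversized blobs; treat it as a sketch, not a drop-in recipe: git filter-repo is a separate tool you have to install, it rewrites history, and it only helps once the large blobs are actually present locally (as in the scenario above, where the checkout has pulled them back in):

# rewrite history, dropping every blob larger than 2 MB
# (filter-repo refuses to run on a clone it does not consider "fresh"
# unless you also pass --force)
git filter-repo --strip-blobs-bigger-than 2M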

matt
  • It's possible that the filtered-away objects are *in* the tip commit of the master branch. If this is the case, `git checkout master` will fulfill the promises by obtaining the large objects. This is where adding sparse checkout can be helpful: you can avoid checking out specific large files. – torek Dec 01 '20 at 17:15
  • "It gets just the filtered master branch" - no that's not correct. It gave me every file in the repo including all the ones over 2MB. – DisgruntledGoat Dec 01 '20 at 17:47

To add a bit of background to matt's answer, I'll note that this:

git clone --depth=1 --no-checkout --filter=blob:limit=2m <repository-url>

uses a new feature that's still in development to produce what Git calls a partial clone with a promisor remote and promisor pack. There are many details here. The feature is still experimental, so some details might change somewhat in the future.

If you are using this feature, read and understand the entire technical documentation I linked, which describes existing limitations of partial clones and some of the plans to fix them in the future.
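
For orientation, the partial-clone state ends up in the repository configuration; on reasonably recent Git versions the relevant keys look roughly like this (a sketch, assuming the remote is called origin):

# the remote is flagged as a promisor: it has promised to serve the
# omitted objects later, on demand
git config remote.origin.promisor            # -> true

# the filter spec recorded by the partial clone
git config remote.origin.partialclonefilter  # -> blob:limit=2m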

torek
  • Thanks, I've read through that page. I got the gist but I'm not sure I fully understand it... and it doesn't seem to answer my question at all. git checkout still downloads and checks out every single file over 2MB. – DisgruntledGoat Dec 02 '20 at 13:45
  • If you've gotten a promisor pack, `git checkout` is *meant* to only download those objects that will actually be accessed, so if you set up your sparse checkout ahead of time and then execute the sparse checkout, it's only *supposed* to download those objects that are required. This feature is not yet ready for general use though, and it's possible that this "only download what will be checked out" code has bugs in it. You're working right on the bleeding-edge here, so you might consider cloning the Git source and debugging it. :-) – torek Dec 02 '20 at 19:27
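
To make the workflow torek describes in that last comment concrete, the sequence would look roughly like this (a sketch: git sparse-checkout needs Git 2.25 or later, and <repository-url>, big-repo and some/dir are placeholders for your remote, clone directory and the one folder you actually want):

git clone --depth=1 --no-checkout --filter=blob:limit=2m <repository-url> big-repo
cd big-repo

# restrict the working tree to one folder *before* anything is checked out
git sparse-checkout init --cone
git sparse-checkout set some/dir

# checkout now only needs (and therefore only fetches) the blobs for
# paths inside some/dir
git checkout master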

To complement git clone --filter (which I detail here), note that with Git 2.36 (Q2 2022), "git fetch --refetch"(man) learned to fetch everything without telling the other side what we already have, which is useful when you cannot trust what you have in the local object store.

See commit 4963d3e, commit 7390f05, commit 011b775, commit 3c7bab0, commit 869a0eb, commit 4dfd092, commit 1836836 (28 Mar 2022) by Robert Coup (rcoup).
(Merged by Junio C Hamano -- gitster -- in commit 0f5e885, 04 Apr 2022)

fetch: add --refetch option

Signed-off-by: Robert Coup

Teach fetch and transports the --refetch option to force a full fetch without negotiating common commits with the remote.

Use when applying a new partial clone filter to refetch all matching objects.

fetch-options now includes in its man page:


--refetch

Instead of negotiating with the server to avoid transferring commits and associated objects that are already present locally, this option fetches all objects as a fresh clone would.

Use this to reapply a partial clone filter from configuration or using --filter= when the filter definition has changed.
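
As a rough usage sketch (the remote name, the filter size and the idea of tightening an existing filter are my own assumptions, not part of this change):

# record a new, stricter filter for the remote...
git config remote.origin.partialclonefilter blob:limit=1m

# ...then refetch: no negotiation, so everything matching the new filter
# is transferred again, as a fresh clone would
git fetch --refetch origin

# or, equivalently, pass the filter directly:
# git fetch --refetch --filter=blob:limit=1m origin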

VonC