
I am used to sparse checkout; it is efficient. But if I have a large folder that I want to delete, Git still checks out its whole content locally, only for me to delete it immediately.

Is there a way to tell the remote to delete a certain path?

Thanks

ipavlu

2 Answers


You can't do what you've asked for, but you can do what you need.

The trick here is to understand that:

  • Every commit contains every file. This includes sparse commits: even though you've only extracted files f0001 and f1074 out of the 9999 numbered f files, the new commits you make still contain f0001 through f9999 inclusive.

  • Git does not build a new commit from what's in your working tree. Instead, Git builds the new commit from what's in Git's index.

When you use sparse checkout, Git still fills in the entire index (this is relatively inexpensive¹). It then merely skips extracting most of those index copies² of the files from the current commit into your working tree. (Git also internally sets the skip-worktree flag bit on each index entry whose file was omitted because of sparseness.)

Since you, as a programmer, don't usually look at what's in the index³ and look at your working tree instead, it seems as if Git didn't copy the excluded files. But Git did copy (or "copy", as noted in footnote 2) those files, and your next commit contains those files.
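To see this for yourself, here is a minimal throwaway sketch. The directory names keep/ and big/ and the variable DEMO are hypothetical, and it assumes a Git recent enough for cone-mode sparse checkout. It sparse-checks-out only keep/ and then compares the working tree with the index; git ls-files -t tags skip-worktree entries with S.

```shell
set -e

# Hypothetical throwaway repository with two directories of three files each.
DEMO=$(mktemp -d)
cd "$DEMO"
git init -q
git config user.name "Demo"               # local identity, only for this demo
git config user.email "demo@example.com"
mkdir keep big
for i in 1 2 3; do echo "k$i" > "keep/f$i"; echo "b$i" > "big/f$i"; done
git add . && git commit -q -m "initial"

# Cone-mode sparse checkout that extracts only keep/.
git sparse-checkout init --cone
git sparse-checkout set keep

# The working tree no longer has big/ ...
ls
# ... but the index still lists all six files; "S" marks skip-worktree entries.
git ls-files -t
```

Expect ls to show only keep, while git ls-files -t still lists all six files, with the three big/ entries tagged S.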

When you make a new commit and run git push, your Git delivers the entire commit to the other Git repository. That includes all the files, but almost all of them are de-duplicated and therefore take no space (and no time to send along with the commit).

So:

I have a large folder ... which I want to delete

Simply remove all the files within that folder from Git's index (using git rm -r --cached path/to/folder). Remember, the index contains only files: there's no actual folder involved here. Git does, however, understand the idea of folder-ized files as demanded by your OS, and knows how to handle this recursive git rm --cached. You can now make a new commit, and the new commit will omit the files, as they're no longer in Git's index.

Since every commit contains every file, the fact that the new commit lacks these files means that, compared to the previous commit, the files were "deleted". There's no need to extract them from the index first just so that you can remove them from both your working tree and Git's index: it suffices to remove them from Git's index without extracting them (and then commit, as usual).
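Here is a minimal sketch of that recipe in a throwaway repository, again with hypothetical keep/ and big/ directories and a hypothetical REPO variable. One assumption to flag: as far as I know, Git 2.34 and later refuse to update index entries outside the sparse-checkout definition unless you pass --sparse (the option the other answer uses), while older Git has no such option, so the sketch tries both forms.

```shell
set -e

# Hypothetical layout: keep/ is wanted, big/ is the folder to delete.
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir keep big
for i in 1 2 3; do echo "k$i" > "keep/f$i"; echo "b$i" > "big/f$i"; done
git add . && git commit -q -m "initial"

# Sparse checkout that never extracts big/ into the working tree.
git sparse-checkout init --cone
git sparse-checkout set keep

# Remove big/ from the index only; nothing is extracted first.
# Newer Git needs --sparse for entries outside the sparse cone;
# older Git does not know --sparse, so fall back to the plain form.
git rm -r -q --cached --sparse big 2>/dev/null || git rm -r -q --cached big

git commit -q -m "Remove big/"
# A normal "git push" from here would publish the deletion.

git ls-tree -r --name-only HEAD    # only the keep/ files remain
```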


¹ "Relatively" compared to creating files in the working tree: an index entry costs an average of (very roughly) 100 bytes, so in a really big repository with millions of files, this can still be a bit expensive. There's work underway to make it even cheaper in such cases.

² These "copies" are pre-de-duplicated, just like all files in any commit. Since they're literally from the current commit, they are all necessarily duplicates, and therefore take no space for their data, just the "about 100 bytes" in footnote 1.

³ You can, if you wish: just run git ls-files --stage and/or git ls-files --debug to get fairly thorough dumps of what's actually in Git's index.

torek

But if I have a large folder, which I want to delete, it still checks out the whole content to local which I will immediately delete

That should work better with Git 2.38 (Q3 2022): "git rm" has become more aware of the sparse-index feature.

That means you need to run git sparse-checkout set --sparse-index in your sparse-checkout repository to activate the sparse index (available since Git 2.32).

See commit ede241c, commit bcf96cf, commit b29ad38, commit ba80825 (07 Aug 2022) by Shaoxuan Yuan (ffyuanda).
See commit 3f61790 (08 Aug 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 9b9445c, 18 Aug 2022)

rm: integrate with sparse-index

Helped-by: Victoria Dye
Helped-by: Derrick Stolee
Signed-off-by: Shaoxuan Yuan

Enable the sparse index within the git-rm command.

The p2000 tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index.

    Test                                  HEAD~1            HEAD
    ------------------------------------------------------------------------------
    2000.74: git rm ... (full-v3)         0.41(0.37+0.05)   0.43(0.36+0.07) +4.9%
    2000.75: git rm ... (full-v4)         0.38(0.34+0.05)   0.39(0.35+0.05) +2.6%
    2000.76: git rm ... (sparse-v3)       0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
    2000.77: git rm ... (sparse-v4)       0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%
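Reading the benchmark: the last column appears to be the relative change of the first (wall-clock) time between HEAD~1 and HEAD, i.e. (HEAD - HEAD~1) / HEAD~1 * 100, and the numbers in parentheses look like user+system CPU time. Under that assumption, a small hypothetical helper, pct_change, reproduces the column:

```shell
# Hypothetical helper: relative change from an old to a new timing,
# formatted like the perf table's last column (sign, one decimal, %).
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

pct_change 0.41 0.43   # full-v3:   prints +4.9%
pct_change 0.57 0.05   # sparse-v3: prints -91.2%
pct_change 0.57 0.03   # sparse-v4: prints -94.7%
```

The two sparse-index rows average out to roughly the "~92%" reduction quoted in the commit message.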

Also, normalize a behavioral difference of git-rm under sparse-index.
See related discussion.

git-rm a sparse-directory entry within a sparse-index enabled repo behaves differently from a sparse directory within a sparse-checkout enabled repo.

For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, git rm -r --sparse folder1 provides this:

rm 'folder1/'

Whereas in a sparse-checkout repo without sparse-index, doing so provides this:

rm 'folder1/0/0/0'
rm 'folder1/0/1'
rm 'folder1/a'

Because git rm a sparse-directory entry does not need to expand the index, therefore we should accept the current behavior, which is faster than "expand the sparse-directory entry to match the sparse-checkout situation".

VonC