
I'm working with a repository with a very large number of files that takes hours to check out. I'm looking into whether Git would work well with this kind of repository now that it supports sparse checkouts, but every example that I can find does the following:

git clone <path>
git config core.sparsecheckout true
echo <dir> > .git/info/sparse-checkout
git read-tree -m -u HEAD

The problem with this sequence of commands is the original clone also does a checkout. If you add -n to the original clone command, then the read-tree command results in the following error:

error: Sparse checkout leaves no entry on working directory

How can I do the sparse checkout without checking out all the files first?

Carson
dromodel
  • possible duplicate of [Is there any way to clone a git repository's sub-directory only?](http://stackoverflow.com/questions/600079/is-there-any-way-to-clone-a-git-repositorys-sub-directory-only) – Chronial Feb 05 '13 at 10:18
  • Note: `git worktree add --no-checkout` will work too (not just `git clone --no-checkout`) with git 2.9 (June 2016). See [my answer below](http://stackoverflow.com/a/36615363/6309) – VonC Apr 14 '16 at 06:31
  • After trying all the solutions here, the only one which just downloads the directory (no pushing afterwards!) is [this](https://stackoverflow.com/a/39317180/2071807). – LondonRob Jun 19 '18 at 09:13
  • I've condensed all related questions and all related answers (I was able to find) here: https://stackoverflow.com/questions/60190759/how-do-i-clone-fetch-or-sparse-checkout-a-single-directory-or-a-list-of-directo – Richard Gomes May 26 '21 at 05:37
  • Modern, *concise* answer is [Fawaz's below.](https://stackoverflow.com/a/63786181/450917) – Gringo Suave Oct 04 '21 at 20:16

16 Answers


Please note that this answer does download a complete copy of the data from a repository. The git remote add -f command will clone the whole repository. From the man page of git-remote:

With -f option, git fetch <name> is run immediately after the remote information is set up.


Try this:

mkdir myrepo
cd myrepo
git init
git config core.sparseCheckout true
git remote add -f origin git://...
echo "path/within_repo/to/desired_subdir/*" > .git/info/sparse-checkout
git checkout [branchname] # ex: master

Now you will find that you have a "pruned" checkout with only files from path/within_repo/to/desired_subdir present (and in that path).

Note that on the Windows command line (cmd.exe) you must not quote the path, i.e. you must replace the 6th command with this one:

echo path/within_repo/to/desired_subdir/* > .git/info/sparse-checkout

If you do quote it, the quotes end up in the sparse-checkout file and it will not work.
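The steps above can be tried end to end without a network by pointing them at a throwaway local repository. This is a minimal sketch, assuming Git is installed and a POSIX shell is used; the temporary repository, the `other/` directory and the file names are hypothetical stand-ins for the real `git://...` URL. It writes the pattern with `printf`, so no quotes end up in the file whichever POSIX shell runs it.

```shell
#!/bin/sh
# Sketch: legacy (non-cone) sparse checkout against a throwaway local repo.
# Assumption: the local repo "src" stands in for git://...; names are hypothetical.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"

# Build a tiny source repository: one wanted directory, one unwanted one.
git init -q "$SRC"
git -C "$SRC" symbolic-ref HEAD refs/heads/master
mkdir -p "$SRC/path/within_repo/to/desired_subdir" "$SRC/other"
echo wanted > "$SRC/path/within_repo/to/desired_subdir/a.txt"
echo unwanted > "$SRC/other/b.txt"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# The answer's steps, pointed at the local repository.
mkdir "$WORK/myrepo"
cd "$WORK/myrepo"
git init -q
git config core.sparseCheckout true
git remote add -f origin "$SRC"      # -f fetches everything right away
mkdir -p .git/info                   # normally created by git init's templates
# Forward slashes and no quotes inside the file.
printf '%s\n' 'path/within_repo/to/desired_subdir/*' > .git/info/sparse-checkout
git checkout master
```

Only `path/within_repo/to/desired_subdir/a.txt` should appear in `myrepo`; `other/` stays out of the working tree even though its objects were fetched.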

Nick Bull
apenwarr
  • Note: forward slashes are required (it's Git-repo-centric), even on Windows, so path/to/subdir/* NOT path\to\subdir\* – Monsters X Sep 25 '12 at 20:10
  • In my first experience that is not working, I used `echo path/to/subdir/* >> .git/info/sparse-checkout`, note I haven't used single quote to quote the path and there is an extra space after `*`, that's why it is not working, manually removing the space in sparse-checkout fixed the issue. BTW I am on Windows. – wangzq Jan 08 '13 at 00:06
  • I can't use the command "git checkout [branchname]" (also found error: Sparse checkout leaves no entry on working directory). I've used "git pull origin master" and it works properly. – Natty Sep 03 '13 at 07:15
  • With git version 1.7.2.5 on linux, I got the following results: echo 'dir/*' checks out **only** the files in dir/ but not in its subdirs; echo 'dir/' (no asterisk!) correctly checks out the whole tree under dir/. HTH – pavek Oct 10 '13 at 15:01
  • This just plain didn't work for me - the "git remote" command resulted in the entire repo being checked out - bam! - right then; so the "git config..." and specification of a sub-dir of interest in the following commands had no effect. Is the repo URL specified in the "git remote" command just the path to the top-level .git file? Or should it be a path to the sub-dir of interest? – Rob Cranfill Oct 24 '13 at 14:44
  • @RobCranfill the repo URL specific in the git remote should be the path to the top-level .git file. Not sure why the `git remote` is pulling the whole repo in to start. Maybe something in your global git config? The directions by @Appenwarr worked with git version 1.8.3.4 (Apple Git-47) and version 1.7.10.4 (Debian Wheezy). – Chris Laskey Nov 10 '13 at 23:22
  • If you already have the whole repo checked out elsewhere this can be combined with [Multiple working directories with Git](http://stackoverflow.com/questions/6270193/multiple-working-directories-with-git), although you need to fix the `.git/info` folder to not be a link and just link `refs` and `exclude` so the `sparse-checkout` is not in the original repo. – Sam Hasler Aug 05 '14 at 16:29
  • perhaps the answer was edited since these comments, but this worked perfectly for me.. i can now pull/push into a standalone folder repo and see the changes in main repo and vice versa. thanks! – Sonic Soul May 06 '15 at 18:44
  • here's a streamlined version (no need for manually creating the directory, doing an init and remote add, just do the normal git clone+checkout cycle with --no-checkout option as mentioned by @onionjake): git clone --no-checkout cd echo > .git/info/sparse-checkout git checkout – Gregor Aug 10 '15 at 12:58
  • The `git remote add` command downloads everything because that's what `-f` does -- tells it to immediately fetch, before you've defined the sparse checkout options. But omitting or reordering that isn't going to help. Sparse checkouts affect only the working tree, not the repository. If you want your repository to go on a diet instead, then you need to look at the `--depth` or `--single-branch` options instead. – Miral Dec 11 '15 at 05:11

In 2020 there is a simpler way to deal with sparse-checkout without having to worry about .git files. Here is how I did it:

git clone <URL> --no-checkout <directory>
cd <directory>
git sparse-checkout init --cone # start with only the root files
git sparse-checkout set apps/my_app libs/my_lib # etc, to list sub-folders to checkout
git checkout # or git switch

Note that this requires Git 2.25 or later. Read more about it here: https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/

UPDATE:

The above git clone command will still clone the repo with its full history, though without checking the files out. If you don't need the full history, you can add --depth parameter to the command, like this:

# create a shallow clone,
# with only 1 (since depth equals 1) latest commit in history
git clone <URL> --no-checkout <directory> --depth 1
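Commenters below suggest combining this with a partial clone filter. A minimal sketch of that combination, assuming Git 2.27 or later (where the final `git checkout` is what populates the working tree) and using a local repository served over `file://` as a stand-in for `<URL>`. The two `uploadpack.*` settings exist only so that the local "server" honours the filter, and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch: blob-less, shallow, cone-mode sparse clone (assumes Git >= 2.27).
# A local repository served over file:// stands in for <URL>.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"

# Hypothetical source layout: two wanted directories, one unwanted one.
git init -q "$SRC"
mkdir -p "$SRC/apps/my_app" "$SRC/libs/my_lib" "$SRC/docs"
echo app > "$SRC/apps/my_app/main.txt"
echo lib > "$SRC/libs/my_lib/lib.txt"
echo big > "$SRC/docs/manual.txt"
echo root > "$SRC/README"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m init
# Let the local "server" accept partial-clone filters and lazy blob requests.
git -C "$SRC" config uploadpack.allowFilter true
git -C "$SRC" config uploadpack.allowAnySHA1InWant true

# Commits and trees only (no blobs), one commit deep, nothing checked out.
git clone -q --no-checkout --depth 1 --filter=blob:none "file://$SRC" "$WORK/dst"
cd "$WORK/dst"
git sparse-checkout init --cone                  # start with only the root files
git sparse-checkout set apps/my_app libs/my_lib
git checkout                                     # fetches and writes only the needed blobs
```

Cone mode always keeps the root files (here `README`), so `docs/` is the only part left out.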
Marián Černý
Alexey Grinko
  • Would be worth adding partial clone (`--filter`) to your answer here. – Tao Apr 20 '20 at 15:52
  • @alexey-grinko, the first command still had to clone the whole repo in question, even if it didn't check it out... I was looking to save the time of not cloning all the stuff I don't need... – mropp Jun 01 '20 at 22:23
  • @mropp, I updated the answer by adding `--depth` parameter which allows us to do a shallow clone. Will that help? @Tao, not sure how to use `--filter` in this case, I didn't try it. Could you provide an example, or post another answer to this topic? – Alexey Grinko Jun 03 '20 at 10:41
  • note that it doesn't work the same in 2.27 release - I don't know why. – Blazes Jun 11 '20 at 16:36
  • @AlexeyGrinko, that did help. For my particular repo I was cloning, still took a little while because there was a decent amount of stuff in the first level of depth. But, it was much, much faster. – mropp Jun 16 '20 at 17:22
  • As Blazes said it doesn't work anymore in 2.27, can't find how to make it work again. – agemO Jun 17 '20 at 07:27
  • I think I made that work on 2.28: `git clone --no-checkout cd dir git sparse-checkout set git checkout master` This last checkout populates my workdir with the files I needed in – gxh8Nmate Aug 12 '20 at 03:53
  • In 2.27 they fixed a bug that was spooking users by showing files as being deleted because it created a working index after the `init` command. I prefer using `git switch somebranch` which can checkout a remote branch or create a new one locally (from master or whatever the default branch for the repo is) for the checkout. https://stackoverflow.com/a/62480045 – dragon788 Oct 07 '20 at 16:06
  • I managed to make it work, but I had to do a `git reset` after cloning because all missing files were staged as deletions. – Thomas Levesque Oct 08 '20 at 08:17
  • @AlexeyGrinko the `--filter` option allows for not downloading the whole repository first. Check out this [forum post](https://github.community/t/how-can-i-download-a-specific-folder-from-a-github-repo/278/28), where a user used `--filter=blob:none`. Mentioning that in your answer might make it even better! – murchu27 Nov 19 '20 at 16:01
  • Turns out the behavior in this answer was a bug and doesn't work in git 2.27+. See https://stackoverflow.com/questions/62423920/how-to-use-git-sparse-checkout-in-2-27 – Curtis Bezault Dec 03 '20 at 03:00

Works in git v2.37.1+

git clone --filter=blob:none --no-checkout --depth 1 --sparse <project-url>
cd <project>

Specify the folders you want to check out:

git sparse-checkout add <folder1> <folder2>
git checkout
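To check that the filter actually avoided downloading unneeded files, the sequence above can be run against a stand-in repository and followed by `git rev-list --missing=print`, which prints objects the clone knows about but never fetched with a leading `?`. This is a sketch assuming a recent Git (the answer says 2.37.1+); `folder1`, `folder2`, `unwanted` and the temporary paths are hypothetical.

```shell
#!/bin/sh
# Sketch: run the answer's sequence against a local stand-in repo, then
# count the blobs that were never downloaded.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/upstream/project"

mkdir -p "$SRC"
git init -q "$SRC"
mkdir -p "$SRC/folder1" "$SRC/folder2" "$SRC/unwanted"
echo one > "$SRC/folder1/a.txt"
echo two > "$SRC/folder2/b.txt"
echo skip > "$SRC/unwanted/c.txt"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m init
# Allow the local "server" to honour --filter and on-demand blob requests.
git -C "$SRC" config uploadpack.allowFilter true
git -C "$SRC" config uploadpack.allowAnySHA1InWant true

cd "$WORK"
git clone -q --filter=blob:none --no-checkout --depth 1 --sparse "file://$SRC"
cd project
git sparse-checkout add folder1 folder2
git checkout

# --missing=print lists absent objects with a leading "?" instead of
# fetching them, so this does not trigger any further downloads.
MISSING=$(git rev-list --objects --all --missing=print | grep -c '^?' || true)
echo "objects never downloaded: $MISSING"
```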
Michael Johansen
Fawaz Ahmed

Git clone has an option (--no-checkout or -n) that does what you want.

In your list of commands, just change:

git clone <path>

To this:

git clone --no-checkout <path>

You can then use the sparse checkout as stated in the question.

onionjake
  • yeah, it doesn't do a checkout, but still does a fetch to download the entire repo history – Jason S Oct 28 '15 at 20:40
  • @JasonS the question was specifically about not doing a checkout. If you do not want the entire history, use the `--depth <depth>` option on git clone. That will only download the last `<depth>` commits from the history. Currently there is no way to partially download a single commit with git, though if your remote supports it you can use `git archive --remote` to download partial sets of files. – onionjake Feb 10 '16 at 16:53
  • You can now also 'check out' a commit without downloading any files using https://vfsforgit.org/. This might be useful if someone is trying to only checkout a small subset of a single commit. – onionjake Apr 18 '19 at 19:10

I had a similar use case, except I wanted to checkout only the commit for a tag and prune the directories. Using --depth 1 makes it really sparse and can really speed things up.

mkdir myrepo
cd myrepo
git init
git config core.sparseCheckout true
git remote add origin <url>  # Note: no -f option
echo "path/within_repo/to/subdir/" > .git/info/sparse-checkout
git fetch --depth 1 origin tag <tagname>
git checkout <tagname>
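A self-contained sketch of this tag-based variant, assuming a local stand-in repository with a hypothetical lightweight tag `v1.0`. The source has two commits, so the shallow fetch can be seen to bring in only one commit of history.

```shell
#!/bin/sh
# Sketch: shallow-fetch a single tag and check out one subdirectory.
# Assumptions: a local stand-in repo and a hypothetical tag named v1.0.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"

git init -q "$SRC"
mkdir -p "$SRC/path/within_repo/to/subdir" "$SRC/other"
echo first > "$SRC/other/x.txt"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m first
echo tagged > "$SRC/path/within_repo/to/subdir/y.txt"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m second
git -C "$SRC" tag v1.0

mkdir "$WORK/myrepo"
cd "$WORK/myrepo"
git init -q
git config core.sparseCheckout true
git remote add origin "file://$SRC"   # no -f: nothing is fetched yet
mkdir -p .git/info
# Trailing slash: the whole directory, including any subdirectories.
printf '%s\n' 'path/within_repo/to/subdir/' > .git/info/sparse-checkout
git fetch -q --depth 1 origin tag v1.0
git checkout -q v1.0                  # detached HEAD at the tag
```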
sourcedelica

I found the answer I was looking for from the one-liner posted earlier by pavek (thanks!), so I wanted to provide a complete answer in a single reply that works on Linux (Git 1.7.1):

1--> mkdir myrepo
2--> cd myrepo
3--> git init
4--> git config core.sparseCheckout true
5--> echo 'path/to/subdir/' > .git/info/sparse-checkout
6--> git remote add -f origin ssh://...
7--> git pull origin master

I changed the order of the commands a bit but that does not seem to have any impact. The key is the presence of the trailing slash "/" at the end of the path in step 5.

J-F Bergeron
  • Are you sure this is what you want? The -f means fetching all the data; you still get all the other information you don't want, and it is slow. (This is still "checking out the whole repository".) – Shuman Mar 05 '16 at 22:45
  • I tried the above steps on Windows, but sparse checkout does not work in the command prompt, so I tried the Git Bash shell and it worked! The command prompt is able to execute all the git commands like push, pull etc., but when it comes to sparse checkout it fails. – user593029 Apr 06 '16 at 18:39
  • How do I get only the files of the subdirectory? I want to fetch only the files inside a specific subdirectory. – Babish Shrestha Sep 03 '16 at 04:44
  • @BabishShrestha see comment by onionjake on other answer FWIW :| – rogerdpack Sep 15 '16 at 21:47
  • This does not do a sparse clone, not as useful as newer answers that do. BTW, `-f` forces the full clone. – Gringo Suave Oct 04 '21 at 20:09

Updated answer 2020:

There is now a command git sparse-checkout, that I present in detail with Git 2.25 (Q1 2020)

nicono's answer illustrates its usage:

git sparse-checkout init --cone # start with only the root files
git sparse-checkout add apps/my_app
git sparse-checkout add libs/my_lib

It has evolved with Git 2.27 and knows how to "reapply" a sparse checkout, as in here.
Note that with Git 2.28, git status will mention that you are in a sparse-checked-out repository


Note/Warning: Certain sparse-checkout patterns that are valid in non-cone mode led to segfault in cone mode, which has been corrected with Git 2.35 (Q1 2022).

See commit a3eca58, commit 391c3a1, commit a481d43 (16 Dec 2021) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 09481fe, 10 Jan 2022)

sparse-checkout: refuse to add to bad patterns

Reviewed-by: Elijah Newren
Signed-off-by: Derrick Stolee

When in cone mode sparse-checkout, it is unclear how 'git sparse-checkout'(man) add ... should behave if the existing sparse-checkout file does not match the cone mode patterns.
Change the behavior to fail with an error message about the existing patterns.

Also, all cone mode patterns start with a '/' character, so add that restriction.
This is necessary for our example test 'cone mode: warn on bad pattern', but also requires modifying the example sparse-checkout file we use to test the warnings related to recognizing cone mode patterns.

This error checking would cause a failure further down the test script because of a test that adds non-cone mode patterns without cleaning them up.
Perform that cleanup as part of the test now.


With Git 2.36 (Q2 2022), "git sparse-checkout"(man) wants to work with per-worktree configuration, but did not work well in a worktree attached to a bare repository.

See commit 3ce1138, commit 5325591, commit 7316dc5, commit fe18733, commit 615a84a, commit 5c11c0d (07 Feb 2022) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 6249ce2, 25 Feb 2022)

worktree: copy sparse-checkout patterns and config on add

Signed-off-by: Derrick Stolee
Reviewed-by: Elijah Newren

When adding a new worktree, it is reasonable to expect that we want to use the current set of sparse-checkout settings for that new worktree.
This is particularly important for repositories where the worktree would become too large to be useful.
This is even more important when using partial clone as well, since we want to avoid downloading the missing blobs for files that should not be written to the new worktree.

The only way to create such a worktree without this intermediate step of expanding the full worktree is to copy the sparse-checkout patterns and config settings during 'git worktree add'(man).
Each worktree has its own sparse-checkout patterns, and the default behavior when the sparse-checkout file is missing is to include all paths at HEAD.
Thus, we need to have patterns from somewhere, they might as well be the current worktree's patterns.
These are then modified independently in the future.

In addition to the sparse-checkout file, copy the worktree config file if worktree config is enabled and the file exists.
This will copy over any important settings to ensure the new worktree behaves the same as the current one.
The only exception we must continue to make is that core.bare and core.worktree should become unset in the worktree's config file.


Original answer: 2016

git 2.9 (June 2016) will generalize the --no-checkout option to git worktree add (the command which allows you to work with multiple working trees for one repo)

See commit ef2a0ac (29 Mar 2016) by Ray Zhang (OneRaynyDay).
Helped-by: Eric Sunshine (sunshineco), and Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 0d8683c, 13 Apr 2016)

The git worktree man page now includes:

--[no-]checkout:

By default, add checks out <branch>, however, --no-checkout can be used to suppress checkout in order to make customizations, such as configuring sparse-checkout.
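A sketch of the workflow this option enables: add a worktree without checking it out, configure sparse checkout inside it, then populate it. It assumes a recent Git (2.36 or later, given the per-worktree configuration notes above); the branch name `sparse-branch` and all paths are hypothetical.

```shell
#!/bin/sh
# Sketch: a sparse second worktree created with --no-checkout.
# Assumes Git >= 2.36; repository contents and names are hypothetical.
set -eu
WORK=$(mktemp -d)
REPO="$WORK/repo"
WT="$WORK/sparse-wt"

git init -q "$REPO"
mkdir -p "$REPO/apps/my_app" "$REPO/big"
echo app > "$REPO/apps/my_app/main.txt"
echo huge > "$REPO/big/blob.bin"
git -C "$REPO" add .
git -C "$REPO" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# --no-checkout leaves the new worktree empty so it can be configured first.
git -C "$REPO" worktree add --no-checkout -b sparse-branch "$WT"
cd "$WT"
git sparse-checkout init --cone       # patterns live in this worktree only
git sparse-checkout set apps/my_app
git checkout                          # now writes only the selected directories
```

The main worktree in `$REPO` keeps its full checkout; only the new worktree is sparse.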

VonC

Sadly, none of the above worked for me, so I spent a very long time trying different combinations of sparse-checkout file contents.

In my case I wanted to skip folders with IntelliJ IDEA configs.

Here is what I did:


Run git clone https://github.com/myaccount/myrepo.git --no-checkout

Run git config core.sparsecheckout true

Created .git\info\sparse-checkout with following content

!.idea/*
!.idea_modules/*
/*

Run 'git checkout --' to get all files.


Critical thing to make it work was to add /* after folder's name.

I have git 1.9
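The same exclusion idea as a self-contained sketch against a hypothetical local repository. It lists `/*` first and the negations after it, following the gitignore rule that the last matching pattern wins; the IDE folder names mirror the answer, everything else is made up for the demo.

```shell
#!/bin/sh
# Sketch: check out everything except IDE folders with negated non-cone
# patterns. A local stand-in repo replaces the GitHub URL.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"

git init -q "$SRC"
mkdir -p "$SRC/.idea" "$SRC/.idea_modules" "$SRC/src"
echo '<project/>' > "$SRC/.idea/misc.xml"
echo module > "$SRC/.idea_modules/m.iml"
echo 'class Main {}' > "$SRC/src/Main.java"
echo readme > "$SRC/README"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m init

git clone -q --no-checkout "$SRC" "$WORK/myrepo"
cd "$WORK/myrepo"
git config core.sparseCheckout true
mkdir -p .git/info
# Include every top-level entry, then exclude the two IDE directories.
printf '%s\n' '/*' '!/.idea/' '!/.idea_modules/' > .git/info/sparse-checkout
git checkout
```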

expert
  • Nope, it still downloads everything, all commits and all files, git 2.3.2 – Tyguy7 Sep 18 '15 at 22:00
  • Sparse checkouts affect only the working tree. They don't affect the repository size or what gets fetched. You need different options if you want that. – Miral Dec 11 '15 at 05:13
  • Try Git Bash Shell next time if working in Windows & use above steps by 'pbetkier' it works fine – user593029 Apr 06 '16 at 18:47

Yes, it is possible to download a single folder instead of the whole repository, and even to get only a specific (or the last) commit.

A nice way to do this:

D:\Lab>git svn clone https://github.com/Qamar4P/LolAdapter.git/trunk/lol-adapter -r HEAD
  1. -r HEAD will only download last revision, ignore all history.

  2. Note trunk and /specific-folder

Copy and change URL before and after /trunk/. I hope this will help someone. Enjoy :)

Updated on 26 Sep 2019

Qamar

Based on this answer by apenwarr and this comment by Miral I came up with the following solution which saved me nearly 94% of disk space when cloning the linux git repository locally while only wanting one Documentation subdirectory:

$ cd linux
$ du -sh .git .
2.1G    .git
894M    .
$ du -sh 
2.9G    .
$ mkdir ../linux-sparse-test
$ cd ../linux-sparse-test
$ git init
Initialized empty Git repository in /…/linux-sparse-test/.git/
$ git config core.sparseCheckout true
$ git remote add origin ../linux
# Parameter "origin master" saves a tiny bit if there are other branches
$ git fetch --depth=1 origin master
remote: Enumerating objects: 65839, done.
remote: Counting objects: 100% (65839/65839), done.
remote: Compressing objects: 100% (61140/61140), done.
remote: Total 65839 (delta 6202), reused 22590 (delta 3703)
Receiving objects: 100% (65839/65839), 173.09 MiB | 10.05 MiB/s, done.
Resolving deltas: 100% (6202/6202), done.
From ../linux
 * branch              master     -> FETCH_HEAD
 * [new branch]        master     -> origin/master
$ echo "Documentation/hid/*" > .git/info/sparse-checkout
$ git checkout master
Branch 'master' set up to track remote branch 'master' from 'origin'.
Already on 'master'
$ ls -l
total 4
drwxr-xr-x 3 abe abe 4096 May  3 14:12 Documentation/
$  du -sh .git .
181M    .git
100K    .
$  du -sh
182M    .

So I got down from 2.9GB to 182MB, which is already quite nice.

I didn't get this to work with git clone --depth 1 --no-checkout --filter=blob:none file:///…/linux linux-sparse-test (hinted here), though, as then the missing files were all added to the index as removed files. So if anyone knows the equivalent of git clone --filter=blob:none for git fetch, we can probably save some more megabytes. (Reading the man page of git-rev-list also hints that there is something like --filter=sparse:path=…, but I didn't get that to work either.)

(All tried with git 2.20.1 from Debian Buster.)

Axel Beckert
  • Now the man page of `git-rev-list` has been modified to reflect the removal of the `--filter=sparse:path` option: `Note that the form --filter=sparse:path= that wants to read from an arbitrary path on the filesystem has been dropped for security reasons.` – Arnie97 Oct 27 '21 at 10:17

Steps to sparse checkout only specific folder:

1) git clone --no-checkout  <project clone url>  
2) cd <project folder>
3) git config core.sparsecheckout true   [You must do this]
4) echo "<path you want to sparse-checkout>/*" > .git/info/sparse-checkout
    [You must enter /* at the end of the path such that it will take all contents of that folder]
5) git checkout <branch name> [Ex: master]
SANDEEP MACHIRAJU
  • FYI, in the first (1) step, you don't need to use --no-checkout. Just clone the whole repo and then execute steps 2-5 mentioned above, and you will get the output you want. Let me know if you didn't get it. – SANDEEP MACHIRAJU Jan 05 '19 at 22:20

In git 2.27, it looks like git sparse checkout has evolved. Solution in this answer does not work exactly the same way (compared to git 2.25)

git clone <URL> --no-checkout <directory>
cd <directory>
git sparse-checkout init --cone # start with only the root files
git sparse-checkout set apps/my_app libs/my_lib # etc, to list sub-folders to checkout
# they are checked out immediately after this command, no need to run git pull

These commands worked better:

git clone --sparse <URL> <directory>
cd <directory>
git sparse-checkout init --cone # start with only the root files
git sparse-checkout add apps/my_app
git sparse-checkout add libs/my_lib

See also: git-clone --sparse and git-sparse-checkout add
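A sketch of the `--sparse` variant against a local stand-in repository, assuming Git 2.27 or later; `apps/my_app`, `libs/my_lib`, `tools` and the temporary paths are hypothetical. Because the clone already checked out the root files, each `add` updates the working tree immediately and no extra `git checkout` is needed.

```shell
#!/bin/sh
# Sketch of the --sparse clone flow against a local stand-in repository.
# Assumes Git >= 2.27; directory names are hypothetical.
set -eu
WORK=$(mktemp -d)
SRC="$WORK/src"

git init -q "$SRC"
mkdir -p "$SRC/apps/my_app" "$SRC/libs/my_lib" "$SRC/tools"
echo app > "$SRC/apps/my_app/main.txt"
echo lib > "$SRC/libs/my_lib/lib.txt"
echo tool > "$SRC/tools/t.txt"
echo root > "$SRC/README"
git -C "$SRC" add .
git -C "$SRC" -c user.name=demo -c user.email=demo@example.com commit -q -m init

git clone -q --sparse "$SRC" "$WORK/dir"   # checks out root files only
cd "$WORK/dir"
git sparse-checkout init --cone
git sparse-checkout add apps/my_app        # written to disk right away
git sparse-checkout add libs/my_lib
git sparse-checkout list                   # show the directories now selected
```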

nicono

I'm new to git, but it seems that if I do git checkout for each directory then it works. Also, the sparse-checkout file needs to have a trailing slash after every directory as indicated. Could someone more experienced please confirm that this will work?

Interestingly, if you checkout a directory not in the sparse-checkout file it seems to make no difference. They don't show up in git status and git read-tree -m -u HEAD doesn't cause it to be removed. git reset --hard doesn't cause the directory to be removed either. Anyone more experienced care to comment on what git thinks of directories that are checked out but which are not in the sparse checkout file?

dromodel

In my case, I want to skip the Pods folder when cloning the project. I did step by step like below and it works for me. Hope it helps.

mkdir my_folder
cd my_folder
git init
git remote add origin -f <URL>
git config core.sparseCheckout true 
printf '%s\n' '!Pods/*' '/*' > .git/info/sparse-checkout
git pull origin master

Note: if you want to skip more folders, just add more lines to the sparse-checkout file.

eric long

I took this from TypeScript definitions library @types:

Let's say the repo has this structure:

types/
|_ identity/
|_ etc...

Your goal: Checkout identity/ folder ONLY. With all its contents including subfolders.

⚠️ This requires minimum git version 2.27.0, which is likely newer than the default on most machines. More complicated procedures are available in older versions, but not covered by this guide.

git clone --sparse --filter=blob:none --depth=1 <source-repo-url>
cd <repository-directory>
git sparse-checkout add types/identity  # list more folders here if needed

This will check out the types/identity folder to your local machine.

--sparse initializes the sparse-checkout file so the working directory starts with only the files in the root of the repository.

--filter=blob:none will exclude files, fetching them only as needed.

--depth=1 will further improve clone speed by truncating commit history, but it may cause issues as summarized here.

hoohoo-b

The accepted answer did not fully work for me, because it also downloads the files in the root folder.

This is how I downloaded only one folder that I needed:

git clone --no-checkout --depth=1 --filter=blob:none --branch=$(VERSION) $(REPO_URL) && \
cd $(REPO_NAME) && \
git config core.sparseCheckout true && \
echo "$(FOLDER)/" >> .git/info/sparse-checkout && \
git checkout
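The chain above uses Makefile-style `$(...)` variables. Here it is as a plain shell sketch run against a local stand-in repository, with hypothetical values for `VERSION`, `FOLDER` and the repository contents. It shows that a non-cone pattern such as `needed/` leaves out the root files that cone mode would keep.

```shell
#!/bin/sh
# Sketch: the same chain as a plain shell script against a local stand-in
# repo. VERSION, FOLDER and the repository layout are hypothetical.
set -eu
WORK=$(mktemp -d)
UPSTREAM="$WORK/upstream"
VERSION=v2.0
FOLDER=needed

git init -q "$UPSTREAM"
mkdir -p "$UPSTREAM/$FOLDER" "$UPSTREAM/other"
echo keep > "$UPSTREAM/$FOLDER/file.txt"
echo skip > "$UPSTREAM/other/file.txt"
echo root > "$UPSTREAM/README"
git -C "$UPSTREAM" add .
git -C "$UPSTREAM" -c user.name=demo -c user.email=demo@example.com commit -q -m init
git -C "$UPSTREAM" tag "$VERSION"
# Let the local "server" honour the blob filter and on-demand fetches.
git -C "$UPSTREAM" config uploadpack.allowFilter true
git -C "$UPSTREAM" config uploadpack.allowAnySHA1InWant true

cd "$WORK"
git clone -q --no-checkout --depth=1 --filter=blob:none --branch="$VERSION" "file://$UPSTREAM" clone
cd clone
git config core.sparseCheckout true
mkdir -p .git/info
# Non-cone pattern: only this directory, no root files.
echo "$FOLDER/" >> .git/info/sparse-checkout
git checkout -q
```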
Kirow