
How do I clone, fetch, or sparse-checkout a single file or directory (or a list of files or directories) from a git repository without downloading the entire history, or at least while keeping the downloaded history to a minimum?

For the benefit of people landing here, these are references to other similar questions:

These similar questions were asked long ago, and git has evolved since then. The result is a flood of different answers, some better and some worse, depending on the version of git being considered. The trouble is that no single answer to those questions meets all the requirements of all the questions combined, so you have to read every answer and assemble in your head a solution that eventually satisfies them all.

This question expands on the questions mentioned above, imposing requirements that are both more flexible and more stringent than those of all the other questions combined. So, once again:

How do I clone, fetch, or sparse-checkout a single file or directory (or a list of files or directories) from a git repository without downloading the entire history, or at least while keeping the downloaded history to a minimum?

Richard Gomes
  • Does this answer your question? [How do I clone a subdirectory only of a Git repository?](https://stackoverflow.com/questions/600079/how-do-i-clone-a-subdirectory-only-of-a-git-repository) – phd Feb 12 '20 at 14:51
  • https://stackoverflow.com/search?q=%5Bgit%5D+shallow+clone+sparse+checkout – phd Feb 12 '20 at 14:51
  • @phd : No, not really. The function we can see as part of the answer you've mentioned pulls the entire history of all branches. My implementation pulls the history of only one branch and AFAIK pulls only the tip of the history. – Richard Gomes Feb 12 '20 at 15:12
  • @phd : Your second comment has a broken link. – Richard Gomes Feb 12 '20 at 15:13
  • There're many answers at the linked dup. `git clone --depth` is mentioned as well as `git clone --filter`. The second search link works for me. – phd Feb 12 '20 at 15:16
  • @phd : I've edited the question, explaining the need for it and making references to other similar questions. However, the most important benefit of this question is providing an answer which is complete, self contained, tested, well documented, has example of use and fulfils all requirements combined from previous questions and more requirements introduced by this question. – Richard Gomes Feb 13 '20 at 00:13
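
The `git clone --depth` and `git clone --filter` options mentioned in the comments above can be combined with sparse checkout. The following is only a sketch, assuming Git 2.25 or later (for `--sparse` and `git sparse-checkout`) and a server that supports partial clone. The throwaway local repository, its `docs` and `src` directories, and the `demo` identity are hypothetical and exist only so the example runs offline.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway local "remote" with two commits (all names are hypothetical).
work=$(mktemp -d)
git init -q "$work/remote"
mkdir -p "$work/remote/docs" "$work/remote/src"
echo "readme" > "$work/remote/docs/readme.txt"
echo "code"   > "$work/remote/src/main.sh"
git -C "$work/remote" add .
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first"
echo "more" >> "$work/remote/src/main.sh"
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -am "second"
# A local remote has to opt in to partial clone; hosted servers decide this themselves.
git -C "$work/remote" config uploadpack.allowFilter true
git -C "$work/remote" config uploadpack.allowAnySHA1InWant true

# --depth 1           : fetch only the tip commit (implies --single-branch)
# --filter=blob:none  : fetch file contents on demand
# --sparse            : start with only top-level files checked out
git clone -q --depth 1 --filter=blob:none --sparse \
    "file://$work/remote" "$work/clone"

# Materialise just the docs directory.
git -C "$work/clone" sparse-checkout set docs

ls "$work/clone"                            # lists what was checked out
git -C "$work/clone" rev-list --count HEAD  # number of commits downloaded
```

If the server does not support filtering, Git warns and falls back to an ordinary shallow clone, so the working tree is still sparse but the blobs are downloaded up front.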

2 Answers

The bash function below does the trick.

function git_sparse_checkout {
    # git repository, e.g.: http://github.com/frgomes/bash-scripts
    local url=$1
    # directory where the repository will be downloaded, e.g.: ./build/sources
    local dir=$2
    # repository name, in general taken from the url, e.g.: bash-scripts
    local prj=$3
    # tag, e.g.: master
    local tag=$4
    if [[ -z "$url" || -z "$dir" || -z "$prj" || -z "$tag" ]]; then
        echo "ERROR: git_sparse_checkout: invalid arguments" >&2
        return 1
    fi
    shift 4

    # Note: any remaining arguments are treated as a list of files or
    # directories to be downloaded.

    mkdir -p "${dir}"
    # Only download if the project directory does not exist yet.
    if [ ! -d "${dir}/${prj}" ]; then
        mkdir -p "${dir}/${prj}"
        pushd "${dir}/${prj}" || return 1
        git init
        git config core.sparseCheckout true
        # Some git templates do not create .git/info.
        mkdir -p .git/info
        local path="" # local scope
        # "$@" keeps each pattern intact; an unquoted $* would be
        # word-split and glob-expanded by the shell.
        for path in "$@"; do
            echo "${path}" >> .git/info/sparse-checkout
        done
        git remote add origin "${url}"
        # Fetch only the tip commit of the requested ref.
        git fetch --depth=1 origin "${tag}"
        git checkout "${tag}"
        popd
    fi
}
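
To see what the steps inside the function do, they can be replayed by hand against a throwaway local repository. This is a sketch only: the repository layout (`docs`, `src`), the `demo` identity and the `work` variable are hypothetical, and the `file://` URL stands in for a real remote so that `--depth` is honoured.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway "remote" with two commits (all names are hypothetical).
work=$(mktemp -d)
git init -q "$work/remote"
mkdir -p "$work/remote/docs" "$work/remote/src"
echo "readme" > "$work/remote/docs/readme.txt"
echo "code"   > "$work/remote/src/main.sh"
git -C "$work/remote" add .
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first"
echo "more" >> "$work/remote/src/main.sh"
git -C "$work/remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q -am "second"
branch=$(git -C "$work/remote" symbolic-ref --short HEAD)

# Same sequence as the function: init, enable sparse checkout,
# list the wanted paths, shallow-fetch one branch, check it out.
git init -q "$work/clone"
git -C "$work/clone" config core.sparseCheckout true
mkdir -p "$work/clone/.git/info"
echo "docs/*" >> "$work/clone/.git/info/sparse-checkout"
git -C "$work/clone" remote add origin "file://$work/remote"
git -C "$work/clone" fetch -q --depth=1 origin "$branch"
git -C "$work/clone" checkout -q "$branch"

ls "$work/clone"                            # lists what was checked out
git -C "$work/clone" rev-list --count HEAD  # number of commits downloaded
```

Only the paths listed in `.git/info/sparse-checkout` appear in the working tree, and the shallow fetch brings in the tip commit of the one requested branch rather than its full history.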

This is an example of how this can be used:

function example_download_scripts {
  local url=http://github.com/frgomes/bash-scripts
  local dir="$(pwd)/sources"
  local prj=bash-scripts
  local tag=master
  git_sparse_checkout "$url" "$dir" "$prj" "$tag" "user-install/*" sysadmin-install/install-emacs.sh
}

In the example above, note that a directory must be followed by /* and that the pattern must be enclosed in single or double quotes, so that the shell passes it to the function unexpanded.
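
The effect of the quotes can be checked in isolation. In this small sketch, `show_args` is a hypothetical helper that prints the arguments it receives, and the temporary `user-install` directory exists only for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments arrived and what they were.
show_args() { printf '%d args:' "$#"; printf ' [%s]' "$@"; printf '\n'; }

tmp=$(mktemp -d)
mkdir "$tmp/user-install"
touch "$tmp/user-install/a.sh" "$tmp/user-install/b.sh"
cd "$tmp"

show_args user-install/*     # 2 args: [user-install/a.sh] [user-install/b.sh]
show_args "user-install/*"   # 1 args: [user-install/*]
```

With bash's default settings an unquoted pattern that matches nothing is passed through unchanged, so forgetting the quotes may appear to work until a matching directory exists in the current directory.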

UPDATE: An improved version can be found at: https://github.com/frgomes/bash-scripts/blob/master/bin/git_sparse_checkout

Richard Gomes

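If you only want the files, without any history, you can use svn, provided the host exposes a Subversion interface (GitHub did when this was written, but retired its Subversion support in January 2024):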

SUBDIR=foo
svn export https://github.com/repository.git/trunk/$SUBDIR
gdkrmr