63

I want to clone the Linux kernel repo, but only from version 3.0 onwards, since the kernel repo is so huge it makes my versioning tools run faster if I can do a shallow clone. The core of my question is: how can I tell git what the "n" value is for the --depth parameter? I was hoping this would work:

git clone http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git --depth v3.0

thanks.

Chris H
    See also [How do I remove the old history from a git repository?](http://stackoverflow.com/questions/4515580/how-do-i-remove-the-old-history-from-a-git-repository) – Alberto Oct 21 '15 at 14:05

6 Answers

110

How about cloning the tag to a depth of 1?

  • git clone --branch mytag0.1 --depth 1 https://example.com/my/repo.git

Notes:

  • --depth 1 implies --single-branch, so no info from other branches is brought to the cloned repository
  • if you want to clone a local repository, use a file:// URL instead of the bare repository path; otherwise --depth is ignored for local clones
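To see the effect of the command above without touching a real remote, here is a minimal sketch that builds a throwaway repository and clones one of its tags at depth 1. The repository layout, the tag name v0.1 and the directory names are assumptions made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway source repository with three commits (hypothetical demo data).
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name demo
git config user.email demo@example.com
for n in 1 2 3; do
    git commit -q --allow-empty -m "commit $n"
done
git tag v0.1 HEAD~1            # tag the second commit

# Clone only the tagged commit. The file:// URL matters: with a plain
# path, git treats this as a local clone and ignores --depth.
git clone -q --branch v0.1 --depth 1 "file://$work/src" "$work/shallow"

count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $count"
```

The clone ends up on a detached HEAD at the tag, with no history behind it.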
johntellsall
n8henrie
7

Read on for a solution, but unfortunately git clone does not work the way you are asking. The --depth parameter limits the number of revisions, not the number of commits, and there is no clone parameter that limits the number of commits. In your situation, even if you knew that the most frequently changed file had at most 10 revisions between v3.0 and the newest HEAD in the repo, and you used --depth 10, you could still get most or all of the repo history, because some objects may have fewer than 10 revisions and you would get their history all the way back to their first appearance in the repo.

Now here is how to do what you want: the key to your issue is that you need the commits between v3.0 and the most recent reference you want. Here are the steps I took to do just that:

  • git clone http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git --depth 10075 smaller_kernel_repo
  • cd smaller_kernel_repo
  • Determine the sha of v3.0: git log --oneline v3.0^..v3.0
  • Create a graft point starting with this sha (it is 02f8c6aee8df3cdc935e9bdd4f2d020306035dbe)
  • echo "02f8c6aee8df3cdc935e9bdd4f2d020306035dbe" > .git/info/grafts
  • To get around some issues with some kernel log entries do: export GIT_AUTHOR_NAME="tmp" and export GIT_COMMITTER_NAME="tmp"

  • There is a nice warning in the man page about git filter-branch rewriting history by following graft points... so let's abuse that: now run git filter-branch and sit back and wait... (and wait and wait)
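The graft-then-rewrite idea in the steps above can be tried on a small throwaway repository first. This is only a sketch under assumptions: it uses git replace --graft rather than writing .git/info/grafts directly (newer git versions flag the grafts file as deprecated), and the five-commit demo history and tag placement are invented for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository with five commits; v3.0 is the second one (demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com
for n in 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $n"
done
git tag v3.0 HEAD~3
before=$(git rev-list --count HEAD)

# Pretend the tagged commit has no parents (the graft point).
git replace --graft v3.0

# filter-branch follows the replacement and makes it permanent.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch HEAD >/dev/null 2>&1

# The replacement ref is no longer needed once history is rewritten.
git replace -d "$(git rev-parse v3.0)" >/dev/null
after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

In this sketch the tag itself still points at the old commit; only the current branch is rewritten, which is why the cleanup steps that follow still matter.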

Now you need to clean up everything:

git reflog expire --expire=now --all
git repack -ad  # Remove dangling objects from packfiles
git prune       # Remove dangling loose objects

This process is time-consuming but not very complex. Hopefully it will save you all the time you were hoping for in the long run. At this point what you have is essentially a repo with an amended history containing only v3.0 onwards from the linux-stable.git repo. Just as if you had used --depth on clone, you have the same restrictions on the repo and can only modify and send patches from the history you already have. There are ways around that, but that deserves its own Q&A.

I am in the process of testing out the last few steps myself, but the git filter-branch operation is still going. I'll update this post with any issues, but I'll go ahead and post it so you can start on this process if you find it acceptable.

UPDATE

Workaround for the issue (fatal: empty ident <> not allowed). This issue stems from a problem in the commit history of the linux repo.

Change the git filter-branch command to:

git filter-branch --commit-filter '
    if [ "$GIT_AUTHOR_EMAIL" = "" ]; then
        # Commits with an empty author identity get placeholder values
        GIT_AUTHOR_EMAIL="tmp@tmp"
        GIT_AUTHOR_NAME="tmp"
        GIT_COMMITTER_NAME="Me"
        GIT_COMMITTER_EMAIL="me@me.com"
    fi
    git commit-tree "$@"
'
James
    I think it's over-complicating things to strictly distinguish between *revision* and *commit* here. While I'm aware of [the formal difference](http://stackoverflow.com/a/11792712/1127485), in the context of `git clone --depth ` the number of revisions equals the number of commits from the tips. – sschuberth Jan 06 '16 at 10:40
5

For someone who already has a clone this command will get the number of commits between tip of current branch and the tag v5.2:

$ git rev-list HEAD ^v5.2 --count
407

I found this project implementing rev-list using the GitHub API: https://github.com/cjlarose/github-rev-list

The very lengthy man page on rev-list indicates there is a lot going on behind the scenes. There are many different paths to possibly count commits through with branches and merges coming and going. For this use case though that can probably be ignored(?)
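As a rough check of how that count relates to --depth, the sketch below uses a throwaway repository with a strictly linear history, where the depth needed to reach the tag is the count plus one (for the tagged commit itself). With merges in the history the two numbers can diverge, so treat this as an illustration only; the repository contents and directory names are made up.

```shell
#!/bin/sh
set -e

# Linear throwaway history of six commits; v5.2 is the fourth (demo data).
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.name demo
git config user.email demo@example.com
for n in 1 2 3 4 5 6; do
    git commit -q --allow-empty -m "commit $n"
done
git tag v5.2 HEAD~2

# Commits reachable from HEAD but not from the tag.
ahead=$(git rev-list --count v5.2..HEAD)

# On linear history, one extra level of depth includes the tagged commit.
depth=$((ahead + 1))
git clone -q --depth "$depth" "file://$work/src" "$work/shallow"

oldest=$(git -C "$work/shallow" rev-list HEAD | tail -n 1)
tag_commit=$(git rev-parse 'v5.2^{commit}')
echo "ahead=$ahead depth=$depth oldest=$oldest"
```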

matt wilkie
3

Unfortunately the --depth parameter of git clone accepts only a number, the number of revisions to which the cloning repository should be truncated.

A possible solution is to clone the entire repository, and then truncate its history to keep only the commits after v3.0. Here is a good how-to: http://bogdan.org.ua/2011/03/28/how-to-truncate-git-history-sample-script-included.html

git checkout --orphan temp v3.0
git commit -m "Truncated history"
git rebase --onto temp v3.0 master
git branch -D temp
git gc
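The same sequence can be rehearsed on a small throwaway repository before running it on the kernel. This sketch assumes a linear history with no merges (so the rebase applies cleanly), and it looks up the current branch name instead of hard-coding master; the file name and commit contents are invented for the demo.

```shell
#!/bin/sh
set -e

# Throwaway repository: five commits, each appending a line (demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on config
for n in 1 2 3 4 5; do
    echo "line $n" >> file.txt
    git add file.txt
    git commit -q -m "commit $n"
done
git tag v3.0 HEAD~2

# New root commit holding the tree of v3.0, then replay later commits onto it.
git checkout -q --orphan temp v3.0
git commit -q -m "Truncated history"
git rebase -q --onto temp v3.0 "$branch"
git branch -q -D temp

remaining=$(git rev-list --count "$branch")
last_line=$(tail -n 1 file.txt)
echo "commits left on $branch: $remaining"
```

The branch keeps its final content but now starts at the "Truncated history" commit. The old commits are still reachable through the tag until it is deleted and the repository is garbage-collected.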
Jim DeLaHunt
tomgi
    That should work as well as the solution I provided, but I would also suggest deleting all other local references and running the cleanup steps that I have in my solution. Without that, the repo will still contain the full history and extra objects. With this repo, that is about 2 million more objects hanging around than needed. – James Jan 19 '12 at 22:07
    This strategy requires managing conflicting merges and runs the risk of producing not an exact copy of the final master depending on how the merges are handled. Since the repo is so big it is very unlikely you could do the merges by hand so you can add the `-Xours` or `-Xtheirs` option to the rebase command. I'm sure you'd find the final result differs from the master ref's source. – James Jan 20 '12 at 00:10
0

I published a GitHub Action to do this.

https://github.com/AlexAtkinson/github-action-checkout-from-tag

You can check out the repo for the script that does the heavy lifting. Mind the license.

muzimuzhi Z
EvilKittenLord
0

The --depth parameter seems to be only a number (the "specified number of revisions"), not a tag.

Possible idea (to be tested):

You could use git describe, though, to get the most recent tag reachable from your current HEAD, as well as the number of commits between that tag and HEAD.
If that "most recent tag" isn't your tag, simply repeat the process, starting from the commit referenced by that latest tag, up until you find your tag (v3.0 in your case for instance).

The sum of all those commit numbers will give you the depth to give to the git clone command, provided your tag is accessible from your current HEAD.
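A possible shortcut, sketched here on a throwaway repository, is to pass --match to git describe so it reports the distance to the wanted tag directly instead of summing tag by tag. The demo history and tag are assumptions for illustration, and like the idea above this is untested on the kernel repo. The reported number excludes the tagged commit itself.

```shell
#!/bin/sh
set -e

# Throwaway repository: four commits, v3.0 on the second (demo data).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name demo
git config user.email demo@example.com
for n in 1 2 3 4; do
    git commit -q --allow-empty -m "commit $n"
done
git tag v3.0 HEAD~2

# --tags allows lightweight tags; --long forces the tag-N-gSHA form.
desc=$(git describe --tags --long --match v3.0 HEAD)

# Pull the commit count N out of "v3.0-N-g<sha>".
ahead=$(printf '%s\n' "$desc" | sed -E 's/^.*-([0-9]+)-g[0-9a-f]+$/\1/')
echo "describe: $desc, commits since tag: $ahead"
```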

VonC