
Let's say I need to get https://github.com/mozilla/gecko-dev at commit 042b84a. The entire repo is large (see "See the size of a github repo before cloning it?"):

wget -qO- https://api.github.com/repos/mozilla/gecko-dev | grep size
#  "size": 3891062,  # this is in kB
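
To put that number in more familiar units, here is a minimal sketch. The helper names repo_size_kb and size_kb_to_gib are made up for illustration, and it assumes the API's kB means 1024 bytes and that the JSON is pretty-printed with one field per line, as in the output above.

```shell
# repo_size_kb: read the repository JSON on stdin and print its "size" field.
# Assumes one field per line, as the API prints it.
repo_size_kb() {
  sed -n 's/^ *"size": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# size_kb_to_gib KB: print KB as GiB with two decimals.
# Assumption: 1 kB = 1024 bytes.
size_kb_to_gib() {
  awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / (1024 * 1024) }'
}

# Usage (needs network access):
# size_kb_to_gib "$(wget -qO- https://api.github.com/repos/mozilla/gecko-dev | repo_size_kb)"
```

For the value above, size_kb_to_gib 3891062 prints 3.71.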

... which is a bit too much for me. So I thought I'd get a shallow clone, but that alone fetches nearly 400 MB:

git clone --depth 1 https://github.com/mozilla/gecko-dev
# remote: Counting objects: 231302, done.
# Receiving objects: 100% (231302/231302), 392.95 MiB

Now, this clones HEAD, and I cannot just get to 042b84a from here, especially not with the git 1.9.1 client that I use (see "How to shallow clone a specific commit with depth 1?", "How do fetch remote branch after I cloned repo with git clone --depth 1", and "Git: get a particular revision of a git repository with depth 1"). Apparently, the best I can do, apart from unshallowing the repo (which would download as much as a full clone anyway), is to slowly increase the depth.

I'm not sure whether "depth" simply corresponds to the number of commits between HEAD and a given revision. "Get git sha depth on a remote branch" notes that for a full clone, you can do:

git rev-list HEAD ^042b84a --count

... which implies that "depth" is indeed the number of commits between HEAD and a given revision. However, there is no obvious way to query this from a remote repo in git.

So it would be nice to find the depth of the required 042b84a with respect to the current HEAD before doing a full clone or increasing the depth. Since the repo is hosted on GitHub, I thought the GitHub API might help from the command line. So I tried:

cd gecko-dev

wget -qO- https://api.github.com/repos/mozilla/gecko-dev/commits/042b84a | grep date
#      "date": "2017-04-27T07:18:07Z"

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?sha=042b84a&page=17756>; rel="last"

wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a&page=17756' | grep '^    "sha"' | wc -l
# 5

The sha parameter is the "SHA or branch to start listing commits from", and per the GitHub API docs, "a call to list GitHub's public repositories provides paginated items in sets of 30". Here we have 17756 pages, and the 17756th page has 5 results. So do we have 17755*30+5 = 532655 commits between 042b84a and HEAD?
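
The page arithmetic can be written down as a small sketch. The helper names last_page and total_items are made up for illustration, and 30 items per page is assumed as the default. This only reproduces the counting; whether the total really measures the distance to HEAD is a separate question.

```shell
# last_page: read HTTP response headers on stdin and print the page
# number of the rel="last" entry in the Link header.
last_page() {
  sed -n 's/.*[?&]page=\([0-9][0-9]*\)>; rel="last".*/\1/p'
}

# total_items LAST_PAGE ITEMS_ON_LAST_PAGE [PER_PAGE]
# All pages before the last one are full; the last one is partial.
total_items() {
  echo $(( ($1 - 1) * ${3:-30} + $2 ))
}

# Usage (needs network access):
# curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?sha=042b84a' | last_page
total_items 17756 5    # prints 532655
```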

So I tried fetching with that depth:

git fetch --progress --depth=532655
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

... the call fails.

Would it be possible to somehow extend this shallow clone, using git client 1.9, to include revision 042b84a without cloning all 4 GB of data, by using some of the repository data that the GitHub API provides?


EDIT: Got somewhere with this, but still no definite answer. First of all, a depth of 532655 is suspicious for the distance between now (Jan 2018) and a commit from Apr 2017. So I tried looking up commits since that date:

curl -sI 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z' | grep last
# Link: <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=2>; rel="next", <https://api.github.com/repositories/13509108/commits?since=2017-04-27T07%3A18%3A07Z&page=1267>; rel="last"
wget -qO- 'https://api.github.com/repos/mozilla/gecko-dev/commits?since=2017-04-27T07:18:07Z&page=1267' | grep '^    "sha"' | wc -l
# 18
wcalc 1266*30+18
# = 37998
git fetch -v --progress --depth=37998
# POST git-upload-pack (419 bytes)
# error: RPC failed; result=18, HTTP code = 200
# fatal: The remote end hung up unexpectedly

So, looking up commits since that date gives 37998 commits (or depth levels?), but even a fetch with that depth fails.

So, knowing that the number of commits is at least in the thousands, I tried increasing the depth slowly:

git fetch -vvvv --progress --depth=1000 origin
# remote: Counting objects: 53595, done.
# remote: Compressing objects: 100% (24434/24434), done.
# remote: Total 53595 (delta 43532), reused 36280 (delta 28120), pack-reused 0
# Receiving objects: 100% (53595/53595), 16.14 MiB | 409.00 KiB/s, done.
# Resolving deltas: 100% (43532/43532), completed with 10563 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 7492

git fetch -vvvv --progress --depth=2000 origin
# remote: Counting objects: 140804, done.
# remote: Compressing objects: 100% (54300/54300), done.
# Receiving objects: 100% (140804/140804), 57.13 MiB | 404.00 KiB/s, done.
# remote: Total 140804 (delta 114158), reused 106827 (delta 84436), pack-reused 0
# Resolving deltas: 100% (114158/114158), completed with 20700 local objects.
# From https://github.com/mozilla/gecko-dev
#  = [up to date]      master     -> origin/master
git log --oneline | wc -l
# 18137

... and finally in a loop:

i=2000; until git show 042b84a; do i=$((i+1000)); echo "depth $i"; git fetch --depth=$i ; done
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 3000
# remote: Counting objects: 136434, done.
# remote: Compressing objects: 100% (47014/47014), done.
# remote: Total 136434 (delta 108858), reused 110481 (delta 86139), pack-reused 0
# Receiving objects: 100% (136434/136434), 71.36 MiB | 403.00 KiB/s, done.
# Resolving deltas: 100% (108858/108858), completed with 13997 local objects.
# fatal: ambiguous argument '042b84a': unknown revision or path not in the working tree.
# Use '--' to separate paths from revisions, like this:
# 'git <command> [<revision>...] -- [<file>...]'
# depth 4000
# remote: Counting objects: 240103, done.
# remote: Compressing objects: 100% (77811/77811), done.
# remote: Total 240103 (delta 196215), reused 195977 (delta 157920), pack-reused 0
# Receiving objects: 100% (240103/240103), 117.71 MiB | 404.00 KiB/s, done.
# Resolving deltas: 100% (196215/196215), completed with 23725 local objects.
# commit 042b84af6020b1f2d8029a0dc36ac5955b7f325f [...]
git log --oneline | wc -l
# 50871
git rev-list HEAD ^042b84a --count
# 45283
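
A slightly more defensive variant of the loop above, as a sketch only. deepen_until is a made-up helper name; it uses git cat-file -e as a quiet existence check instead of git show, and stops at a maximum depth so the loop cannot run until the clone is effectively unshallowed. It assumes the remote is named origin.

```shell
# deepen_until COMMIT START STEP MAX
# Fetch with --depth=START, START+STEP, ... until COMMIT exists locally,
# giving up once the next depth would exceed MAX.
deepen_until() {
  du_commit=$1 du_depth=$2 du_step=$3 du_max=$4
  until git cat-file -e "${du_commit}^{commit}" 2>/dev/null; do
    if [ "$du_depth" -gt "$du_max" ]; then
      echo "gave up: $du_commit not found within depth $du_max" >&2
      return 1
    fi
    echo "depth $du_depth"
    git fetch --depth="$du_depth" origin || return 1
    du_depth=$((du_depth + du_step))
  done
  echo "found $du_commit"
}

# Usage, continuing from the --depth=2000 fetch above:
# deepen_until 042b84a 3000 1000 50000
```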

(Judging by how the number of objects and the download sizes increase, it seems not to matter here that --depth=1000 was already fetched: upon issuing fetch --depth=2000, are all the previous objects re-downloaded?)

So the commit 042b84a finally appeared after git fetch --depth 4000, so apparently the depth of this commit is 3000 < depth <= 4000. At that depth, git log counts 50871 entries (commits?), while git rev-list HEAD ^042b84a --count reports 45283 (also commits?). What is "depth" then, if not a count of commits?

sdaau

The depth is literally the number of hops in the graph from the starting point to the desired commit (plus any fencepost issues), which is not always the same as what you'll get with `--count` since `--count` will count all reachable side branch commits as well; but `--count` will only ever *over*estimate, so it's OK if you are OK with overestimates. – torek Jan 24 '18 at 19:45
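
The distinction in the comment above can be illustrated on a toy history. This is a minimal sketch, not git's actual implementation: the seven-commit graph and the helpers min_depth and count_excluding are made up for illustration, and it assumes that --depth=N keeps the commits within N-1 parent hops of the tip. In the toy graph, M is a merge whose first parent reaches T in two hops, while its second parent goes through a longer side branch.

```shell
# Toy history, one "commit parent..." entry per line.
graph='M A B1
A T
B1 B2
B2 B3
B3 T
T R
R'

# min_depth START TARGET: smallest --depth (under the assumption above)
# at which TARGET is included, found by breadth-first search over parents.
min_depth() {
  printf '%s\n' "$graph" | awk -v start="$1" -v target="$2" '
    { parents[$1] = ""; for (i = 2; i <= NF; i++) parents[$1] = parents[$1] " " $i }
    END {
      depth = 1; n = 1; frontier[1] = start; seen[start] = 1
      while (n > 0) {
        m = 0
        for (i = 1; i <= n; i++) {
          c = frontier[i]
          if (c == target) { print depth; exit }
          k = split(parents[c], ps, " ")
          for (j = 1; j <= k; j++)
            if (!(ps[j] in seen)) { seen[ps[j]] = 1; nxt[++m] = ps[j] }
        }
        for (i = 1; i <= m; i++) frontier[i] = nxt[i]
        n = m; depth++
      }
      exit 1
    }'
}

# count_excluding START TARGET: commits reachable from START but not from
# TARGET, which is what "git rev-list START ^TARGET --count" reports.
count_excluding() {
  printf '%s\n' "$graph" | awk -v start="$1" -v target="$2" '
    function walk(from, set,    stack, top, c, k, j, ps) {
      top = 1; stack[1] = from
      while (top > 0) {
        c = stack[top--]
        if (c in set) continue
        set[c] = 1
        k = split(parents[c], ps, " ")
        for (j = 1; j <= k; j++) stack[++top] = ps[j]
      }
    }
    { parents[$1] = ""; for (i = 2; i <= NF; i++) parents[$1] = parents[$1] " " $i }
    END {
      split("", excluded); split("", reached)
      walk(target, excluded); walk(start, reached)
      n = 0
      for (c in reached) if (!(c in excluded)) n++
      print n
    }'
}

min_depth M T        # prints 3
count_excluding M T  # prints 5
```

Here T is reached at depth 3 via the short path, but the count is 5 because the side branch commits B1, B2 and B3 are counted as well, so the count overestimates the depth needed.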
