55

Assume that somewhere on the web there is a public Git repository. I want to clone it, but first I need to know its size (how many objects and kilobytes, as in `git count-objects`).

Is there a way to do it?

PJ Bergeron
dfens

• @Dogbert You can find out the size of a GitHub-hosted repository thanks to their API (see this [SO question](http://stackoverflow.com/questions/8646517/see-the-size-of-a-github-repo-before-cloning-it)). I haven't found anything related to the object count, though. Hth. – nulltoken Feb 03 '12 at 21:31

5 Answers

24

One little kludge you could use would be the following:

mkdir repo-name
cd repo-name
git init
git remote add origin <URL of remote>
git fetch origin

git fetch displays feedback along these lines:

remote: Counting objects: 95815, done.
remote: Compressing objects: 100% (25006/25006), done.
remote: Total 95815 (delta 69568), reused 95445 (delta 69317)
Receiving objects: 100% (95815/95815), 18.48 MiB | 16.84 MiB/s, done.
...

The steps on the remote end generally happen pretty fast; it's the receiving step that can be time-consuming. It doesn't actually show the total size, but you can certainly watch it for a second, and if you see "1% ... 23.75 GiB" you know you're in trouble, and you can cancel it.
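A small refinement on the same kludge, if you don't want to babysit the fetch: run it under coreutils `timeout` so it prints a few progress lines and then aborts on its own. This is only a sketch; `<URL of remote>` is a placeholder as above.

mkdir repo-name && cd repo-name
git init -q
git remote add origin <URL of remote>
# Let "Receiving objects: ... MiB" print for ~10 seconds, then abort the transfer
timeout 10 git fetch --progress origin
# Throw away the partial download
cd .. && rm -rf repo-name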

Ahmad Baktash Hayeri
Cascabel
• Are you sure this is valid? I think the percentage represents the number of objects received, not the size of data. – jhabbott Dec 27 '11 at 16:05
• I didn't say that this gave you any exact numbers, just that you can use it as a way to tell if the repository is obscenely large. – Cascabel Dec 27 '11 at 16:53
• If you wanted to go this route you could just run `git fetch --dry-run`, and then you wouldn't need to worry about canceling before the data transfer. But you're both right, it's an imperfect kludge. – bryan kennedy Feb 01 '12 at 16:49
• I just ran a test using --dry-run and it still downloads the pack; I think it just doesn't update any of the heads. – Xentac Feb 01 '12 at 18:26
• The size shown is for the objects downloaded so far, so it's somewhat luck-based, depending on when the large objects happen to be fetched. Sometimes, when my luck isn't so good, I see "10% ... 1 MB" and then 40 or 50 MB at 100% (an exaggerated example, but that's the kind of shock you have to be prepared for). Generally this method works, since Git objects tend to be small, so you can make a linear guess of the size. At least Git shows the size while downloading; `hg clone` shows nothing! I once downloaded a ~700 MB Mercurial repo and never knew what percentage was left to download. – ken May 09 '14 at 23:54
• I don't want to clone it; I just want to know its size. – lindexi Mar 28 '17 at 02:10
19

[ update 21 Sep 2021 ]
It seems that the link now redirects to another URL, so we need to add `-L` to curl to follow the redirect.

curl -sL https://api.github.com/repos/Marijnh/CodeMirror | grep size


[ Old answer ]
For GitHub repositories, there is now an API to check the repository size (the `size` field is reported in kilobytes). It works!

This link: see-the-size-of-a-github-repo-before-cloning-it gave the answer

Command: (answer from @VMTrooper)

curl https://api.github.com/repos/$2/$3 | grep size

Example:

curl https://api.github.com/repos/Marijnh/CodeMirror | grep size
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  5005  100  5005    0     0   2656      0  0:00:01  0:00:01 --:--:--  2779
"size": 28589,
ken
• Did not work. I want to check [https://github.com/madhur/PortableJekyll](https://github.com/madhur/PortableJekyll) and it stops quickly with `Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 140k 0 140k 0 0 155k 0 --:--:-- --:--:-- --:--:-- 155k` – Timo Feb 22 '21 at 07:48
• If you have `jq`, you can directly get just the size: `curl -s https://api.github.com/repos/git/git | jq '.size'` – philb Sep 16 '21 at 16:21
• @Timo See the updated answer. – ken Sep 21 '21 at 03:37
10

Doesn't give the object count, but if you use the Google Chrome browser and install this extension, it adds the repo size to the home page:

[Screenshot: the GitHub Repo Size extension showing the repository size on the repo home page]

Bigwave
3

I think there are a couple of problems with this question: `git count-objects` doesn't truly represent the size of a repository (even `git count-objects -v` doesn't, really); if you're using anything other than the dumb HTTP transport, a new pack will be created for your clone when you make it; and (as VonC pointed out) anything you do to analyze a remote repo won't take into account the working copy size.
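For comparison, here is what the question's own yardstick actually measures, run against a repository you already have locally; note that the sizes are reported in KiB and ignore the working copy entirely (a minimal sketch, assuming an existing local clone):

cd some-local-repo    # hypothetical existing clone
git count-objects -v
# count / size         -> loose objects and their disk usage (KiB)
# in-pack / size-pack  -> objects in pack files and the packs' disk usage (KiB)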

That being said, if the remote uses the dumb HTTP transport (GitHub, for example, does not), you could write a shell script that uses curl to query the sizes of all the objects and packs, as sketched below. That might get you closer, but it makes extra HTTP requests that you'll just have to make again to actually do the clone.
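Such a script could, for instance, sum the Content-Length of the pack files the server advertises in objects/info/packs. This is only a sketch under the assumption of a dumb-HTTP remote (the URL is a placeholder), and it ignores loose objects, which you would have to walk from the refs:

url=https://example.com/repo.git   # hypothetical dumb-HTTP remote
# objects/info/packs lists one "P pack-<sha1>.pack" line per pack file
curl -s "$url/objects/info/packs" | awk '/^P /{print $2}' |
while read -r pack; do
  # HEAD request: the Content-Length header is the pack's size in bytes
  curl -sI "$url/objects/pack/$pack" | tr -d '\r' |
    awk 'tolower($1) == "content-length:" {print $2}'
done | awk '{total += $1} END {print total, "bytes in packs"}'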

It is possible to figure out what git-fetch would send across the wire (to a smart HTTP transport), send that yourself, and analyze the result, but it's not really a nice thing to do. Essentially you're asking the target server to pack up results that you're just going to download and throw away, so that you can download them again to save them.

Something like these steps can be used to this effect:

url=https://github.com/gitster/git.git
# Build the request body: one "want" pkt-line per ref of interest.
# 0x32 = 50 bytes: "0032" + "want " + 40-char SHA-1 + newline.
git ls-remote $url |
  grep '[[:space:]]\(HEAD\|refs/heads/master\|refs/tags\)' |
  grep -v '\^{}$' | awk '{print "0032want " $1}' > binarydata
# Terminate the request: a flush-pkt (0000) followed by a "done" pkt-line (0009done)
echo 00000009done >> binarydata
# POST the request to the smart-HTTP endpoint and count the bytes of the pack it returns
curl -s -X POST --data-binary @binarydata \
  -H "Content-Type: application/x-git-upload-pack-request" \
  -H "Accept-Encoding: deflate, gzip" \
  -H "Accept: application/x-git-upload-pack-result" \
  -A "git/1.7.9" $url/git-upload-pack | wc -c

At the end of all of this, the remote server will have packed up master/HEAD and all the tags for you and you will have downloaded the entire pack file just to see how big it will be when you download it during your clone.

When you finally do a clone, the working copy will be created as well, so the entire directory will be larger than these commands report, but the pack file is generally the largest part of a clone with any significant history.

Zombo
Xentac
2

Not that I know of:
Git is not a server; by default, nothing is listening for requests (unless you activate a gitweb or a gitolite layer).
And the `git remote ...` commands deal with the local copy (fetched) of a remote repo.

So unless you fetch something, or `clone --bare` a remote repo (which does not check out the files, so you get the Git database alone), you won't have an idea of its size.
And that does not include the size of the working directory, once checked out.
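For instance, a bare clone lets you run exactly the commands from the question against the repository database, at the cost of downloading it once (a sketch; the URL is a placeholder):

# Download the Git database only, with no working copy
git clone --bare https://github.com/user/repo.git repo.git
cd repo.git
git count-objects -v   # object count; size-pack is in KiB, as in the question
du -sh .               # total on-disk size of the bare repository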

VonC
  • Would you mind sharing how a `clone --bare` could provide the info requested by the OP? Then, yours could become a very interesting and relevant answer. (Good point that Git is not a server.) – XavierStuvw Nov 12 '20 at 06:46
  • @XavierStuvw Sure, 10 years later, I have edited the answer to clarify why a bare repository is a good way to get the size of said repository. – VonC Nov 12 '20 at 06:48