
I empirically noticed a significant bandwidth difference between cloning GitHub repositories via HTTPS (~500 KB/s) and via SSH (>10 MB/s).

During a release cycle, I often perform several git clones, which by default use HTTPS (as in git clone https://...), since that does not require authentication and is simpler for the user.

However, the repository contains about 100 MB (due to several versions, some binary files, etc.), so at that bandwidth the command takes several minutes. If I instead clone over SSH (git@github.com:...), the repository downloads at upwards of 10 MB/s and the clone takes less than 10 seconds.
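For example, the kind of comparison I am making looks roughly like this (the repository URL below is just a placeholder, not our actual repository):

```
# Time an unauthenticated HTTPS clone (placeholder repository URL)
time git clone https://github.com/example-org/example-repo.git repo-https

# Time an SSH clone of the same repository
time git clone git@github.com:example-org/example-repo.git repo-ssh
```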

Ideally the repository would be smaller, but in any case I'd like to inform users about this difference and point them to official documentation. However, the help page Which remote URL should I use? does not mention it at all, and neither does this SO question. The rate limit rules do not mention bandwidth either (and I am well below them, so they are unlikely to be the issue).

So I wonder: is this behavior known and reproducible for everyone? Could I be seeing some specific bandwidth throttling (possibly after having done several git clones in a short period of time)? I'd like to have an official source to refer users to.

anol

2 Answers


Could I be seeing some specific bandwidth throttling (possibly after having done several git clones in a short period of time)?

Yes, though GitHub Support is correct, in that it's not bandwidth throttling. You're seeing CPU throttling. GitHub is not network-bound, but it is CPU-bound on cloning repositories, since computing the packfile to deliver to you and compressing it for delivery are expensive.

As Patrick Reynolds discusses in his talk at Git Merge 2016, GitHub places limits on the number of concurrent Git operations for a particular user, from a particular IP, to a particular repository, to avoid you DoS'ing a fileserver. This can be triggered by exactly what you're doing; it is, in effect, the "thundering herd" problem that GitHub is trying to avoid.

As Patrick notes, "the only thing that hits this limit is scripts..." and the thing that frequently hits these limits is "cloning for continuous integration". In short, GitHub records the CPU time used by prior clones of a repository and assumes that future clones will take a similar amount. When you start several clones at the same time, GitHub calculates the expected CPU time for the total of those clones, and if you are over a given quota, some of those clones will be delayed.

This ensures that your multiple clones do not impact other users on the system.

So why are you seeing these effects with HTTPS and not with SSH? Because authenticated users have a higher quota than unauthenticated users. I suspect that if you were to authenticate with HTTPS, you would see similar response times between the two protocols.
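As a rough way to test this (the token and repository URL below are placeholders), you could supply credentials for the HTTPS clone, for example with a personal access token or a credential helper:

```
# Authenticated HTTPS clone with a personal access token embedded in the URL
# (YOUR_TOKEN and the repository path are placeholders)
git clone https://YOUR_TOKEN@github.com/example-org/example-repo.git

# Or configure a credential helper so the HTTPS clone is authenticated
git config --global credential.helper cache
git clone https://github.com/example-org/example-repo.git
```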

Edward Thomson
  • Just to clarify, in reality I hadn't cloned many of them, just once every few minutes and fewer than 10 in total, so I'm not sure CPU throttling was being applied. It was the fact that HTTPS seemed capped at a "round" number (500 KB/s) that led me to suppose a bandwidth cap. Still, I just retried cloning it (several days after the last attempt) and got 223 KB/s for HTTPS and 119 KB/s for SSH, so there really is no repeatable pattern I can see. Overall, our repository is not famous enough to warrant special consideration. – anol May 28 '18 at 14:19
  • I see; I had inferred from your question that when you performed several `clone`s that you did so at the same time. So I agree that you're probably not triggering the throttling. – Edward Thomson May 28 '18 at 14:27

I eventually contacted GitHub Support, who replied:

We don't have set bandwidth caps between HTTPS and SSH.

So any observed differences must be local.

Still, if performance is really important to users, it may be worth telling them that the two protocols can have different speeds, and that they should try both and see which one works best for them.
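For example, an existing clone can be switched between the two protocols without re-cloning, which makes it easy to compare them (the repository URL below is a placeholder):

```
# Show which protocol the current remote uses
git remote -v

# Point the remote at the SSH URL (or back at the HTTPS one) and compare
git remote set-url origin git@github.com:example-org/example-repo.git
git fetch origin
```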

anol