
How does the Requests library compare with PyCurl performance-wise?

My understanding is that Requests is a Python wrapper around urllib, whereas PyCurl is a Python wrapper around libcurl, which is native code, so PyCurl should get better performance, but I'm not sure by how much.

I can't find any benchmarks comparing the two.

Eugene

4 Answers


I wrote you a full benchmark, using a trivial Flask application backed by gUnicorn/meinheld + nginx (for performance and HTTPS), and measuring how long it takes to complete 10,000 requests. Tests were run in AWS on a pair of unloaded c4.large instances, and the server instance was not CPU-limited.

TL;DR summary: if you're doing a lot of networking, use PyCurl; otherwise use requests. PyCurl finishes small requests 2x-3x as fast as requests until you hit the bandwidth limit with large requests (around 520 Mbit/s or 65 MB/s here), and uses 3x to 10x less CPU power. These figures compare cases where connection pooling behavior is the same; by default, PyCurl uses connection pooling and DNS caching, whereas requests does not, so a naive implementation will be 10x as slow.
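To compare like with like, connection reuse has to be enabled on both sides: a `requests.Session` (which enables urllib3's connection pooling) against a single reused `pycurl.Curl` handle. The following is only a minimal sketch of the two equivalents, not the actual benchmark code; the URL and request count are placeholders:

```python
import io

import pycurl
import requests

URL = "http://localhost:8000/ping"  # placeholder endpoint
N = 10_000                          # placeholder request count

# requests with connection pooling: one Session reused for every request
with requests.Session() as session:
    for _ in range(N):
        body = session.get(URL).content  # read the body so the socket can be reused

# pycurl with connection reuse: one Curl handle reused for every request
curl = pycurl.Curl()
for _ in range(N):
    buf = io.BytesIO()
    curl.setopt(pycurl.URL, URL)
    curl.setopt(pycurl.WRITEDATA, buf)
    curl.perform()
    body = buf.getvalue()  # read out the response body, as the comments below point out
curl.close()
```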

[Charts from the linked repository: combined RPS and CPU time by request size; HTTP-only throughput and RPS; HTTP & HTTPS throughput and RPS. Note that double-log plots are used for the HTTP & HTTPS graphs only, due to the orders of magnitude involved.]

  • pycurl takes about 73 CPU-microseconds to issue a request when reusing a connection
  • requests takes about 526 CPU-microseconds to issue a request when reusing a connection
  • pycurl takes about 165 CPU-microseconds to open a new connection and issue a request (no connection reuse), of which ~92 CPU-microseconds go to opening the connection
  • requests takes about 1078 CPU-microseconds to open a new connection and issue a request (no connection reuse), of which ~552 CPU-microseconds go to opening the connection

Full results are in the linked repository, along with the benchmark methodology and system configuration.
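As a rough illustration only (not the benchmark's actual harness), per-request CPU time, as opposed to wall-clock time spent waiting on the network, can be measured with `time.process_time()`; the URL and request count are again placeholders:

```python
import time

import requests

URL = "http://localhost:8000/ping"  # placeholder endpoint
N = 10_000                          # placeholder request count

with requests.Session() as session:
    start = time.process_time()     # CPU time only; excludes time blocked on the network
    for _ in range(N):
        session.get(URL).content
    cpu_seconds = time.process_time() - start

print(f"{cpu_seconds / N * 1e6:.0f} CPU-microseconds per request")
```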

Caveats: although I've taken pains to ensure the results are collected in a scientific way, it's only testing one system type and one operating system, and a limited subset of performance and especially HTTPS options.

BobMcGee
  • Your benchmark is nice, but localhost has no network layer overhead whatsoever. If you could cap the data transfer speed at actual network speeds, using realistic response sizes (`pong` is not realistic), and including a mix of content-encoding modes (with and without compression), and *then* produce timings based on that, then you'd have benchmark data with actual meaning. – Martijn Pieters Oct 02 '15 at 07:47
  • I also note that you moved the setup for pycurl out of the loop (setting the URL and writedata target should arguably be part of the loop), and don't read out the `cStringIO` buffer; the non-pycurl tests all have to produce the response as a Python string object. – Martijn Pieters Oct 02 '15 at 07:52
  • @MartijnPieters Lack of network overhead is intentional; the intent here is to test the client in isolation. The URL is pluggable there, so you can test it against a real, live server of your choice (by default it doesn't, because I don't want to hammer someone's system). **Key note:** the later test of pycurl reads out the response body via body.getvalue, and performance is very similar. PRs are welcome for the code if you can suggest improvements. – BobMcGee Oct 02 '15 at 12:29
  • @MartijnPieters I did try testing with external servers, but... with this many connection requests, it triggers DoS prevention measures unfortunately. If you've got notions on how to avoid that, be my guest. – BobMcGee Oct 02 '15 at 12:38
  • I was talking about using a network interface throttle (see some [sample applications that achieve this](http://superuser.com/questions/330501/how-can-i-simulate-a-slow-connection-or-limit-the-bandwidth-that-firefox-can-us)) plus some real-world data loads to see how much of a difference pycurl makes to different scenarios. – Martijn Pieters Oct 02 '15 at 14:51
  • @MartijnPieters Please, if you know a good way to do this, submit a PR! I wanted to get something out there to get actual numbers, but didn't have the time to invest in designing a full framework. As it stands, the benchmark is definitely open to improvement and enhancement, and would welcome any contributions! – BobMcGee Oct 02 '15 at 15:13
  • I'm sorry, I don't have the time right now either, nor do I have a network conditioner ready to go. – Martijn Pieters Oct 02 '15 at 15:24
  • This is not a good benchmark for using Requests. This creates a new connection with every single request. You should be using a session. – Kenneth Reitz Feb 23 '16 at 05:26
  • Okay, I cleaned up the benchmarks, with connection reuse: Requests: 4.47 s. Urllib3: 2.9 s. PyCurl: 0.639 s. – Kenneth Reitz Feb 23 '16 at 06:06
  • @KennethReitz Yeah, it's a fairly rough benchmark, and if you've got improvements ready to go, I'd welcome a PR (and can rerun on the original system for an apples-to-apples comparison)! We really should have benchmark coverage with and without connection reuse for all cases, because one might be issuing requests to different servers or a string of requests to the same one. Based on your figures, I think we're still not wrong in saying pycurl is between 3x and 10x faster with the same connection behavior. – BobMcGee Feb 23 '16 at 19:15
  • @KennethReitz I've merged your PR, integrated against a rework of the command line execution and test format, and am investigating to see if the (now) anomalously bad pycurl performance is real or a result of bad implementation. – BobMcGee Feb 24 '16 at 13:46
  • @KennethReitz Thank you, fancy graphs including your PR are now available on Github (and a much-refined test script + Docker image). – BobMcGee Mar 01 '16 at 05:42
  • @Martijn_Pieters You may want to take a look again; I've updated with a benchmark with full network overheads in AWS. – BobMcGee Mar 01 '16 at 13:58

First and foremost, requests is built on top of the urllib3 library; the stdlib urllib and urllib2 libraries are not used at all.

There is little point in comparing requests with pycurl on performance. pycurl may use C code for its work, but like all network programming, your execution speed depends largely on the network that separates your machine from the target server. Moreover, the target server could be slow to respond.

In the end, requests has a far friendlier API to work with, and you'll find that you are more productive using it.
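As an illustration of that API difference, here is a rough sketch of the same GET request in both libraries (error handling omitted; the URL is a placeholder):

```python
import io

import pycurl
import requests

URL = "https://example.com/"

# requests: one call; the status code and decoded body are attributes of the response
response = requests.get(URL)
print(response.status_code, len(response.text))

# pycurl: manual option setting and buffer management
buf = io.BytesIO()
curl = pycurl.Curl()
curl.setopt(pycurl.URL, URL)
curl.setopt(pycurl.WRITEDATA, buf)
curl.perform()
status = curl.getinfo(pycurl.RESPONSE_CODE)
curl.close()
print(status, len(buf.getvalue()))
```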

Martijn Pieters
  • I agree that for most applications the clean API of requests matters most; but for network-intensive applications, there's no excuse *not* to use pycurl. The overhead may matter (especially within a data center). – BobMcGee Oct 02 '15 at 03:06
  • @BobMcGee: if the network speeds are so high that the overhead is going to matter, you should not be using Python for the whole application anymore. – Martijn Pieters Oct 02 '15 at 07:46
  • @Martijn_Pieters Disagree -- Python performance isn't that bad, and in general it's pretty easy to delegate the performance-sensitive bits to native libraries (which pycurl is a perfect example of). Dropbox can make it work, and yum internally uses pycurl (since a lot of its work is simply network fetches, which need to be as fast as possible). – BobMcGee Oct 02 '15 at 12:00
  • @BobMcGee: yes, for specialist codebases like yum it can be worth the pain of having to deal with the pycurl API; for the vast majority of URL processing needs, however, the tradeoff lies heavily in favour of `requests`. In other words, most projects will not need to go through the pain of using `pycurl`; in my *opinion* you need to be pretty network-heavy before it is worth giving up the `requests` API; the difference in ease of development is huge. – Martijn Pieters Oct 02 '15 at 14:55
  • @MartijnPieters: Totally agree with that! Requests should be the default go-to unless network performance is critical (or you need low-level curl functionality). To complete that picture, we now have a benchmark that someone can use to test for themselves. – BobMcGee Oct 02 '15 at 15:58

It seems there is a new kid on the block: a Requests-style interface for pycurl.

Thank you for the benchmark; it was nice. I like curl, and it seems to be able to do a bit more than HTTP.

https://github.com/dcoles/pycurl-requests
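If the project works as its README describes, it is meant to be a drop-in, Requests-style API backed by libcurl; assuming the module is importable as `pycurl_requests`, usage would look roughly like this:

```python
# Assumes pycurl-requests (pip install pycurl-requests) exposes a
# Requests-compatible top-level API, as its README describes.
import pycurl_requests as requests

response = requests.get("https://example.com/")
print(response.status_code)
```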

user2692263

Focusing on size:

  1. On my MacBook Air with 8 GB of RAM and a 512 GB SSD, for a 100 MB file coming in at 3 kilobytes a second (over the internet and Wi-Fi), pycurl, curl and the requests library's get function (regardless of chunking or streaming) are pretty much the same.

  2. On a smaller quad-core Intel Linux box with 4 GB RAM, over localhost (from Apache on the same box), for a 1 GB file, curl and pycurl are 2.5x faster than the requests library. And for requests, chunking and streaming together give a 10% boost (chunk sizes above 50,000 bytes).

I thought I was going to have to swap requests out for pycurl, but not so, as the application I'm making isn't going to have the client and server that close together.
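For reference, the chunked/streaming download pattern referred to above looks roughly like this with requests; the URL, output path, and chunk size are placeholders:

```python
import requests

URL = "http://localhost/largefile.bin"  # placeholder
CHUNK_SIZE = 64 * 1024                  # above the ~50,000-byte threshold mentioned above

# stream=True avoids loading the whole body into memory at once
with requests.get(URL, stream=True) as response:
    response.raise_for_status()
    with open("largefile.bin", "wb") as out:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            out.write(chunk)
```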

paul_h