
Basically, I have a URL that I'm hitting to get some XML data. I cannot disclose the endpoint, but running:

curl -v "http://my-url.com/some/endpoint"

returns a 200 OK and the content pretty much instantly.

Using the requests module by Kenneth Reitz, I have both a POST request and a GET request that both take 30 seconds to return content.

If I use it this way:

import requests
from timeit import Timer

t = Timer(lambda: requests.get(myurl).content)
print t.timeit(number=1)
30.2136261463

it takes 30.2 sec on average each time. Same with my POST request. If I ask only for the status_code and not the content, I get the same 30-second delay, unless I pass stream=True, in which case I get the response quickly, but still not the content.

What confuses me is the curl behaviour: I get both the response and the content in under 10 ms. I tried faking the user agent in my Python test, tried passing numerous arguments to get(), etc. There must be some major difference between how curl and python-requests make requests that I am not aware of. I am a newbie, so I do apologise if I am missing something obvious.

I would also like to mention that I have tried this on multiple machines, with multiple versions of curl and Python, and even with some REST clients like Postman. Only curl is lightning fast, and it hits the same endpoint in every case, BTW. I understand one option is to do a subprocess call to curl from my test, but... is that a good idea?
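For reference, the subprocess workaround mentioned above can be sketched like this. This is a minimal sketch, not a recommendation: it assumes curl is on the PATH, and the function name fetch_with_curl is invented here for illustration.

```python
import subprocess


def fetch_with_curl(url, timeout=30):
    """Fetch a URL by shelling out to curl; assumes curl is on PATH."""
    result = subprocess.run(
        ["curl", "--silent", "--fail", "--max-time", str(timeout), url],
        capture_output=True,
        check=True,  # raise CalledProcessError if curl exits non-zero
    )
    return result.stdout  # raw response body as bytes
```

Shelling out like this loses connection pooling, cookie handling, and the rich error reporting a Python HTTP library gives you, which is part of why it tends to be a last resort.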

EDIT: I care about the content. I am aware I can get the response code quickly (headers).

Thanks in advance,

Tihomir.

UPDATE:

I am now using pycurl2 in my tests, but this is just a workaround, as I was hoping I could use python-requests for everything. I am still curious as to why curl is so much faster.

tsaulic
  • possible duplicate of [python requests is slow](http://stackoverflow.com/questions/15780679/python-requests-is-slow) – jrd1 Sep 25 '13 at 04:43
  • the mentioned thread's solution satisfies the OP as they don't care about the content. I need the content, and in my case only that is slow. – tsaulic Sep 25 '13 at 04:47

2 Answers


Since this question is not generating any interest at all, I am going to accept my own workaround solution - which involves using pycurl2 instead of requests for the problematic requests.

Only 2 of all my requests are slow, and switching them to pycurl2 fixed my issue, but it's not the solution I was hoping for.

NOTE: I am not saying in any way that requests is slow or bad. This seemed to be an issue with gzip compression: GlassFish was serving gzipped data with an incorrect Content-Length. I just wanted to know why it does not affect curl/wget.
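A gzip stream carries its own end-of-stream marker, which is one plausible reason a lenient client can finish immediately even when the Content-Length header is wrong, while a stricter reader keeps waiting for bytes that never arrive. A minimal sketch of that property using only the standard library (the payload here is made up):

```python
import gzip
import zlib

payload = b"<data>hello</data>" * 50
compressed = gzip.compress(payload)

# Decompress incrementally, the way an HTTP client would while
# reading the body off the socket.
decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # gzip framing
out = decomp.decompress(compressed)

# The gzip trailer tells the decompressor where the stream ends,
# independently of any Content-Length header the server sent.
print(decomp.eof)      # True
print(out == payload)  # True
```

If the server's gzip handling is the culprit, asking for an uncompressed body (for example by sending an `Accept-Encoding: identity` header) may also be worth testing.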

tsaulic

One thing to do would be to use:

requests.get(url, stream=False)

instead of what you've posted. See this link for more:

http://docs.python-requests.org/en/latest/user/advanced/
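To make the stream behaviour concrete, here is a self-contained sketch against a throwaway local server (the handler and payload are invented for illustration). With stream=True the call returns as soon as the response headers arrive; the body is only downloaded when .content or iter_content() is accessed.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests  # third-party; the library under discussion

PAYLOAD = b"<data>ok</data>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):  # silence per-request logging
        pass


server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

resp = requests.get(url, stream=True)
status = resp.status_code  # available as soon as headers arrive
body = resp.content        # the body is actually downloaded here
server.shutdown()
```

With stream=False (the default), the body is downloaded before get() returns at all, so the two calls take the same total time for a small response; the difference only matters for when the waiting happens.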

DISCUSSION

  • Curl is an executable.
  • Python is an interpreted language.

As a result, Python has a much slower "startup" time than curl, which contributes to its relatively slow speed, even though the work here is I/O-bound rather than CPU-bound. This is one of the trade-offs of using an interpreted language: you generally get relatively slow execution, but the savings in development and maintenance time usually far outweigh that "loss". (Note: I said generally.)

One possible solution is, as you say, to use Python to wrap curl in a script. That is not a bad idea, but without care it can lead to serious problems (depending on usage, say deleting files), as there are race conditions to consider.

Another approach is to translate the original Python code into a language like C/C++ so you can compile it and get near-native performance. Examples of tools for this are Shed Skin and Cython.

jrd1
    Hmm.. I really don't think that this has anything to do with curl being a binary and python being interpreted in this case, as the difference is 30s :D. It's more likely that there's something wrong with the service I'm hitting, or maybe curl does something that none of the other mentioned services do. But thanks for the input. I've been trying to figure this out for a while now. Note that calling the curl command from python interpreter also executes lightning fast and I get content. Python is not really that slow from what I've read previously. – tsaulic Sep 25 '13 at 04:34
  • @TihomirSaulic, to some degree. Depending on the usage, the difference can be as high as orders of magnitude. But, generally, the issue is how the OP is requesting the stream info. – jrd1 Sep 25 '13 at 04:38
  • I tried `requests.head(url, allow_redirects=False)` and I get empty content :( .. if I don't request the content (which makes sense for a head request?) then I get a response quickly, but I care about the content too... not sure what to do with this one! P.S. I am the OP :D – tsaulic Sep 25 '13 at 04:39
  • @TihomirSaulic. Edited, try it again. – jrd1 Sep 25 '13 at 04:41
  • From what I've understood, stream=False is default, and stream=True enables you to perform conditional operations based on content's properties. Like abort if it's too long etc. Anyway, stream=False didn't help, and stream=True only enables me get the response instantly, that's all. Thanks for help, I will keep digging into this a bit later. – tsaulic Sep 25 '13 at 04:57
  • @TihomirSaulic, sorry I couldn't be of more help! :( Good luck with your search, though! – jrd1 Sep 25 '13 at 04:58
  • don't sweat it! thanks for trying though :) I'll keep experimenting! – tsaulic Sep 25 '13 at 05:02