
I am trying to make a GET request with the requests library in Python. I do not want to skip the request, so I don't think a timeout would help me.

Opening the URL in my browser does not cause any problems. When I pass the URL to the requests.get() function, it takes over one minute to process.

import time
import requests

start = time.time()

url = 'desired_url'
requests.get(url)

print(f'it took {time.time() - start} seconds to process the request')

This piece of code gives me:

it took 76.72762107849121 seconds to process the request

I am using the following version of requests:

requests==2.21.0

Since I would like to handle thousands of requests, more than a minute for each request is too long.

Any idea what is happening here? How can I ensure faster processing of my requests.get() calls?

  • Possibly the owner of the `url` doesn't like `python-requests` and gives you a penalty of **xx seconds**. – stovfl Aug 31 '19 at 08:04
  • @stovfl Thanks for your answer. Is there any way I can find out if this is true for the owner of `url`? – Ivo Lindsen Aug 31 '19 at 08:15
  • I'd try the url with curl or postman and see if you're still getting lag. – Harry MW Aug 31 '19 at 08:21
  • Postman returns the GET request in around 600 ms. Which is fast enough for me. If there are additional comments how I can avoid slow `python-request` processing, feel free to provide them. – Ivo Lindsen Aug 31 '19 at 08:31
  • Try sending User-Agent as Chrome or Firefox with the GET request: `headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36'}` and then `requests.get(url, headers=headers)` (see the sketch after these comments). – Vikas Ojha Aug 31 '19 at 08:47
  • Can you provide the URL to experiment with? – kmaork Aug 31 '19 at 09:15
  • There's a tool for creating code from a curl request, have you tried that? github.com/NickCarneiro/curlconverter – Paolo Aug 31 '19 at 10:16
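A minimal sketch of the User-Agent suggestion from the comments above (the browser string, the placeholder URL, and the timing harness are illustrative assumptions, not something verified against the original poster's server):

import time
import requests

url = 'desired_url'  # placeholder for the actual URL

# Pretend to be a regular browser; some servers deliberately slow down
# the default 'python-requests/x.y.z' User-Agent.
headers = {
    'user-agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/76.0.3809.132 Safari/537.36')
}

start = time.time()
response = requests.get(url, headers=headers)
print(f'status {response.status_code} in {time.time() - start:.2f} seconds')

If the response comes back quickly with the browser-like header, server-side throttling of the default User-Agent is the likely culprit.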

1 Answer


Your waiting time may not depend on you but on the server side!

If you have thousands of requests, the best approach will be to use asynchronous requests. You can use grequests:

import grequests

urls = [
    'http://www.heroku.com',
    'http://python-tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://fakedomain/',
    'http://kennethreitz.com'
]

# build the (unsent) requests lazily
rs = (grequests.get(u) for u in urls)

# send them concurrently and collect the responses in order
grequests.map(rs)

Output:

[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, None, <Response [200]>]

Though you should be careful not to overwhelm the server with too many requests at the same time.
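For example, a minimal sketch of throttling the concurrency with the `size` argument of `grequests.map` (the pool size of 10 and the httpbin URLs are only illustrative assumptions):

import grequests

# example workload: 100 identical test requests
urls = ['http://httpbin.org/delay/1'] * 100

rs = (grequests.get(u) for u in urls)

# run at most 10 requests concurrently instead of firing all 100 at once
responses = grequests.map(rs, size=10)
print(responses[:3])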

kederrac
  • OP said they were making an individual request, which is fast in the browser but takes 72s using `requests.get()` - so it’s not obvious that the server is being flooded. – DisappointedByUnaccountableMod Aug 31 '19 at 10:39
  • No, but it makes sure that 1 or 1k requests will take about the same amount of time, and that is his actual problem: the fact that he has thousands. Let's wait for him to post the link. – kederrac Aug 31 '19 at 10:53
  • It may be that he will not be able to reduce the request time from a Python script because some sites are smarter, but the OP will not care, because thousands of requests will be done in a few minutes (even faster than using `requests` on each url with a good response time). My answer is a solution to his real problem. – kederrac Aug 31 '19 at 11:01