
I've got a list of ~100,000 links whose HTTP response codes I'd like to check. What's the best way to do this programmatically?

I'm considering the Python code below:

import requests

for x in range(100000):
    # The URLs will actually be read from a file, and aren't sequential
    url = "http://stackoverflow.com/" + str(x)
    try:
        r = requests.head(url)
        print(r.status_code)
    except requests.ConnectionError:
        # Catch per URL so one failure doesn't abort the remaining checks
        print("failed to connect: " + url)

... but I'm not aware of the potential side effects of checking such a large number of URLs in a single pass. Thoughts?

FloatingRock
Found the answer [here](http://stackoverflow.com/questions/2632520/what-is-the-fastest-way-to-send-100-000-http-requests-in-python) to the exact same question! – FloatingRock May 04 '15 at 09:20

1 Answer


The only side effect I can think of is time, which you can mitigate by making the requests in parallel (for example with http://gevent.org/ or https://docs.python.org/2/library/thread.html).
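
A minimal sketch of the parallel approach, using the standard-library concurrent.futures thread pool rather than the gevent/thread modules linked above; the input file name urls.txt, the worker count, and the timeout are assumptions for illustration:

import concurrent.futures

import requests

def check(url):
    # HEAD one URL; return its status code, or the error if the request fails
    try:
        # allow_redirects=True makes HEAD follow redirects to the final status
        r = requests.head(url, timeout=10, allow_redirects=True)
        return url, r.status_code
    except requests.RequestException as e:
        return url, "failed: %s" % e

# urls.txt is a hypothetical input file with one URL per line
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# 20 workers is an arbitrary starting point; tune it for your bandwidth
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    for url, status in pool.map(check, urls):
        print(url, status)

Threads suit this job because it is I/O-bound: each worker spends most of its time waiting on the network, so many requests can overlap.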

faebser