
I'm trying to do a simple task, but I'm very new to Python, so I would appreciate some help. I have this piece of code to find 404 errors in Python:

```python
import requests

try:
    r = requests.head("http://stackoverflow.com")
    print(r.status_code)
except requests.ConnectionError:
    print("failed to connect")
```

I found this code by looking at solutions on Stack Overflow (thanks to user Goumeau). I have thousands of URLs in a CSV file that I would like to import and then run this code on. What I want in the end is a list containing each URL and the HTTP status code associated with it. How do I import my list of URLs and then run the code above on each one in turn?

And, if I'm lucky, how would I then obtain the list of results?

Thanks for reading.

Cory Kramer
newbie68
  • What is the structure of the CSV? One URL per line, or several per line? Please show a sample of the CSV file. – b10n Sep 19 '14 at 00:23
  • Hey there, yes, that's right: the CSV has one URL per line (vertically speaking, column A contains thousands of URLs). Hope that makes sense. Thanks. – newbie68 Sep 19 '14 at 01:08
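Since the file is a spreadsheet export with the URLs in column A, Python's built-in `csv` module can read it directly. This is a minimal sketch, assuming Python 3 and that only the first column matters; `read_urls` is a hypothetical helper name. Opening the file with `newline=''`, as the `csv` documentation recommends, lets the reader cope with LF, CRLF or bare CR line endings, the last of which some older spreadsheet exports use.

```python
import csv


def read_urls(path):
    """Return the non-empty values in the first column (column A) of a CSV file."""
    urls = []
    # newline='' lets csv.reader handle LF, CRLF and bare CR line endings itself
    with open(path, newline='') as infile:
        for row in csv.reader(infile):
            # Skip empty rows and rows whose first cell is blank
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls
```

For example, `read_urls('url.csv')` would return a plain list of URL strings that can be looped over with the status-checking code.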

1 Answer


I'm assuming a file of URLs, one per line.

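```python
import requests

def get_url_status(url):
    try:
        r = requests.head(url)
        return url, r.status_code
    except requests.ConnectionError:
        print("failed to connect: " + url)
        return url, 'error'

results = {}
# Text mode uses universal newlines, so \n, \r\n and bare \r line endings all split lines
with open('url.csv') as infile:
    for line in infile:
        url = line.strip()  # remove the trailing line ending and surrounding spaces
        if not url:
            continue  # skip blank lines
        url_status = get_url_status(url)
        results[url_status[0]] = url_status[1]
```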
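To get the final list the question asks for, the `results` dictionary built by the loop can be saved as a new two-column CSV, and the 404s pulled out separately. A minimal sketch, assuming Python 3; `write_results` and `broken_urls` are hypothetical helper names, and the output filename is up to you.

```python
import csv


def write_results(results, path):
    """Write a {url: status} dict to a CSV file with a 'url,status' header row."""
    with open(path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(['url', 'status'])
        for url, status in results.items():
            writer.writerow([url, status])


def broken_urls(results):
    """Return the URLs whose recorded status code was 404."""
    return [url for url, status in results.items() if status == 404]
```

For example, `write_results(results, 'url_status.csv')` would write one row per URL, and `list(results.items())` gives the same data as a plain list of `(url, status)` tuples.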
b10n
  • Hey there, I've tried using this code, but I get the error below. My CSV contains a couple of test URLs: `File "/Library/Python/2.7/site-packages/requests/models.py", line 345, in prepare_url "Perhaps you meant http://{0}?".format(url)) requests.exceptions.MissingSchema: Invalid URL u'google.com\rstackoverflow.com': No schema supplied. Perhaps you meant http://google.com\rstackoverflow.com?` – newbie68 Sep 19 '14 at 18:18
  • What kind of exception did it throw when it tried to parse that line? Perhaps handle that exception type so the program can continue. – b10n Sep 19 '14 at 18:25
  • And after several other attempts, I get a 'failed to connect' error. Sorry, any advice you can give would be much appreciated, thanks. – newbie68 Sep 19 '14 at 18:35
  • `requests.ConnectionError` doesn't catch all possible exceptions. Have a look at this [answer](http://stackoverflow.com/questions/16511337/correct-way-to-try-except-using-python-requests-module) – b10n Sep 19 '14 at 18:40
  • Thanks for that. I don't know if you can stomach it, but now I'm getting this error message: `InvalidSchema: No connection adapters were found for 'url` – newbie68 Sep 19 '14 at 23:35
  • I've tried different ways of formatting the URLs, with the scheme (http://stackoverflow.com) and without the http://, but nothing seems to work. Any ideas? – newbie68 Sep 19 '14 at 23:36
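The `MissingSchema` and `InvalidSchema` errors in these comments are raised while requests processes the URL string, before any request is sent, so cleaning each value first is a reasonable step. A minimal sketch, assuming Python 3; `normalize_url` is a hypothetical helper, and treating a leading UTF-8 byte-order mark as one possible cause of the `InvalidSchema` error is an assumption, not something confirmed in the thread. Separately, catching `requests.exceptions.RequestException` (the base class of `ConnectionError`, `MissingSchema` and `InvalidSchema`) instead of `requests.ConnectionError` lets the loop record a bad URL and move on.

```python
from urllib.parse import urlsplit


def normalize_url(raw, default_scheme='http'):
    """Return a cleaned URL, or None if the value is blank.

    Hypothetical helper: removes a byte-order mark (an assumed cause of
    InvalidSchema), strips whitespace and stray carriage returns, and
    prepends default_scheme when the value has no scheme at all.
    """
    url = raw.replace('\ufeff', '').strip()
    if not url:
        return None
    if not urlsplit(url).scheme:
        url = default_scheme + '://' + url
    return url
```

In the answer's loop, each line could be passed through `normalize_url` first and skipped when it returns `None`. Also note that `requests.head` does not follow redirects by default, so a URL such as `http://stackoverflow.com` may report a 3xx code rather than the final status; passing `allow_redirects=True` changes that.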