
I have a Python script that makes around 50,000 requests, and it is very slow at the moment. I also want the script to keep running even when individual requests fail, so I added handlers for almost every exception `requests` can raise, mostly ConnectionError and the like.

Is there a way to make this script much faster than it is now, and more modular?

import json
import requests
from bs4 import BeautifulSoup

arr = []  # URLs whose page has no <h1 class="title">
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder; my real script defines its own headers

for i in range(50450000,50500000):
    try:
        try:
            try:
                try:
                    try:
                        try:
                            try:
                                try:
                                    try:
                                        try:
                                            try:
                                                try:

                                                    check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
                                                    make_requests = requests.get(check_response,headers=headers).text
                                                    soup  = BeautifulSoup(make_requests)
                                                    try:
                                                        main_wrapper = soup.find('h1',attrs={'class':'title'}).text
                                                        print main_wrapper + ' ' + str(i)
                                                    except AttributeError:
                                                        arr.append(check_response)
                                                        with open('working_urls.json','wb') as outfile:
                                                            json.dump(arr,outfile,indent=4)

                                                except requests.exceptions.InvalidURL:
                                                    continue
                                            except requests.exceptions.InvalidSchema:
                                                continue
                                        except requests.exceptions.MissingSchema:
                                            continue
                                    except requests.exceptions.TooManyRedirects:
                                        continue
                                except requests.exceptions.URLRequired:
                                    continue
                            except requests.exceptions.ConnectTimeout:
                                continue
                        except requests.exceptions.Timeout:
                            continue 
                    except requests.exceptions.SSLError:
                        continue
                except requests.exceptions.ProxyError:
                    continue
            except requests.exceptions.HTTPError:
                continue
        except requests.exceptions.ReadTimeout:
            continue
    except requests.exceptions.ConnectionError:
        continue
  • You don't need a try/except for every exception. One try is enough; just catch each exception after it. If the exception raised doesn't match the first except clause, Python moves on to the next one until one fits, or the exception propagates and crashes your script. – Philipp Oct 05 '16 at 07:49
  • All of this may be replaced by *one* try block with *one* matching `except requests.exceptions.RequestException:` clause. – vaultah Oct 05 '16 at 07:51
  • @vaultah True, unless at some point he wants to handle exceptions depending on the type thrown. – Philipp Oct 05 '16 at 07:52
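
As the comments note, except clauses attached to a single try are checked in order, so specific handlers can be listed before a catch-all base class. A minimal sketch of that pattern (the headers value, the 10-second timeout, and the retry_later list are illustrative assumptions, not part of the original script):

import requests

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers
retry_later = []  # hypothetical list of URLs worth a second attempt

for i in range(50450000, 50450010):  # small range, for illustration only
    url = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-' + str(i) + '.html'
    try:
        html = requests.get(url, headers=headers, timeout=10).text
    except requests.exceptions.Timeout:
        retry_later.append(url)  # timeouts handled specially, as Philipp suggests
        continue
    except requests.exceptions.RequestException:
        continue  # anything else from requests: just skip this id
    # ... parse html here ...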

1 Answer


First, please replace all these ugly try/except blocks with a single one, like:

for i in range(50450000,50500000):
    try:
        check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
        make_requests = requests.get(check_response,headers=headers).text
        soup = BeautifulSoup(make_requests, 'html.parser')  # name a parser explicitly so bs4 doesn't warn
        try:
            main_wrapper = soup.find('h1',attrs={'class':'title'}).text
            print main_wrapper + ' ' + str(i)
        except AttributeError:
            arr.append(check_response)
            with open('working_urls.json','wb') as outfile:
                json.dump(arr,outfile,indent=4)
    except requests.exceptions.InvalidURL:
        continue
    except requests.exceptions.InvalidSchema:
        continue
    except requests.exceptions.MissingSchema:
        continue
    ...

And if all you do is `continue` in every case, catch the base class RequestException instead. It becomes:

try:
    check_response = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-'+str(i)+'.html'
    make_requests = requests.get(check_response,headers=headers).text
    soup = BeautifulSoup(make_requests, 'html.parser')  # name a parser explicitly so bs4 doesn't warn
    try:
        main_wrapper = soup.find('h1',attrs={'class':'title'}).text
        print main_wrapper + ' ' + str(i)
    except AttributeError:
        arr.append(check_response)
        with open('working_urls.json','wb') as outfile:
            json.dump(arr,outfile,indent=4)
except requests.exceptions.RequestException:
    pass

Maybe not faster, but certainly far easier to read!

As for the speed issue, you should consider using threads or processes: the script spends almost all of its time waiting on the network, so running several requests concurrently helps a lot. Take a look at the threading and multiprocessing modules; a sketch follows below.
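
A minimal sketch of the thread-pool approach, reusing the URL pattern and the JSON output from the question. The headers value, the 10-second timeout, and the pool size of 20 are illustrative assumptions; multiprocessing.dummy gives a thread pool with the same API as multiprocessing.Pool, and threads suit this I/O-bound job:

from multiprocessing.dummy import Pool  # thread pool with the multiprocessing API
import json
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder; reuse your real headers

def check(i):
    # Fetch one product page; return its URL when no <h1 class="title"> is found,
    # mirroring the AttributeError branch of the original loop.
    url = 'http://www.barneys.com/product/adidas--22human-race-22-nmd-sneakers-' + str(i) + '.html'
    try:
        html = requests.get(url, headers=headers, timeout=10).text
    except requests.exceptions.RequestException:
        return None  # any network error: skip this id
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h1', attrs={'class': 'title'})
    if title is not None:
        print title.text + ' ' + str(i)
        return None
    return url

pool = Pool(20)  # 20 concurrent downloads; tune for your connection and the server
working = [url for url in pool.map(check, range(50450000, 50500000)) if url]
pool.close()
pool.join()

# Write the file once at the end instead of rewriting it on every hit.
with open('working_urls.json', 'wb') as outfile:
    json.dump(working, outfile, indent=4)

Writing the JSON once at the end also avoids re-serializing the whole growing list on every match, which is another hidden cost in the original loop.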

S. de Melo
  • This tidies up the code but does not answer the main part of the question. – Padraic Cunningham Oct 05 '16 at 14:02
  • @PadraicCunningham Yes it does; please read the last two sentences. But feel free to elaborate or give another idea if you want. – S. de Melo Oct 05 '16 at 14:04
  • Where are the examples of using the libraries you linked to? Two links are not really an answer; you could add those as a comment. Have a look at the dupe. – Padraic Cunningham Oct 05 '16 at 14:05
  • At that time my reputation was too low to comment and I didn't have time to write a full example, sorry. The documentation of multiprocessing contains a few examples that can be adapted though. – S. de Melo Oct 05 '16 at 14:09