So basically I have this function that I run to check whether a website exists or not. The problem is that it takes too much time: I have 50 different links to check one by one. Is there a faster way to do it? I'm currently using Python 3. I've heard of the threading module, but even after reading about it I'm still unsure how to use it. Do you guys have any good resources I could read or look at to understand it a little better?

Here's my code:

import requests

def checkExistingWeb():
    for i in range(1, 51):
        checkIfWebExist = requests.get("https://www.WEBSITE.com/ABCDE?id=" + str(i), allow_redirects=False)
        if checkIfWebExist.status_code == 200:
            print("\033[92mWeb Exists!\033[0m " + str(i))
        else:
            print("\033[91mWeb Does Not Exist!\033[0m " + str(i))


if __name__ == "__main__":
    checkExistingWeb()

Thanks!

Celoufran
  • Does this answer your question? [How do I parallelize a simple Python loop?](https://stackoverflow.com/questions/9786102/how-do-i-parallelize-a-simple-python-loop) – DarkSuniuM Aug 09 '20 at 03:06
  • You should do these requests asynchronously. That can be done with either `async` or with threading. – Ted Klein Bergman Aug 09 '20 at 03:07
  • `have any good resources I could read or look at` - requests for offsite resources are off topic here. – wwii Aug 09 '20 at 03:15
  • There is a [decent example](https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example) in the `concurrent.futures` documentation. – wwii Aug 09 '20 at 03:20
  • You may also consider using the timeout keyword for the requests, to avoid waiting forever on a site that doesn't exist. – Micah Johnson Aug 09 '20 at 03:44
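
Putting the suggestions from the comments above together (threads for concurrency, plus a per-request timeout so one dead host can't stall the run), a minimal sketch of a single check; the 5-second timeout is an arbitrary choice and the URL pattern is the one from the question:

import requests

def check_web(i):
    # timeout makes the request fail fast instead of hanging on a dead host.
    try:
        response = requests.get("https://www.WEBSITE.com/ABCDE?id=" + str(i),
                                allow_redirects=False, timeout=5)
    except requests.RequestException:
        return False
    return response.status_code == 200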

4 Answers


You can use multiprocessing.Pool:

https://docs.python.org/3/library/multiprocessing.html

import multiprocessing
import requests

def Check_Link(link):
    checkIfWebExist = requests.get(link, allow_redirects=False)
    if checkIfWebExist.status_code == 200:
        print("\033[92mWeb Exists!\033[0m " + link)
    else:
        print("\033[91mWeb Does Not Exist!\033[0m " + link)

if __name__ == "__main__":
    p = multiprocessing.Pool(5)
    p.map(Check_Link, ["https://www.WEBSITE.com/ABCDE?id=" + str(i) for i in range(1, 51)])

This code creates a pool of five worker processes and maps the Check_Link function over the list of URLs.
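
Since these checks are I/O-bound, a thread pool is usually a lighter fit than separate processes, and multiprocessing ships one with the same interface. A sketch reusing the Check_Link function above:

from multiprocessing.pool import ThreadPool

if __name__ == "__main__":
    # Same map-based API as multiprocessing.Pool, but backed by threads,
    # which is cheaper when the workers mostly wait on network I/O.
    with ThreadPool(5) as pool:
        pool.map(Check_Link, ["https://www.WEBSITE.com/ABCDE?id=" + str(i) for i in range(1, 51)])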

Adi Mehrotra

Create the function so it performs a single check, and keep the loop outside the function:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()
checkweb_futures = [executor.submit(checkExistWeb, i) for i in range(1, 51)]
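
For this to work, checkExistWeb has to accept the id as a parameter. A sketch of that single-check version, reusing the code from the question:

import requests

def checkExistWeb(i):
    # One request per call; the executor runs 50 of these concurrently.
    checkIfWebExist = requests.get("https://www.WEBSITE.com/ABCDE?id=" + str(i), allow_redirects=False)
    if checkIfWebExist.status_code == 200:
        print("\033[92mWeb Exists!\033[0m " + str(i))
    else:
        print("\033[91mWeb Does Not Exist!\033[0m " + str(i))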
Ronen

I really liked @Ronen's answer, so I looked into that excellent ThreadPoolExecutor class. His answer was incomplete, though: the submitted tasks haven't necessarily finished by the end of his snippet, so you still need to handle those Futures. Here is an expanded answer:

import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()
checkweb_futures = [executor.submit(checkExistWeb, i) for i in range(1, 51)]

# Poll until every future reports that it is done.
while not all(future.done() for future in checkweb_futures):
    time.sleep(1)

# result() (not results()) returns each call's return value.
checkweb_results = [future.result() for future in checkweb_futures]

Now you can check the results of all the calls.
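
Instead of polling with time.sleep, the standard library can do the waiting; a sketch using concurrent.futures.wait, which blocks until every future has finished:

import concurrent.futures

concurrent.futures.wait(checkweb_futures)
checkweb_results = [future.result() for future in checkweb_futures]

(Each Future.result() call also blocks until its future is done, so the explicit wait mainly makes the intent obvious.)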

RufusVS

Yeah, it's a good use case for multiple threads:

import concurrent.futures
import requests

def checkExistingWeb(i):
    # Check a single id; the executor feeds in the values of i.
    checkIfWebExist = requests.get("https://www.WEBSITE.com/ABCDE?id=" + str(i), allow_redirects=False)
    if checkIfWebExist.status_code == 200:
        print("\033[92mWeb Exists!\033[0m " + str(i))
    else:
        print("\033[91mWeb Does Not Exist!\033[0m " + str(i))


if __name__ == "__main__":
    # map runs checkExistingWeb once per id across the worker threads;
    # the with-block waits for all of them to finish.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(checkExistingWeb, range(1, 51))
Sekizyo