
I don't have any coding knowledge, but I need to run a script that fetches the HTTP status codes of a list of sites. The output should look like this:

domain.com 301
domain.com 200

I need to check a huge list of sites, around 200k URLs, so it also has to be fast. I have proxies available to run it multi-threaded.

Any help/idea is highly appreciated!

1 Answer


Below are a threaded and a serial approach. The threaded version starts one thread per site, and I have not tested how many concurrent threads it can support, so you may want to add code that limits the number.

from threading import Thread
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

class Site (Thread):

    def __init__(self, thissite):
        Thread.__init__(self)
        self.pool = urllib3.PoolManager()
        self.site = thissite
        print('Started Thread for', self.site)

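    def run(self):
        try:
            # redirect=False reports a 301/302 itself instead of following it
            r = self.pool.request('GET', self.site, redirect=False)
            print('Thread Result', self.site, r.status)
        except Exception as e:
            # No HTTP response was received (DNS failure, timeout, ...),
            # so report an error rather than a made-up 404
            print('Thread Result', self.site, 'ERROR', type(e).__name__)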

sitelist = []
# Read one site per line; the with block closes the file when done
with open('D:\\Downloads\\SiteList.txt', 'r') as f:
    for x in f:
        print('[' + x.strip() + ']')
        sitelist.append(x.strip())

http = urllib3.PoolManager()

for site in sitelist:
    Check = Site(site)
    Check.start()

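for site in sitelist:
    try:
        # redirect=False reports a 301/302 itself instead of following it
        r = http.request('GET', site, redirect=False)
        print('Serial Result', site, r.status)
    except Exception as e:
        # No HTTP response was received, so report an error, not a 404
        print('Serial Result', site, 'ERROR', type(e).__name__)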
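
One way to cap the number of concurrent threads is a fixed-size worker pool. The following is a minimal sketch of that idea using the standard library's `ThreadPoolExecutor`; I have not run it against a 200k list. The names `make_fetcher` and `check_all`, the default of 50 workers, the 10-second timeout, and the `'ERROR'` marker are illustrative choices of mine, and `proxy_url` assumes a single HTTP proxy URL.

```python
from concurrent.futures import ThreadPoolExecutor


def make_fetcher(proxy_url=None, timeout=10.0):
    """Return a function that maps a URL to its HTTP status code.

    urllib3 is imported here so the pool logic below can be exercised
    with a stand-in fetcher on machines where urllib3 is not installed.
    """
    import urllib3

    # One shared connection pool; route through a proxy if one is given.
    if proxy_url:
        pool = urllib3.ProxyManager(proxy_url)
    else:
        pool = urllib3.PoolManager()

    def fetch(url):
        # redirect=False returns the first response (e.g. 301) as-is;
        # retries=False fails fast instead of retrying dead hosts.
        r = pool.request('GET', url, redirect=False,
                         retries=False, timeout=timeout)
        return r.status

    return fetch


def check_all(urls, fetch, max_workers=50):
    """Check every URL with at most max_workers requests in flight.

    Returns (url, status) pairs in input order. status is the integer
    code, or 'ERROR' when no HTTP response was obtained.
    """
    urls = list(urls)

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception:
            return 'ERROR'

    # The executor never runs more than max_workers threads at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(safe_fetch, urls))
    return list(zip(urls, statuses))
```

With the `sitelist` read from the file, something like `for url, status in check_all(sitelist, make_fetcher()): print(url, status)` prints one `domain status` pair per line. Passing a proxy URL to `make_fetcher` sends the requests through that proxy instead.
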
Ed Sheehan