Bulk HTTP Status requests

Question

I don't have any coding knowledge. I need a script that can fetch the HTTP status codes of a list of sites. The output should look like this:

domain.com 301
domain.com 200

I need to check a huge list of sites, around 200k URLs, so it must also be fast. I have proxies so that it can run multi-threaded.

Any help/idea is highly appreciated!

Ed Sheehan · Answer 1 · 2018-12-22T11:00:55.067

Below are a threaded and a serial approach. I have not tested how many concurrent threads this can support, so you may want to add some code to limit the number.

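from threading import Thread
import urllib3

# Suppress the warning urllib3 emits for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# One shared connection pool; PoolManager can be used from several threads
http = urllib3.PoolManager()


def get_status(site):
    """Return the HTTP status code for site, or 'ERROR' if the request fails."""
    try:
        # redirect=False reports a 301/302 itself instead of following it
        return http.request('GET', site, redirect=False, timeout=10.0).status
    except Exception:
        # No response was received (DNS failure, timeout, refused connection),
        # so there is no status code to report
        return 'ERROR'


class Site(Thread):
    """Thread that checks a single site and prints its status."""

    def __init__(self, thissite):
        Thread.__init__(self)
        self.site = thissite
        print('Started Thread for', self.site)

    def run(self):
        print('Thread Result', self.site, get_status(self.site))


# Read one site per line, skipping blank lines
with open('D:\\Downloads\\SiteList.txt', 'r') as f:
    sitelist = [line.strip() for line in f if line.strip()]

# Threaded approach: one thread per site, with no limit on concurrency
threads = [Site(site) for site in sitelist]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Serial approach: one request at a time
for site in sitelist:
    print('Serial Result', site, get_status(site))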
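
One way to add the thread limit mentioned above is a bounded pool from the standard library's concurrent.futures module. This is only a minimal sketch and has not been tried at the 200k scale: check_sites, fetch and max_workers are names made up for illustration, and fetch is assumed to be any function that takes a site and returns its status.

```python
from concurrent.futures import ThreadPoolExecutor


def check_sites(sites, fetch, max_workers=50):
    """Call fetch(site) for every site, using at most max_workers threads.

    fetch is a hypothetical callable that takes a site and returns its status.
    Returns a list of (site, status) pairs in the same order as sites.
    """
    sites = list(sites)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so statuses line up with sites
        statuses = list(executor.map(fetch, sites))
    return list(zip(sites, statuses))
```

For example, check_sites(sitelist, lambda site: http.request('GET', site).status) would reuse the sitelist and http objects from the code above. An exception raised inside fetch propagates out of executor.map, so fetch should probably catch its own errors and return a placeholder instead.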

edited Dec 22 '18 at 11:00

answered Dec 22 '18 at 09:56

I want to import the site list from a txt file. – sheldon cooper Dec 22 '18 at 10:09
Refer to updated code. Have run this against 1,000 sites just now without issue. – Ed Sheehan Dec 22 '18 at 10:23
To ease you into the basics of Python, the following is as good a place to start as any https://www.w3schools.com/python/default.asp or here https://www.tutorialspoint.com/python/index.htm – Ed Sheehan Dec 22 '18 at 10:31
Where will the output file be? – sheldon cooper Dec 22 '18 at 14:33
Check this out https://stackoverflow.com/questions/7152762/how-to-redirect-print-output-to-a-file-using-python – Ed Sheehan Dec 22 '18 at 18:39
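
As a rough illustration of the approach in that link, print accepts a file= argument, so results can go to a text file instead of the console. The sketch below assumes the results have already been collected as (site, status) pairs; write_results and the file name results.txt are hypothetical and not part of the answer's code.

```python
def write_results(results, path):
    """Write one 'site status' line per (site, status) pair to a text file."""
    with open(path, 'w', encoding='utf-8') as out:
        for site, status in results:
            # file=out sends the line to the file instead of the console
            print(site, status, file=out)
```

Calling write_results([('domain.com', 301), ('domain.com', 200)], 'results.txt') would create results.txt in the directory the script is run from, with one line per site in the format asked for in the question.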
