I'm trying to get the status_code for various URLs listed in a CSV file using the Python requests module. It works for some websites, but for most of them it reports 'Connection Refused', even though the same websites load just fine when I visit them in a browser.
The code looks like this:
import pandas as pd
import requests
from requests.adapters import HTTPAdapter
from fake_useragent import UserAgent
import time
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
df = pd.read_csv('Websites.csv')
output_data = pd.DataFrame(columns=['url', 'status'])
number_urls = df.shape[0]
i = 0
for url in df['urls']:
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=3)
    adapter.max_retries.respect_retry_after_header = False
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    print(url)
    ua = UserAgent()
    header = {'User-Agent': str(ua.chrome)}
    try:
        # Status
        start = time.time()
        response = session.get(url, headers=header, verify=False, timeout=0.5)
        # time.time() returns seconds, so convert to milliseconds
        request_time = (time.time() - start) * 1000
        info = "Request completed in {0:.0f}ms".format(request_time)
        print(info)
        status = response.status_code
        if (status == 200):
            status = "Connection Successful"
        if (status == 404):
            status = "404 Error"
        if (status == 403):
            status = "403 Error"
        if (status == 503):
            status = "503 Error"
        print(status)
        output_data.loc[i] = [df.iloc[i, 0], status]
        i += 1
    except requests.exceptions.Timeout:
        status = "Connection Timed Out"
        print(status)
        request_time = (time.time() - start) * 1000
        info = "TimeOut in {0:.0f}ms".format(request_time)
        print(info)
        output_data.loc[i] = [df.iloc[i, 0], status]
        i += 1
    except requests.exceptions.ConnectionError:
        # Note: this catches every ConnectionError, not only refused connections
        status = "Connection Refused"
        print(status)
        request_time = (time.time() - start) * 1000
        info = "Connection Error in {0:.0f}ms".format(request_time)
        print(info)
        output_data.loc[i] = [df.iloc[i, 0], status]
        i += 1
output_data.to_csv('dead_blocked2.csv', index=False)
print('CSV file created!')
Here's an example of one website that shows Connection Refused, even though it works: https://www.dytt8.net
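To check whether such a server really refuses the TCP connection, or whether the failure happens somewhere else (DNS, a timeout, the TLS handshake), here is a minimal sketch that uses only the standard library and bypasses requests entirely. The function name `probe_tcp`, its labels, and the 5-second default timeout are my own choices for illustration, not part of the script above.

```python
import socket
from urllib.parse import urlparse


def probe_tcp(url, timeout=5.0):
    """Return a short label describing what happens at the TCP level for url."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return "invalid url"
    # Fall back to the default port for the scheme when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        # Only opens (and immediately closes) a TCP connection; no HTTP or TLS.
        with socket.create_connection((host, port), timeout=timeout):
            return "tcp connect ok"
    except socket.gaierror:
        return "dns lookup failed"
    except ConnectionRefusedError:
        return "connection refused"
    except socket.timeout:
        return "connect timed out"
    except OSError as exc:
        return "other network error: {}".format(exc)
```

Calling `probe_tcp('https://www.dytt8.net')` returns one of the labels above. If it reports that the TCP connect succeeds while requests still raises ConnectionError, the problem would presumably lie in a later step such as the TLS handshake or the HTTP exchange rather than in a refused connection.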
I've tried forcing different TLS versions with the following adapter and mounting it on my session, but it still doesn't work:
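import ssl
from urllib3.poolmanager import PoolManager

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        # Force TLS 1.0 for every connection made through this adapter
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)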
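For comparison, here is a rough standard-library sketch of restricting the TLS version range through `ssl.SSLContext.minimum_version` and `maximum_version` rather than through a fixed protocol constant. The helper names `make_tls_context` and `negotiated_tls_version` are hypothetical, and unlike my script (which passes `verify=False`) this context keeps certificate verification enabled.

```python
import socket
import ssl


def make_tls_context(min_version=ssl.TLSVersion.TLSv1_2, max_version=None):
    """Build a client context restricted to a range of TLS versions."""
    context = ssl.create_default_context()
    context.minimum_version = min_version
    if max_version is not None:
        context.maximum_version = max_version
    return context


def negotiated_tls_version(host, port=443, context=None, timeout=5.0):
    """Connect to host and return the negotiated protocol name as a string."""
    context = context or make_tls_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname enables SNI and hostname checking for the handshake.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

With this, `negotiated_tls_version('www.dytt8.net')` either returns the protocol the server agreed on or raises an exception (for example `ssl.SSLError` or `OSError`) showing where the handshake fails, which might be more informative than the generic ConnectionError.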
Can anyone help?
Thanks!