I'm trying to get the status code of various URLs listed in a CSV file using the Python requests module. It works for some websites, but for most of them it reports 'Connection Refused', even though the same websites load just fine when I visit them in a browser.

The code looks like this:

import pandas as pd 
import requests 
from requests.adapters import HTTPAdapter
from fake_useragent import UserAgent
import time
import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

df = pd.read_csv('Websites.csv')
output_data = pd.DataFrame(columns=['url', 'status'])
number_urls = df.shape[0]

i = 0

for url in df['urls']:

    session = requests.Session()
    adapter = HTTPAdapter(max_retries=3)
    adapter.max_retries.respect_retry_after_header = False
    session.mount('http://', adapter)
    session.mount('https://', adapter)

    print(url)

    ua = UserAgent()
    header = {'User-Agent':str(ua.chrome)}
    
    try:
        # Status
        start = time.time()
        response = session.get(url, headers=header, verify=False, timeout=0.5)
        request_time = time.time() - start
        info = "Request completed in {0:.0f}ms".format(request_time * 1000)  # seconds -> ms
        print(info)
        status = response.status_code
        if (status == 200):
            status = "Connection Successful"
        if (status == 404):
            status = "404 Error"
        if (status == 403):
            status = "403 Error"
        if (status == 503):
            status = "503 Error"
        print(status)

        output_data.loc[i] = [df.iloc[i, 0], status]

        i += 1

    except requests.exceptions.Timeout:
        status = "Connection Timed Out"
        print(status)
        request_time = time.time() - start
        info = "TimeOut in {0:.0f}ms".format(request_time * 1000)  # seconds -> ms
        print(info)

        output_data.loc[i] = [df.iloc[i, 0], status]
        i += 1

    except requests.exceptions.ConnectionError:
        status = "Connection Refused"
        print(status)
        request_time = time.time() - start
        info = "Connection Error in {0:.0f}ms".format(request_time * 1000)  # seconds -> ms
        print(info)

        output_data.loc[i] = [df.iloc[i, 0], status]
        i += 1

output_data.to_csv('dead_blocked2.csv', index=False)
print('CSV file created!')

Here's an example of one website that shows Connection Refused, even though it works: https://www.dytt8.net
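
One thing worth noting about the code above: `requests.exceptions.SSLError` and `requests.exceptions.ProxyError` are subclasses of `requests.exceptions.ConnectionError`, so the "Connection Refused" label may be hiding a TLS handshake failure rather than an actual refused connection. A minimal sketch of how the cases could be separated is below. The helper names `describe_failure` and `check` and the 5-second timeout are assumptions for illustration, not part of the original script.

```python
import requests


def describe_failure(exc):
    """Map a requests exception to a more specific label (hypothetical helper)."""
    # Order matters: ConnectTimeout, SSLError and ProxyError all subclass
    # ConnectionError, so the specific checks have to come first.
    if isinstance(exc, requests.exceptions.Timeout):
        return "Connection Timed Out"
    if isinstance(exc, requests.exceptions.SSLError):
        return "SSL Error"
    if isinstance(exc, requests.exceptions.ProxyError):
        return "Proxy Error"
    if isinstance(exc, requests.exceptions.ConnectionError):
        return "Connection Error"
    return "Other Error"


def check(url, timeout=5):
    """Return the status code as text, or a label describing the failure."""
    try:
        response = requests.get(url, timeout=timeout)
        return str(response.status_code)
    except requests.exceptions.RequestException as exc:
        # Printing the exception itself shows the underlying cause.
        print(repr(exc))
        return describe_failure(exc)
```

Calling `check(url)` for each URL and storing the returned label would show whether the failures are really refused connections or SSL errors.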

I've tried forcing different TLS versions with the following adapter and mounting it on my session, but it still doesn't work:

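import ssl
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        # Force the TLS version used by the connection pool
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)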

Can anyone help?

Thanks!

MTavares
  • I got "SSL_ERROR_UNSUPPORTED_VERSION" when I visited the website from my browser. Seems like the website only supports TLS 1.1? – Anunay Sep 17 '20 at 15:27
  • It might, yes, but I still get Connection Refused using ssl_version=ssl.PROTOCOL_TLSv1_1 in the poolmanager/session – MTavares Sep 17 '20 at 16:26
  • Oh, it seems the website uses `DES-CBC3-SHA`. Try this: https://stackoverflow.com/questions/44141655/requests-failing-to-connect-to-a-tls-server – Anunay Sep 17 '20 at 16:38
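
For reference, the idea in the last comment (allowing a legacy cipher suite) could be sketched roughly as below. This is a minimal sketch, not a confirmed fix for this website: the class name `LegacyCipherAdapter` and the cipher string are assumptions for illustration, and whether `DES-CBC3-SHA` or TLS versions older than 1.2 can be negotiated at all depends on how the local OpenSSL was built and configured.

```python
import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.ssl_ import create_urllib3_context

# Assumed cipher string: the library defaults plus the legacy 3DES suite
# named in the comment above.
LEGACY_CIPHERS = "DEFAULT:DES-CBC3-SHA"


class LegacyCipherAdapter(HTTPAdapter):
    """Hypothetical adapter that hands urllib3 a custom SSLContext."""

    def init_poolmanager(self, *args, **kwargs):
        context = create_urllib3_context(ciphers=LEGACY_CIPHERS)
        # Assumption: the server only speaks a TLS version older than 1.2.
        context.minimum_version = ssl.TLSVersion.TLSv1
        kwargs["ssl_context"] = context
        return super().init_poolmanager(*args, **kwargs)


session = requests.Session()
session.mount("https://", LegacyCipherAdapter(max_retries=3))
```

The session can then be used as before, e.g. `session.get(url, timeout=5)`. Passing a whole `ssl_context` rather than only `ssl_version` lets the cipher list and the minimum protocol version be adjusted together.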
