Why does socket interfere with selenium?

Question

I wrote a Python script that checks for an internet connection using socket (following the approach in "Checking network connection") and then scrapes HTML from Yahoo Finance using Selenium.

Very frequently (but not always), it raises a ReadTimeoutError (see below).

I can get it to work by checking for an internet connection using http.client instead (see below), but I still want to know why socket interferes with selenium.


import socket
import sys
import time

from selenium import webdriver

def internet(host="8.8.8.8", port=443, timeout=1):
    try:
        socket.setdefaulttimeout(timeout)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        s.shutdown(socket.SHUT_RDWR)
        s.close()
        return True
    except OSError:  
        s.close()
        return False

#  Wait for internet to be available

i = 1
while internet() is False:
    time.sleep(1)
    if i == 300:  # quit if no connection for 5 min (300 seconds)
        print('\nIt has been 5 minutes. Aborting attempt.\n')
        sys.exit(0)
    i += 1

# Get html from yahoo page

symb = 'AAPL'
url = 'http://finance.yahoo.com/quote/{}/history'.format(symb)

chop = webdriver.ChromeOptions()
chop.add_argument('--user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Firefox/68.0"')
driver = webdriver.Chrome('/Users/fake_user/Dropbox/Python/chromedriver', chrome_options=chop)
driver.get(url)
html_source = driver.page_source
driver.quit()

It throws this error:

urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='127.0.0.1', port=58956): Read timed out. (read timeout=<object object at 0x103af7140>)

I can change the internet function as a workaround, but I can't figure out why socket interferes with selenium:

import http.client as httplib

def internet():
    conn = httplib.HTTPConnection("www.google.com", timeout=5)
    try:
        conn.request("HEAD", "/")
        conn.close()
        return True
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        conn.close()
        return False

Same problem here. I think the problem is that socket.setdefaulttimeout() is a global socket setting. — Frode Akselsen, Dec 12 '19 at 21:00

score 1 · Accepted Answer · edited May 26 '20 at 02:34

From the documentation:

socket.setdefaulttimeout(timeout)

Set the default timeout in seconds (float) for new socket objects. When the socket module is first imported, the default is None. See settimeout() for possible values and their respective meanings.

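The problem is that setdefaulttimeout is a process-wide setting of the socket library: it sets the timeout for every socket created afterwards, including the ones Selenium opens. Selenium's Python client talks to chromedriver over HTTP on localhost (the 127.0.0.1 host in the error message), so those connections inherit the 1-second timeout, and any command that takes longer than that, such as driver.get() on a slow page, fails with a ReadTimeoutError. That also explains why the error appears often but not always.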
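The global effect can be observed without Selenium. The following is a minimal sketch using only the standard library; the variable names are illustrative and the 1-second value simply mirrors the question.

```python
import socket

# Remember the current default so it can be restored afterwards.
previous = socket.getdefaulttimeout()

socket.setdefaulttimeout(1)

# Any socket created anywhere in the process now starts with a 1-second timeout,
# even though this code never calls settimeout() on it.
unrelated = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
inherited_timeout = unrelated.gettimeout()
unrelated.close()

# Restore the previous default so sockets created later are unaffected.
socket.setdefaulttimeout(previous)

fresh = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
restored_timeout = fresh.gettimeout()
fresh.close()

print(inherited_timeout, restored_timeout)
```

If setdefaulttimeout has to stay in a script for some reason, restoring the previous value right after the check, as done here, should keep it from leaking into other libraries.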

To apply a timeout to a single socket instance only, call settimeout(value) on that socket object instead:

import socket

def internet(host="8.8.8.8", port=443, timeout=1):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)  # applies to this socket only, not to sockets created elsewhere
    try:
        s.connect((host, port))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except OSError:
        return False
    finally:
        s.close()  # always release the socket, whether or not the connection succeeded
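As an alternative to the function above, the standard library's socket.create_connection accepts a per-connection timeout, which avoids both the global default and the manual close calls. This is a sketch under the same assumptions as the question (host 8.8.8.8, port 443, 1-second timeout), not the only way to write the check.

```python
import socket

def internet(host="8.8.8.8", port=443, timeout=1):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # The timeout applies to this connection only; the process-wide
        # default timeout is left untouched.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

Because the socket is used as a context manager, it is closed automatically on success, and nothing is left open when the connection attempt fails.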
