I use Selenium with Chrome to extract information from online sources. Basically, I loop over a list of URLs (stored in mylinks) and load the web pages in the browser as follows:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument("window-size=1200,800")
browser = webdriver.Chrome(options=options)  # chrome_options is deprecated
browser.implicitly_wait(30)

for x in mylinks:
    try:
        browser.get(x)
        soup = BeautifulSoup(browser.page_source, "html.parser")
        city = soup.find("div", {"class": "city"}).text
    except Exception:
        continue
My problem is that the browser "freezes" at some point. I know that this is caused by the web page itself. As a consequence, my routine stops, since the browser no longer responds. browser.implicitly_wait(30) does not help here; neither explicit nor implicit waits solve the problem.
I want to "time out" the problem, meaning that I want to quit() the browser after x seconds (in case it freezes) and restart it.
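One direction I am aware of is Selenium's set_page_load_timeout(), which (if I understand correctly) makes browser.get() raise TimeoutException instead of blocking indefinitely. A minimal sketch of the quit-and-restart pattern, with make_browser standing in for the setup code above (function names are my own):

```python
# Sketch of a quit-and-restart wrapper around browser.get().
# Assumption: set_page_load_timeout() makes a frozen get() raise
# TimeoutException; the fallback class below only lets the sketch
# run without Selenium installed.
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    class TimeoutException(Exception):
        pass

def get_or_restart(browser, make_browser, url, timeout=30):
    """Load url; if the load times out, quit the (possibly frozen)
    browser and return a fresh instance from make_browser()."""
    browser.set_page_load_timeout(timeout)
    try:
        browser.get(url)
    except TimeoutException:
        browser.quit()
        browser = make_browser()
    return browser
```

In the loop this would be browser = get_or_restart(browser, make_browser, x) before the BeautifulSoup step. I am not sure, however, that the timeout still fires when the browser itself is frozen rather than the page merely loading slowly.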
I know that I could use a subprocess with a timeout, like:

import subprocess

def startprocess(filepath, waitingtime):
    p = subprocess.Popen("C://mypath//" + filepath)
    try:
        p.wait(waitingtime)
    except subprocess.TimeoutExpired:
        p.kill()
However, for my task this solution would be second-best.
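A related in-process variant I am considering (stdlib only, function name is my own): run the blocking call in a worker thread and stop waiting after a timeout. The caveat, as far as I understand, is that Python cannot kill the hung worker thread, so I would still have to quit() the frozen browser afterwards:

```python
# Run func in a worker thread; give up waiting after `timeout` seconds.
# A hung worker keeps running until the browser is killed, so this
# only unblocks the main loop.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def call_with_timeout(func, timeout, *args, **kwargs):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)  # raises FuturesTimeout
    finally:
        pool.shutdown(wait=False)  # do not block on a hung worker
```

In the loop this would look like call_with_timeout(browser.get, 30, x), with except FuturesTimeout: quit and rebuild the browser.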
Question: is there an alternative way to time out the browser.get(x) step in the loop above (in case the browser freezes) and continue with the next URL?