I am using Selenium with headless Chrome (via chromedriver) to crawl content from websites that render their content with JavaScript (which is why I can't just use requests).
The code runs for a good couple of hours, since the number of pages to crawl is large. Even though every use of the webdriver object is wrapped in a try/except/finally block like this:
    import eventlet
    from bs4 import BeautifulSoup
    from selenium import webdriver

    browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver)
    t = eventlet.Timeout(15)
    try:
        browser.get(url)
        soup = BeautifulSoup(browser.page_source, "lxml")
        row = soup.find("div", {"id": "row0"})
    except Exception:
        pass  # skip pages that fail to load or time out
    finally:
        t.cancel()
        browser.quit()
there are still multiple chrome processes that keep running for hours when I check top or ps.
What is the correct way to instantiate, use, and free the memory used by a Selenium webdriver?
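For reference, one variant I considered is wrapping the driver's lifetime in a context manager so that quit() is guaranteed to run even when the body raises. This is only a sketch; managed_browser is a helper name I made up, and factory stands in for whatever creates the driver (e.g. a lambda around webdriver.Chrome):

```python
from contextlib import contextmanager

@contextmanager
def managed_browser(factory):
    """Create a browser via factory() and always quit() it on exit.

    factory: a zero-argument callable returning a webdriver-like object
    (hypothetical helper; not part of Selenium itself).
    """
    browser = factory()
    try:
        yield browser
    finally:
        # Runs even if the with-body raises or a timeout fires inside it.
        browser.quit()

# Intended usage (assumes chrome_options / chrome_driver from above):
# with managed_browser(lambda: webdriver.Chrome(
#         chrome_options=chrome_options,
#         executable_path=chrome_driver)) as browser:
#     browser.get(url)
```

But even with this pattern the chrome processes still seem to linger, so I'm not sure cleanup on the Python side is the whole story.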