I am using Selenium with Firefox (GeckoDriver) to download a few files from a website on behalf of the client.
The setup runs on Digital Ocean with Docker.
Here is the flow: whenever a user calls the API, a new browser instance is created, logs in to the website with the user's login ID and password, downloads a bunch of files, creates a zip, and sends it back.
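To make the last step of the flow concrete, here is a minimal sketch of zipping a per-user download directory. The helper name `zip_downloads` and its parameters are hypothetical and only illustrate the step; this is not the project's actual code.

```python
import os
import zipfile


def zip_downloads(download_dir, zip_path):
    """Zip every finished file directly inside download_dir; return the archived names."""
    names = []
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(download_dir)):
            full_path = os.path.join(download_dir, name)
            # Skip sub-directories and the archive itself if it lives in download_dir.
            if not os.path.isfile(full_path) or os.path.abspath(full_path) == zip_abs:
                continue
            # Skip Firefox's in-progress ".part" files.
            if name.endswith(".part"):
                continue
            archive.write(full_path, arcname=name)
            names.append(name)
    return names
```

Skipping `.part` files matters here because Firefox writes downloads under that suffix until they complete.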
Everything works fine when there is only one request, which means only one browser instance on the server. When there are multiple requests, which means multiple browser instances on the server, some of them break with the error message "browsing context has been discarded".
This happens either during the crawling part or just after instance creation.
There is no particular pattern to this error; it happens randomly and breaks the browser instance. I have gone through probably all the questions and GitHub issues on this topic, but some of them describe workarounds that are too old to work in the current version, and some do not work at all.
Here are the browser capabilities and configuration I am running on Jenkins.
`
{'browserName': 'firefox',
 'marionette': True,
 'acceptInsecureCerts': True,
 'moz:firefoxOptions': {
     'prefs': {
         'browser.download.folderList': 2,
         'browser.download.dir': '/home/usr/usr/project/static/785fg7',
         'browser.download.useDownloadDir': True,
         'pdfjs.disabled': True,
         'browser.helperApps.neverAsk.saveToDisk': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/pdf,application/csv,application/excel,application/vnd.msexcel,application/vnd.ms-excel,text/anytext,text/comma-separated-values,text/csv,application/vnd.ms-excel,application/octet-stream,image/tiff'},
     'args': ['-headless',
              '--no-sandbox',
              '--disable-setuid-sandbox',
              '--disable-dev-shm-usage',
              '--window-size=1920,1080',
              '--start-maximized']}}
`
Note that everything works fine locally, in headless mode, no matter how many browser instances or requests I create. The problem occurs only when it is deployed on the server.
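To compare the local machine and the server under the same load, a small harness like the following could start several sessions in parallel and collect the failures. This is only a sketch: `run_sessions` is a hypothetical helper, `make_driver` is any zero-argument callable that returns a WebDriver-like object with `get()` and `quit()`, and loading `about:blank` is just a stand-in for a real command.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sessions(make_driver, session_count):
    """Start session_count drivers in parallel; return one error string (or None) per session."""

    def one_session(index):
        driver = None
        try:
            driver = make_driver()
            # Any command that needs a live browsing context will do here.
            driver.get("about:blank")
            return None
        except Exception as exc:  # collected on purpose so every failure is reported
            return "session {}: {!r}".format(index, exc)
        finally:
            if driver is not None:
                try:
                    driver.quit()
                except Exception:
                    pass  # the session may already be gone

    with ThreadPoolExecutor(max_workers=session_count) as pool:
        return list(pool.map(one_session, range(session_count)))
```

For example, `run_sessions(lambda: get_firefox_driver_for_linux_server(False, "test-user"), 5)` would exercise five instances at once without going through the API.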
Here is the Python code to initiate the browser.
import logging
import os
import random

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# `constants` and `check_gecko_version` are project-level names defined elsewhere (not shown).


def get_firefox_driver_for_linux_server(apply_proxy, uuid_user, download_options=False):
    firefox_options = Options()
    firefox_options.set_headless()
    if download_options:
        if not os.path.exists(constants.DOWNLOADS_PATH):
            os.mkdir(constants.DOWNLOADS_PATH)
        # Each user gets a separate download directory.
        download_path = os.path.join(constants.DOWNLOADS_PATH, uuid_user)
        firefox_options.set_preference("browser.download.folderList", 2)
        firefox_options.set_preference("browser.download.dir", download_path)
        firefox_options.set_preference("browser.download.useDownloadDir", True)
        firefox_options.set_preference("pdfjs.disabled", True)
        # MIME types that are saved to disk without a prompt.
        firefox_options.set_preference("browser.helperApps.neverAsk.saveToDisk",
                                       "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,"
                                       "application/pdf,"
                                       "application/csv,"
                                       "application/excel,"
                                       "application/vnd.msexcel,"
                                       "application/vnd.ms-excel,"
                                       "text/anytext,"
                                       "text/comma-separated-values,"
                                       "text/csv,"
                                       "application/vnd.ms-excel,"
                                       "application/octet-stream,"
                                       "image/tiff")
    # NOTE: most of these look like Chromium switches; Firefox may simply ignore them.
    firefox_options.add_argument("--no-sandbox")
    firefox_options.add_argument("--disable-setuid-sandbox")
    firefox_options.add_argument('--disable-dev-shm-usage')
    firefox_options.add_argument("--window-size=1920,1080")
    firefox_options.add_argument("--start-maximized")
    if not os.path.exists(constants.LOG_PATH):
        os.mkdir(constants.LOG_PATH)
    global random_id
    random_id = str(random.randint(1, 99999))
    logging.warning("random id...{}".format(random_id))
    # Create (or truncate) an empty geckodriver log file for this session.
    with open(os.path.join(constants.LOG_PATH, random_id + '.log'), 'w+'):
        pass
    gecko_driver_path = "/usr/local/bin/geckodriver"
    if apply_proxy:
        proxy = "proxy:24000"
        # Copy so the shared DesiredCapabilities.FIREFOX dict is not mutated across requests.
        firefox_capabilities = webdriver.DesiredCapabilities.FIREFOX.copy()
        firefox_capabilities['marionette'] = True
        firefox_capabilities['proxy'] = {
            "proxyType": "MANUAL",
            "httpProxy": proxy,
            "ftpProxy": proxy,
            "sslProxy": proxy
        }
        driver = webdriver.Firefox(executable_path=gecko_driver_path, firefox_options=firefox_options,
                                   capabilities=firefox_capabilities,
                                   log_path=os.path.join(constants.LOG_PATH, random_id + '.log'))
        check_gecko_version(driver, firefox_options)
        return driver
    else:
        logging.info("No proxy applied")
        driver = webdriver.Firefox(executable_path=gecko_driver_path, firefox_options=firefox_options)
        check_gecko_version(driver, firefox_options)
        return driver
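One detail in the function above that may matter under concurrency: `randint(1, 99999)` can return the same value for two requests, and `random_id` is a module-level global, so two sessions could end up pointing at the same log file. Whether this is related to the error is not established. A minimal sketch of a collision-free alternative, where `new_log_path` is a hypothetical helper and not part of the project:

```python
import os
import uuid


def new_log_path(log_dir):
    """Return a fresh per-session geckodriver log path inside log_dir."""
    # exist_ok avoids a race when two requests create the directory at once (Python 3.2+).
    os.makedirs(log_dir, exist_ok=True)
    # uuid4 makes a name clash between concurrent sessions practically impossible.
    return os.path.join(log_dir, uuid.uuid4().hex + ".log")
```

The returned path would be kept in a local variable and passed as `log_path` to `webdriver.Firefox`, so no global state is shared between requests.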