Tools: Ubuntu, Python, Selenium, Firefox
I am trying to automate the downloading of image files from a subscription web site. I do not have access to the server other than through my paid subscription. To avoid having to click a button for each file download, I decided to automate it using Python, Selenium, and Firefox. (I have been using these three together for only two days, and I know very little about cookies.)
I am interested in downloading the following three formats, in order of preference: ['EPS', 'PNG', 'JPG']. A button for each format is available on the web site.
I have successfully automated downloading the 'PNG' and 'JPG' files to disk by setting the Firefox preferences by hand, as suggested in this post: python webcrawler downloading files
However, when the file is in 'EPS' format, the "You have chosen to save" dialog box still opens in the Firefox window.
As you can see from my code, I have set the preferences to save 'EPS' files to disk. (Again, 'JPG' and 'PNG' files are saved as expected.)
from selenium import webdriver
profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference("browser.download.folderList", 1)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk',
                       'image/jpeg,image/png,application/postscript,'
                       'application/eps,application/x-eps,image/x-eps,'
                       'image/eps')
profile.set_preference("browser.helperApps.alwaysAsk.force", False)
profile.set_preference("plugin.disable_full_page_plugin_for_types",
                       "application/eps,application/x-eps,image/x-eps,"
                       "image/eps")
profile.set_preference(
    "general.useragent.override",
    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0)"
    " Gecko/20100101 Firefox/26.0")
driver = webdriver.Firefox(firefox_profile=profile)
# I then log in and begin automated clicking to download files. 'JPG' and 'PNG' files are
# saved to disk as expected. The 'EPS' files present a save dialog box in Firefox.
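One possibility worth checking is that the server labels the EPS download with a MIME type that is not in the neverAsk.saveToDisk list (application/octet-stream is a common generic fallback, but that is only a guess here). The following is a minimal sketch of how a Content-Type header value could be compared against the preference string. The helper names mime_from_content_type and is_covered are hypothetical, and the header values in the usage lines are made-up examples, not what this site actually returns.

```python
def mime_from_content_type(header_value):
    """Return the bare MIME type from a Content-Type header value."""
    # Drop parameters such as "; charset=..." and normalise case.
    return header_value.split(";", 1)[0].strip().lower()


def is_covered(header_value, never_ask_pref):
    """True if the header's MIME type is in the comma-separated preference string."""
    allowed = set(
        m.strip().lower() for m in never_ask_pref.split(",") if m.strip()
    )
    return mime_from_content_type(header_value) in allowed


# Same list as in the profile above.
NEVER_ASK = ("image/jpeg,image/png,application/postscript,"
             "application/eps,application/x-eps,image/x-eps,"
             "image/eps")

# Hypothetical header values, for illustration only.
print(is_covered("image/png", NEVER_ASK))                               # True
print(is_covered("Application/PostScript; charset=binary", NEVER_ASK))  # True
print(is_covered("application/octet-stream", NEVER_ASK))                # False
```

If the real response turned out to look like the last case, adding that type to the preference string would be the obvious next experiment.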
I tried installing an extension for Firefox called "download-statusbar" that claims to negate any save dialog box from appearing. The extension loads in the Selenium Firefox browser, but it doesn't function. (A lot of reviews say the extension is broken despite the developers' insistence that it does function.) It isn't working for me anyway so I gave up on it.
I added this to the Firefox profile in that attempt:
# The extension loads, but it doesn't function.
download_statusbar = ('/home/$USER/Downloads'
                      '/download_statusbar_fixed-1.2.00-fx.xpi')
profile.add_extension(download_statusbar)
From reading other stackoverflow.com posts, I decided to see if I could download the file directly from its URL with urllib2. As I understand it, I would need to add cookies to the request headers in order to authenticate the download of the 'EPS' file.
I am unfamiliar with this technique, but here is the code I tried to use to download the file directly. It failed with a '403 Forbidden' response despite my attempts to set cookies in the urllib2 opener.
import urllib2
import cookielib
import logging
import sys
cookie_jar = cookielib.LWPCookieJar()
handlers = [
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
]
for h in handlers:
    h.set_http_debuglevel(1)
handlers.append(urllib2.HTTPCookieProcessor(cookie_jar))
# Using the Selenium driver's cookies; get_cookies() returns a list of dictionaries.
cookies = driver.get_cookies()
opener = urllib2.build_opener(*handlers)
opener.addheaders = [(
    'User-agent',
    'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:26.0) '
    'Gecko/20100101 Firefox/26.0'
)]
logger = logging.getLogger("cookielib")
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)
for item in cookies:
    opener.addheaders.append(('Cookie', '{}={}'.format(
        item['name'], item['value']
    )))
    logger.info('{}={}'.format(item['name'], item['value']))
response = opener.open('http://path/to/file.eps')
# Fails with a 403 Forbidden response.
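One detail in the loop above may matter: it appends a separate ('Cookie', ...) entry per cookie, while a browser sends all cookies for a site in a single Cookie header, with the name=value pairs separated by "; ". As far as I can tell, urllib2 only sends the first entry of a given header name from addheaders, so most of the cookies may never reach the server. Below is a minimal sketch of building one combined header from the list of dictionaries that get_cookies() returns. build_cookie_header is a hypothetical helper, the sample cookies are invented, and it is only an assumption that this is related to the 403 (the server might also check the Referer or other headers).

```python
def build_cookie_header(cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in cookies
    )


# Invented sample in the shape returned by driver.get_cookies().
sample = [
    {"name": "sessionid", "value": "abc123", "domain": "example.com"},
    {"name": "csrftoken", "value": "xyz789", "domain": "example.com"},
]

header = ("Cookie", build_cookie_header(sample))
print(header)  # ('Cookie', 'sessionid=abc123; csrftoken=xyz789')
```

With the opener above, this single tuple would be appended to opener.addheaders once, in place of the per-cookie loop.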
Any thoughts or suggestions? Am I missing something easy or do I need to give up hope on an automated download of the EPS files? Thanks in advance.