I'm using Python and the remote WebDriver in Selenium to try to generate and download reports (XML files) from a Google Mini. I'm generating the files just fine, and I'm able to select the Export link. But is there an easy way to then instruct the remote WebDriver to download that link to a file?
-
Why not `urllib.urlretrieve()`? – jfs Aug 21 '12 at 21:57
-
Because while I can select the element I'd click on to export the file, if I urlretrieve the url, I actually get an html page instead of the xml page I'm expecting. But when selenium clicks on the element, the webdriver server reports that it got an xml page. – Kyle Schmidt Aug 21 '12 at 22:25
-
if you've already downloaded it then just save the page to local file: `with open(filename, "wb") as file: file.write(driver.page_source)` – jfs Aug 21 '12 at 22:38
-
Unfortunately, if I try to do a driver.get(url), it'll throw an error: 15:41:20.275 WARN - Failed parsing XML document https://search.example.com:8443/EnterpriseController?actionType=exportSummaryReport&reportName=test_found_we_081812&collection=default_collection: Element type "topQuery" must be followed by either attribute specifications, ">" or "/>". So I think it's trying to parse the XML file, which I don't need. Google Minis sometimes kick back XML like "
223 – Kyle Schmidt Aug 21 '12 at 22:45
-
Have you tried to set `Accept` header to get XML instead of HTML with urllib2? You could use a network sniffer such as wireshark to find out how the requests using urllib2 and webdriver differ. – jfs Aug 21 '12 at 23:08
-
Setting Accept to either text/xml or application/xml didn't appear to make a difference - in both cases nothing at all was in the response to urllib2.urlopen. I'll have to try Wireshark in the morning. – Kyle Schmidt Aug 21 '12 at 23:18
-
Can't you configure the browser to automatically download the file somewhere you can work on it? Can you specify your OS/Browser configuration? – ghm1014 Aug 22 '12 at 22:42
-
You can if you're using the Firefox/Chrome webdriver, but I'm using the remote webdriver. I'm not sure if you can configure that similarly, and I haven't found any documentation saying you can. – Kyle Schmidt Aug 22 '12 at 22:53
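One thing that might be worth checking regarding the `urlretrieve` comments above: if the HTML page that comes back is a login or redirect page (an assumption, the thread does not confirm it), the plain HTTP request may simply lack the session the browser already has. A minimal sketch that copies the browser's cookies into a `urllib` request follows. The helper names `cookie_header` and `fetch_with_browser_session` are made up for illustration; `driver.get_cookies()` is the regular Selenium call.

```python
import urllib.request


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_with_browser_session(driver, url, dest_path, timeout=30):
    """Download url to dest_path, reusing the cookies held by the driver.

    Hypothetical helper: assumes the server only needs the session cookies
    that the browser already obtained (for example after logging in).
    """
    request = urllib.request.Request(url)
    cookies = driver.get_cookies()  # list of dicts with 'name' and 'value' keys
    if cookies:
        request.add_header("Cookie", cookie_header(cookies))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    # Write raw bytes so the XML is saved exactly as the server sent it.
    with open(dest_path, "wb") as out:
        out.write(data)
    return len(data)
```

Since this never asks the browser to render the response, it would also sidestep the XML parsing warning, provided the server accepts the copied session.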
1 Answer
Well, I did not find a way to make Chrome download XML as a file instead of displaying it as a page. That seems to depend on the page design, per "How to download an XML without the browser opening it in another tab".
However, we can set preferences for remote web drivers. Remote() accepts a desired_capabilities argument, and preferences related to file downloads can be passed through it via options.to_capabilities():
    from pprint import pprint

    from selenium import webdriver

    download_path = '/some/path/'  # directory on the machine where Chrome runs

    options = webdriver.ChromeOptions()
    prefs = {'profile.default_content_settings.popups': 0,
             'download.default_directory': download_path}
    options.add_experimental_option('prefs', prefs)
    pprint(options.to_capabilities())
    # Selenium 3 style; Selenium 4 drops desired_capabilities in favor of options=options
    driver = webdriver.Remote(command_executor='http://camutil_selenium_1:4444/wd/hub',
                              desired_capabilities=options.to_capabilities())
The pprint output from above:
    {'browserName': 'chrome',
     'chromeOptions': {'args': [],
                       'extensions': [],
                       'prefs': {'download.default_directory': '/some/path/',
                                 'profile.default_content_settings.popups': 0}},
     'javascriptEnabled': True,
     'platform': 'ANY',
     'version': ''}
Now you can start a file download with driver.get(dl_url) for files whose extensions Chrome does not display inline, and the file will be saved to download_path. Note: this only starts the download, so you may need to add logic to wait for it to finish.
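For the waiting logic, here is a rough sketch of one way to poll the download directory. It relies on Chrome keeping in-progress downloads under a `.crdownload` suffix, and it assumes the directory is readable from the machine running the script. With a remote driver the file is written on the node where Chrome runs, so this only works if that path is shared (for example via a mounted volume). The function name `wait_for_download` and its parameters are made up for illustration.

```python
import os
import time


def wait_for_download(directory, timeout=60.0, poll=0.5):
    """Return paths of finished files once no partial download remains.

    Hypothetical helper: treats any '*.crdownload' file as still in progress.
    Raises TimeoutError if nothing completes within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory)
        partial = [n for n in names if n.endswith(".crdownload")]
        finished = sorted(n for n in names if not n.endswith(".crdownload"))
        if finished and not partial:
            return [os.path.join(directory, n) for n in finished]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no completed download in {directory!r}")
        time.sleep(poll)
```

It would be called right after `driver.get(dl_url)`, e.g. `wait_for_download(download_path)`. Starting from an empty directory keeps the check simple, since any file already present counts as finished.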
-
This is not working for me. A check on the existence of the file gives an error. Without using a *remote* web driver it works fine... – Alex Nov 03 '17 at 10:48