
My scripts run on Python 3.6, Selenium 2.48 and Firefox 41 (I can't upgrade, I'm on a company machine).

I want to download some XML files from a website using Python and Selenium WebDriver. I use a Firefox profile to avoid the download dialog and to save the files in a specific location:

from selenium import webdriver

profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.panel.shown", False)
profile.set_preference("browser.download.dir", dloadPath)
profile.set_preference("browser.helperApps.neverAsk.openFile","application/xml,text/xml")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/xml,text/xml")
browser = webdriver.Firefox(firefox_profile=profile)

The program collects all the downloadable links (tested, this works):

links = []
elements = browser.find_elements_by_xpath("//a[contains(@href,'reception/')]")
for elem in elements:
    href = elem.get_attribute("href")
    links.append(href)
return links

To download a file I use get() from Selenium:

browser.get(fileUrl)

The files I'm looking for have a very specific URL, which I think means I can't use Requests or urllib (2 or 3): I need to log in to the website and navigate through it, and I can't do that with those modules.

The URL looks like this:

https://www.example.com/cft/cft/reception/filename.xml?user=xxxxxxxx&password=xxxxxxxx

Here is the HTML link:

<a href="reception/filename.xml?user=xxxxxxxx&password=xxxxxxxx" onclick="alert('Faire un clic droit, puis enregistrer la cible sous...');return false">filename.xml</a>

With my script I can access the website and navigate through it, but when I get the file URL the download dialog pops up, for no reason that I could find.

The script works very well on other websites, so I think the problem is the URL.
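
One thing that may be worth checking (an assumption, not a confirmed cause): `browser.helperApps.neverAsk.saveToDisk` is matched against the MIME type the server reports, so if this site serves the XML files with a `Content-Type` other than `application/xml` or `text/xml`, Firefox can still show the dialog. The helper below is only a sketch; `mime_is_covered` is a made-up name, and the header value would have to be read by hand, for example from the Network tab of the browser's developer tools.

```python
def mime_is_covered(content_type_header, never_ask_list):
    """Return True if the response's MIME type appears in the preference list.

    content_type_header: raw Content-Type value, e.g. "text/xml; charset=UTF-8"
    never_ask_list: the comma-separated string given to
                    browser.helperApps.neverAsk.saveToDisk
    """
    # Drop parameters such as "; charset=UTF-8" and normalise the case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    allowed = {m.strip().lower() for m in never_ask_list.split(",") if m.strip()}
    return mime in allowed


# Same list as in the profile above.
NEVER_ASK = "application/xml,text/xml"

print(mime_is_covered("text/xml; charset=UTF-8", NEVER_ASK))
print(mime_is_covered("application/octet-stream", NEVER_ASK))
```

If the second kind of value is what the site sends, adding that MIME type to the preference string would be the thing to try.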

Thanks for your help

  • Why can't you use `requests` once you have collected the URLs? – Paco H. Jul 11 '17 at 08:27
  • @PacoH. I tried with this answer: https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py/16696317#16696317 , but the file created is empty. – Antoine S. Jul 11 '17 at 08:30
  • Could there be a bug in your code? I would say downloading through `requests` would be better and easier than doing it through `selenium`. If you manually go to one of these `/filename.xml?user...` URLs in your browser, does it download the file? Do you need to log in somewhere first (in addition to the username and password in the query string)? – Paco H. Jul 11 '17 at 08:36
  • If there is a bug, I really don't know where :/ I agree with you, Requests should be easier and less error-prone than Selenium. When I go to the link manually, it shows the "save dialog" to download the file. And yes, it's a professional portal, so I need to log in (which is why I use webdriver in the first place) – Antoine S. Jul 11 '17 at 08:39
  • Logins can usually also be done with requests. If you do the login manually and look at the request your browser sends in the browser's developer tools, you can see the details you need to replicate that request from `requests`. A login normally is saved in a cookie, which you have to carry around to _stay logged in_. You can do this easily with a [`requests.Session`](http://docs.python-requests.org/en/master/user/advanced/#session-objects). – Paco H. Jul 11 '17 at 08:53
  • Thanks for the tip, I will try this. – Antoine S. Jul 11 '17 at 08:56
  • @PacoH. I managed to log in to the website with `requests.Session`, but I still have issues when trying to download the files. I will ask a new question. Thanks for your help! Requests is really easier to work with – Antoine S. Jul 12 '17 at 15:22
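
The approach suggested in the comments could be sketched like this: keep Selenium for the login and navigation, then copy its cookies into a `requests.Session` and stream each file to disk. This is a minimal sketch under assumptions: `session_from_selenium_cookies` and `download` are hypothetical helper names, and it assumes the portal's login is carried by cookies alone, which may not hold for this site. `download` returns the number of bytes written, so an empty file is noticed right away.

```python
import requests


def session_from_selenium_cookies(selenium_cookies):
    """Build a requests.Session carrying the browser's cookies.

    selenium_cookies: list of dicts shaped like the result of Selenium's
    browser.get_cookies() (keys such as "name", "value", "domain", "path").
    """
    session = requests.Session()
    for cookie in selenium_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


def download(session, url, dest_path, chunk_size=8192):
    """Stream url to dest_path and return the number of bytes written."""
    response = session.get(url, stream=True)
    # Raise on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    written = 0
    with open(dest_path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written


# Hypothetical usage, reusing `browser`, `links` and `dloadPath` from the question:
# session = session_from_selenium_cookies(browser.get_cookies())
# for i, url in enumerate(links):
#     size = download(session, url, "%s/file_%d.xml" % (dloadPath, i))
#     print(url, size)
```

A size of 0, or a file that contains HTML instead of XML, would suggest that the session is not authenticated the way the browser is.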
