I have a web crawler that searches for certain files and downloads them, but how do I download a PDF file when the "save as or open" dialog pops up? I am currently using Python with Selenium to crawl. Here is my code:

from selenium import webdriver
import time

browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://www.tda-sgft.com/TdaWeb/jsp/fondos/Fondos.tda") # Load page
link = browser.find_element_by_link_text("Mortgage Loan")
link.click()
link2 = browser.find_element_by_link_text("ABS")
link2.click()
link3 = browser.find_element_by_link_text("TDA 13 Mixto")
link3.click()
download = browser.find_element_by_link_text("General Fund Information")
download.click()

time.sleep(0.2) # Let the page load, will be added to the API
browser.close()

1 Answer

You are going to need to modify the preferences of your Firefox profile. In order to get it to stop showing that dialog, you need to set the browser.helperApps.neverAsk.saveToDisk property of the profile in use. To do so, you could do this (note that this is for CSVs/Excel files - I believe your type would be 'application/pdf'):

profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', ('text/csv,'
                                                                  'application/csv,'
                                                                  'application/msexcel'))

For your case (I haven't tested this with a PDF, so take it with a grain of salt :) ), you could try this:

profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf')

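The second argument is a single string that lists, separated by commas, the MIME types that should never trigger a Save As prompt (the parentheses only group the string literals; they do not create a tuple). You then pass this profile into your browser: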

browser = webdriver.Firefox(firefox_profile=profile)
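If you want to check the shape of that preference value for yourself, here is a small sketch in plain Python (no Selenium needed). The variable names `csv_types`, `mime_types`, and `joined` are only illustrative; the point is that adjacent string literals inside parentheses become one comma-separated string, and that the same value can be built from a list.

```python
# Adjacent string literals inside parentheses are concatenated into ONE string;
# the parentheses only let the expression span several lines.
csv_types = ('text/csv,'
             'application/csv,'
             'application/msexcel')

# Building the same value from a list is often easier to maintain.
mime_types = ['text/csv', 'application/csv', 'application/msexcel']
joined = ','.join(mime_types)

print(type(csv_types).__name__)  # str
print(joined == csv_types)       # True
```

Either form can be passed as the second argument to `set_preference`.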

Now when you download a file whose type is in that list, it should bypass the prompt and go to your default download directory. If you want to change the directory to which the file downloads, you can use the same process, just changing a few things (do this before attaching the profile to the browser):

# 2 tells Firefox to use the custom directory given in browser.download.dir
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.dir', '/path/to/your/dir')
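Putting the pieces together, a rough sketch might look like the following. The helper names `build_download_prefs`, `apply_prefs`, and `make_browser` are made up for illustration, and the `firefox_profile=` argument assumes the older Selenium API used in this thread (newer Selenium releases set preferences through a Firefox options object instead). I have not run the browser part against the site in the question.

```python
import os


def build_download_prefs(download_dir, mime_types):
    """Return the Firefox preferences discussed above as a plain dict."""
    return {
        # 2 = use the custom folder named in browser.download.dir
        'browser.download.folderList': 2,
        # Firefox wants an absolute path in the platform's native form
        'browser.download.dir': os.path.abspath(download_dir),
        # Comma-separated MIME types that are saved without a prompt
        'browser.helperApps.neverAsk.saveToDisk': ','.join(mime_types),
    }


def apply_prefs(profile, prefs):
    """Copy each preference onto any object that has set_preference()."""
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return profile


def make_browser(download_dir, mime_types):
    """Start Firefox with the profile (needs Selenium and Firefox installed)."""
    from selenium import webdriver  # imported lazily so the helpers work alone
    profile = apply_prefs(webdriver.FirefoxProfile(),
                          build_download_prefs(download_dir, mime_types))
    return webdriver.Firefox(firefox_profile=profile)
```

You would then call something like `browser = make_browser('downloads', ['application/pdf'])` in place of `webdriver.Firefox()`. Building the path with `os.path.abspath` also avoids hand-typing backslashes in a Windows path.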
RocketDonkey
  • And can I customize what directory to save it in? –  Aug 23 '12 at 20:29
  • @user1582983 Yeah, so switch out the values inside of that string with 'application/pdf' (I think - I'll update with something you can copy/paste). As for the directory, see my last update. Hope that helps! – RocketDonkey Aug 23 '12 at 20:30
  • For some reason it is not downloading to the directory, and when I check the Downloads list in Firefox, it is not listed there either. –  Aug 23 '12 at 20:42
  • @user1582983 Does it download at all? It could be an issue with how your path is specified. Happy to troubleshoot. – RocketDonkey Aug 23 '12 at 20:44
  • My path is currently, `C:\Documents and Settings\User\Desktop\web-get project\downloads` –  Aug 23 '12 at 21:10
  • @user1582983 Did the file download at all the first time? Mine saves to `C:\Users\\Downloads` by default. – RocketDonkey Aug 23 '12 at 21:12
  • Yes, mine too. It does download to the downloads folder. –  Aug 23 '12 at 21:18
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15738/discussion-between-rocketdonkey-and-user1582983) – RocketDonkey Aug 23 '12 at 21:19
  • OK, I got it: instead of one `\` you use two `\\`, for example `c:\\path\\to\\file` –  Aug 24 '12 at 16:04
  • @user1582983 Awesome! Does it work with single forward slashes as well? Someone here showed me that trick recently (I didn't know that you could do that with Windows paths). In any case, glad it works! – RocketDonkey Aug 24 '12 at 16:09
  • Another problem I have is that I can't download CSV files now. `application/pdf` worked for PDFs, but it is not working for CSVs. I checked the Applications tab in the Firefox options and CSV is not listed there. –  Aug 24 '12 at 16:19
  • @RocketDonkey How do I download winzip/winrar files? Archive Manager is the MIME type. – cppcoder Nov 29 '12 at 01:38
  • @cppcoder You can try `application/x-rar-compressed` for `RAR` files and `application/zip` for `ZIP`. I just came across this list (http://en.wikipedia.org/wiki/Internet_media_type) that may be handy (I'll be bookmarking it at least :) ). Also, http://filext.com/file-extension/ZIP has some additional information regarding `ZIP` MIME types, so one of those could work potentially. Definitely let me know which (if any) work - curious to know and happy to troubleshoot if none of those are right. – RocketDonkey Nov 29 '12 at 01:55
  • @RocketDonkey I tried with `application/zip` but it did not work. Is there any way we can check what is really happening? Download did not happen. – cppcoder Nov 29 '12 at 04:21
  • @cppcoder Hmm, did you try any of the other MIME types on that second link? Also, this answer (http://stackoverflow.com/questions/6977544/rar-zip-files-mime-type) looks like it may have some possibilities (namely `application/octet-stream`). – RocketDonkey Nov 29 '12 at 04:49
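
On the question of checking what is really happening: the preference only matches the MIME type the server actually sends, so one option is to read the `Content-Type` header directly. Below is a minimal Python 3 sketch using only the standard library. The function names are hypothetical, and it assumes the file URL is reachable without a login and that the server answers `HEAD` requests (not all do).

```python
import urllib.request


def normalize_content_type(header_value):
    """Strip parameters such as '; charset=utf-8' and lower-case the type."""
    return header_value.split(';', 1)[0].strip().lower()


def fetch_content_type(url):
    """Send a HEAD request and return the MIME type the server reports."""
    request = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(request) as response:
        return normalize_content_type(response.headers.get('Content-Type', ''))


# Offline check of the parsing step
print(normalize_content_type('Application/Zip; charset=binary'))  # application/zip
```

Whatever `fetch_content_type(your_file_url)` returns is the value to try in `browser.helperApps.neverAsk.saveToDisk`.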