10

I'm using Selenium with my CI system to automatically test several applications. One of them is a web form that offers a downloadable copy of the submitted answers as a dynamically generated PDF. The test needs to assert that the downloaded PDF contains the answers given on the web form. My problem is handling the download dialog so that I can retrieve the PDF file (asserting that the contents of the PDF are correct is outside the scope of this question).

I've spent a while looking for ways to handle it. The only relevant options I've found are AutoIT, changing the default download location and making the browser download files automatically, and simply asserting that the link works without downloading the file. Unfortunately, my situation rules out all three:

  1. I am using a variety of browsers, which rules out automatic downloads because some browsers do not support them.
  2. I am using a variety of platforms, which rules out AutoIT, a Windows-only application.
  3. The content of the PDF is generated dynamically from previous interactions with the application, and the test must assert that the generated content matches the expected values, so checking that the link exists is not enough.

Because the download dialog is managed by the OS, I'm not sure whether Selenium can do what I intend. Does anyone know of a solution using Selenium, or can anyone recommend some other acceptable means of testing this?

Tro
  • If luksch's answer is not enough for you, I have another idea. Can you sniff and post the HTTP/session parameters that a normal, interactive download sends? You might be able to check the link without downloading if you construct the requests yourself and manage your session correctly (i.e., cookies/cache, wherever the previous interaction is saved). – A. Abramov Aug 13 '15 at 05:35
  • How about opening the PDF in browser itself? – Manu Aug 13 '15 at 12:26

2 Answers

6

As far as I know, you can't use Selenium for that, for the reasons you stated yourself. I think the best approach is to download the generated PDF directly, without Selenium. Since you know its URL, you may be able to use the approach outlined in this article. It describes the use of "Powder-Monkey" to do exactly what you want to do.
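One way to sketch the idea of fetching the PDF outside the browser, in Python and using only the standard library: copy the session cookies out of the Selenium session and send them with a plain HTTP request. This is a rough illustration, not the method from the linked article. The helper names `cookie_header` and `fetch_pdf` are made up for this example, and it assumes the application identifies the session through cookies alone and serves the PDF in response to a simple GET.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list of dicts that
    Selenium's driver.get_cookies() returns (keys 'name' and 'value')."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_pdf(url, selenium_cookies, timeout=30):
    """Request url with the browser's session cookies and return the raw bytes.

    Assumption: the server needs nothing beyond the cookies (no extra
    headers, tokens, or POST data) to generate the same PDF.
    """
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(selenium_cookies)}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    # PDF files normally begin with the marker "%PDF-"; anything else is
    # probably a login page or an error document.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return data


# Hypothetical usage inside a test, after filling in the form with Selenium:
#   pdf_bytes = fetch_pdf(download_link.get_attribute("href"), driver.get_cookies())
```

Because the request never goes through the browser, the same test code should work regardless of which browser or platform drove the form.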

luksch

3

This is an annoying issue indeed. However, I did work out how to solve it for Firefox. Maybe you can find a similar solution for other browsers.

Basically, you have to force the browser to download the file without prompting. You can do that by loading a specially crafted profile.

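from selenium import webdriver

# Load the prepared profile directory
myprofile = webdriver.FirefoxProfile('./profile')
# Save downloads to a custom folder; folderList = 2 means "use browser.download.dir"
myprofile.set_preference('browser.download.dir', '/tmp/my_downloads_folder')
myprofile.set_preference('browser.download.folderList', 2)
myprofile.set_preference('pdfjs.migrationVersion', 1)

# Start Firefox with the profile configured above
browser = webdriver.Firefox(myprofile)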

Besides loading the profile, we also define a downloads folder and disable the pdfjs plugin.
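As a possible variation, the preferences can be collected in one helper. The sketch below also sets `browser.helperApps.neverAsk.saveToDisk` and `pdfjs.disabled`, two Firefox preferences commonly used for this purpose that are not part of the snippet above. Whether they are sufficient depends on the Firefox version. The function name `apply_pdf_download_prefs` is invented for this example, and it works with any object that exposes `set_preference(name, value)`, such as a `FirefoxProfile`.

```python
def apply_pdf_download_prefs(profile, download_dir):
    """Set download-related preferences on a profile-like object.

    `profile` only needs a set_preference(name, value) method.
    Returns the dict of preferences that was applied.
    """
    prefs = {
        # 2 = save to the directory named in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Assumption: the server sends the file as application/pdf
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        # Keep the built-in viewer from opening the PDF in a tab
        "pdfjs.disabled": True,
    }
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return prefs


# Hypothetical usage:
#   apply_pdf_download_prefs(myprofile, '/tmp/my_downloads_folder')
```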

In the ./profile folder, we have a mimeTypes.rdf file like this:

<?xml version="1.0"?>
<RDF:RDF xmlns:NC="http://home.netscape.com/NC-rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="urn:mimetype:application/pdf"
                   NC:value="application/pdf"
                   NC:editable="true">
    <NC:handlerProp RDF:resource="urn:mimetype:handler:application/pdf"/>
  </RDF:Description>
  <RDF:Description RDF:about="urn:mimetype:handler:application/pdf"
                   NC:alwaysAsk="false"
                   NC:saveToDisk="true"
                   NC:handleInternal="false">
    <NC:externalApplication RDF:resource="urn:mimetype:externalApplication:application/pdf"/>
   </RDF:Description>
</RDF:RDF>
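Once the browser saves files silently, the test still has to wait until the file is actually on disk before reading it. A minimal polling sketch follows. The helper name `wait_for_download` is made up for this example, and it assumes Firefox's habit of writing a temporary `.part` file next to an unfinished download.

```python
import os
import time


def wait_for_download(directory, suffix=".pdf", timeout=30.0, poll=0.5):
    """Return the path of the first finished file ending in `suffix`.

    A download counts as finished when a matching file exists and no
    '.part' file (assumed to mark an in-progress Firefox download) remains.
    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory) if os.path.isdir(directory) else []
        in_progress = any(name.endswith(".part") for name in names)
        finished = sorted(name for name in names if name.endswith(suffix))
        if finished and not in_progress:
            return os.path.join(directory, finished[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {suffix} file in {directory}")
        time.sleep(poll)


# Hypothetical usage after clicking the download link:
#   pdf_path = wait_for_download('/tmp/my_downloads_folder')
```

Clearing the download folder before each test keeps an older file from being picked up by mistake.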

I hope it helps you.

Thiago Curvelo