2

I am trying to download a PDF (http://judis.nic.in/supremecourt/imgs1.aspx?filename=43215) using Selenium WebDriver for Chrome.

The Download button sits in a dynamic ribbon at the top of the page; the ribbon is only visible while the mouse hovers over it.

[Screenshot: the PDF with the ribbon visible while the mouse hovers over it]

[Screenshot: the PDF without the hover-over ribbon]

I intend to click on this Download button (the downward arrow sign next to the Print symbol) through my Python script.

Thank you in advance.

Rachayita Giri

2 Answers

1

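You don't need Selenium to download it. You could use urllib2 (urllib.request on Python 3):

try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

def download_file(pdf_url):
    response = urlopen(pdf_url)
    # Open in binary mode ("wb"); text mode can corrupt the PDF bytes.
    with open("doc.pdf", "wb") as pdf_file:
        pdf_file.write(response.read())

def main():
    download_file("http://judis.nic.in/supremecourt/imgs1.aspx?filename=43215")

if __name__ == "__main__":
    main()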
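
Because the URL ends in .aspx rather than .pdf, it may be worth checking that the server really returned a PDF before saving it. The following is a minimal Python 3 sketch of that idea, not part of the original answer: the helper names `looks_like_pdf` and `download_pdf` are made up for illustration, and the check simply looks for the `%PDF-` marker near the start of the data.

```python
from urllib.request import urlopen

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data):
    """Return True if the PDF header marker appears in the first 1024 bytes."""
    return PDF_MAGIC in data[:1024]


def download_pdf(pdf_url, out_path="doc.pdf"):
    """Fetch pdf_url and save it to out_path, refusing non-PDF responses."""
    with urlopen(pdf_url, timeout=30) as response:
        data = response.read()
    if not looks_like_pdf(data):
        # Likely an HTML error or login page rather than the document itself.
        raise ValueError("Response does not look like a PDF")
    with open(out_path, "wb") as pdf_file:
        pdf_file.write(data)
    return out_path
```

Calling `download_pdf("http://judis.nic.in/supremecourt/imgs1.aspx?filename=43215")` would then either write doc.pdf or raise an error instead of silently saving an HTML page.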
Ryan
  • Thanks. This works. Is there also any way by which I can write the response in a text file (.txt) instead of a PDF (.pdf)? – Rachayita Giri Nov 21 '16 at 13:38
  • If it solves your problem, please mark it as the answer. Unfortunately this won't convert it to a text file. You could use a solution like this: http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text – Ryan Nov 21 '16 at 14:16
  • what if the link is protected with login and password? – StackUP Aug 31 '17 at 00:48
  • @StackUP: https://stackoverflow.com/questions/35376005/download-a-file-from-https-with-authentication Or you could use selenium to login and then access the file. – Ryan Aug 31 '17 at 15:58
  • @Ryan: The PDF is available only in an active session. Every time you click the link to generate the PDF, it contains a newly generated token, so an old link can't be used again. Selenium is unable to download it because it is not a direct link to the PDF. – StackUP Aug 31 '17 at 18:27
  • https://stackoverflow.com/questions/45972117/how-to-download-pdf-with-python-or-selenium This is what I have been trying with no success. – StackUP Aug 31 '17 at 18:28
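
The suggestion in the comments to log in with Selenium and then access the file could be sketched roughly as follows: log in through the browser, copy its cookies, and reuse them in a plain HTTP request. This is only an illustration under assumptions. The helper names are hypothetical, `driver` is assumed to be a Selenium WebDriver that has already logged in, and whether this works for a site that embeds a fresh token in every link is not guaranteed.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Convert the list of cookie dicts from driver.get_cookies() to name -> value."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def download_with_browser_session(driver, pdf_url, out_path):
    """Reuse the logged-in browser's cookies to fetch pdf_url with requests."""
    import requests  # imported lazily so the helper above has no dependencies

    cookies = selenium_cookies_to_dict(driver.get_cookies())
    response = requests.get(pdf_url, cookies=cookies, timeout=30)
    response.raise_for_status()
    with open(out_path, "wb") as pdf_file:
        pdf_file.write(response.content)
    return out_path
```

The PDF URL passed in should be read from the page in the same session, since a link copied from an earlier session may no longer be valid.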
0

The PDF opens in the browser's built-in PDF viewer extension. The viewer's HTML is inaccessible to Selenium because it resides inside the extension.

You can download the PDF simply by using the requests library.

import requests

url = "http://judis.nic.in/supremecourt/imgs1.aspx?filename=43215"
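r = requests.get(url, stream=True)
r.raise_for_status()  # stop on an HTTP error instead of saving an error page

with open("THE.pdf", "wb") as fd:
    # iter_content() defaults to 1-byte chunks; larger chunks are much faster.
    for ch in r.iter_content(chunk_size=8192):
        fd.write(ch)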
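
If the download has to go through Selenium after all, one possible workaround is to tell Chrome to save PDFs instead of opening them in the built-in viewer, so no click on the ribbon is needed. The sketch below is an assumption-laden illustration rather than part of the original answer: the function names and the download directory are hypothetical, and it relies on the Chrome preference `plugins.always_open_pdf_externally`.

```python
import os


def build_pdf_download_prefs(download_dir):
    """Chrome preferences that save PDFs to download_dir instead of displaying them."""
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }


def make_chrome_driver(download_dir):
    """Create a Chrome WebDriver configured with the preferences above."""
    from selenium import webdriver  # imported lazily; requires selenium and Chrome

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", build_pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With a driver created this way, `driver.get(url)` on the PDF link should trigger a download into the chosen directory rather than opening the viewer.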
JRodDynamite