How to display a pdf that has been downloaded in python

Question

I have downloaded a PDF from the web using, for example:

import requests
pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")

I would like to modify the following code so that it displays the downloaded PDF:

from gi.repository import Poppler, Gtk

def draw(widget, surface):
    page.render(surface)

document = Poppler.Document.new_from_file("file:///home/me/some.pdf", None)
page = document.get_page(0)

window = Gtk.Window(title="Hello World")
window.connect("delete-event", Gtk.main_quit)
window.connect("draw", draw)
window.set_app_paintable(True)

window.show_all()
Gtk.main()

How do I modify the `document =` line to use the variable `pdf`, which contains the PDF?

(I don't mind using popplerqt4 or anything else if that makes it easier.)

You should be using `Poppler.Document.new_from_data`, however there is a conversion problem between `str` and `char *` due to the way `str` is expected to carry Unicode data, but `char *` expects raw binary data. Up to now, I couldn't make it work. — Cilyan, Feb 10 '14 at 21:00
@Cilyan Good Idea ! I used `len(repr(content))` for length field and `str(content)` for the data field. It worked for me. — Raghav RV, Feb 19 '14 at 18:41

score 5 · Answer 1 · edited Nov 28 '17 at 16:10

It all depends on the OS you're using. One of these will usually help:

import os
os.system('my_pdf.pdf')  # Windows: the shell opens the file with its associated program

or

os.startfile('path_to_pdf.pdf')  # Windows only

or

import webbrowser
webbrowser.open(r'file:///my_pdf.pdf')
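
The calls above each suit a different platform. As a rough sketch of how they could be combined, the helper below picks a launcher per platform. The function names are hypothetical, and it assumes `xdg-open` is available on Linux desktops and `open` on macOS.

```python
import os
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the argv that opens `path` in the default viewer, or None on Windows."""
    if platform.startswith("win"):
        return None  # no launcher binary needed; os.startfile handles it
    if platform == "darwin":
        return ["open", path]
    return ["xdg-open", path]  # assumption: a freedesktop-style desktop


def open_in_default_viewer(path):
    """Open `path` with whatever application the OS associates with it."""
    command = viewer_command(path)
    if command is None:
        os.startfile(path)  # this function only exists on Windows
    else:
        subprocess.run(command, check=True)
```

Calling `open_in_default_viewer('my_pdf.pdf')` would then hand the file to the system's PDF viewer, provided the file has already been saved to disk.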

edited Nov 28 '17 at 16:10 · jcoppens

answered Oct 16 '16 at 06:11 · Beatriz Kanzki

score 1 · Answer 2 · answered Feb 13 '14 at 14:00

How about using a temporary file?

import pathlib
import tempfile

import requests

from gi.repository import Poppler, Gtk

pdf = requests.get("http://www.scala-lang.org/docu/files/ScalaByExample.pdf")

# Python 3 version; the temporary file is deleted when the with block exits.
with tempfile.NamedTemporaryFile(suffix=".pdf") as pdf_contents:
    pdf_contents.write(pdf.content)  # write the response body, not the Response object
    pdf_contents.flush()             # make sure the bytes are on disk before Poppler reads them
    file_url = pathlib.Path(pdf_contents.name).as_uri()
    document = Poppler.Document.new_from_file(file_url, None)

logc

This is my current workaround. It would be great if it could be avoided however. – marshall Feb 13 '14 at 14:07
Are you using python-poppler-qt4, pypoppler, or some other library? Which one defines `Poppler.Document`? – logc Feb 13 '14 at 14:18
My import line is `from gi.repository import Poppler, Gtk`, which defines `Poppler.Document`. I think I needed to install libpoppler-dev to get it to work. I am happy to move to python-poppler-qt if that is a good idea, however. – marshall Feb 13 '14 at 14:27
And which library is, in turn, the one that allows you to `import gi.repository` ? :) BTW, I am not suggesting you move to another library, I do not have very much experience with the others I mentioned ... – logc Feb 13 '14 at 14:31

score 1 · Answer 3 · answered Feb 19 '14 at 18:34

Try this and tell me if it works:

document = Poppler.Document.new_from_data(str(pdf.content), len(repr(pdf.content)), None)
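
A note of caution on this call: the comments under this answer report "PDF document is damaged" on Python 3. One plausible reason, shown here with made-up sample bytes, is that `str()` on a `bytes` object returns its printable representation rather than the raw data, and `len(repr(...))` is not the data length. The last two lines assume the binding encodes a `str` argument as UTF-8, which I have not verified.

```python
# Made-up sample: a PDF header followed by typical binary marker bytes.
data = b"%PDF-1.4\n\xe2\xe3\xcf\xd3\n"

as_text = str(data)  # Python 3: the repr of the bytes, not the bytes themselves
print(as_text[:6])   # b'%PDF
print(len(data), len(repr(data)))  # the two lengths differ

# Mapping each byte to a code point keeps the character count, but every
# byte >= 0x80 turns into two bytes once the string is encoded as UTF-8.
as_chars = "".join(map(chr, data))
print(len(as_chars), len(as_chars.encode("utf-8")))
```

Either way, the bytes that reach the C library no longer match the downloaded file, which would be consistent with the errors reported in this thread.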

Raghav RV

I still get `PDF document is damaged` with this solution with python3.3, and a segmentation fault on python2.7. But maybe it will work for OP... – Cilyan Feb 19 '14 at 20:54
I tried it in an IPython notebook and it worked. But since @Cilyan says it did not work for him, you should try it yourself and tell me whether it works for you. – Raghav RV Feb 20 '14 at 21:57

score 1 · Answer 4 · answered Feb 19 '14 at 19:10

If you want to open the PDF with Acrobat Reader, the code below should work:

import subprocess
process = subprocess.Popen(['<here path to acrobat.exe>', '/A', 'page=1', '<here path to pdf>'], shell=False, stdout=subprocess.PIPE)
process.wait()

naren

score 1 · Answer 5 · answered Mar 01 '14 at 07:04

Since there is a library named pyPdf, you should be able to load the PDF file using that. If you have any further questions, send me a message.

lonelyjohner

Dysmas · Answer 6 · 2015-10-14T17:59:36.503

August 2015: On a fresh installation on Windows 7, the problem is still the same:

Poppler.Document.new_from_data(data, len(data), None)

returns: Type error: must be strings not bytes.

Poppler.Document.new_from_data(str(data), len(data), None)

returns: PDF document is damaged (4).

I have been unable to use this function.

I tried to use a NamedTemporaryFile instead of a file on disk, but for an unknown reason it returns an unknown error.
So I am using a temporary file. Not the prettiest way, but it works.
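
One likely cause of the NamedTemporaryFile error, though I cannot confirm it for this setup: the Python documentation notes that on Windows a named temporary file cannot be opened a second time by name while it is still open. A minimal sketch of a workaround is to create the file with `delete=False`, close it, and remove it by hand afterwards. The helper name `write_temp_pdf` is hypothetical.

```python
import os
import pathlib
import tempfile


def write_temp_pdf(data):
    """Write `data` to a closed temporary .pdf file; return (path, file URI)."""
    # delete=False keeps the file after close(); the caller removes it later.
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as handle:
        handle.write(data)
    path = pathlib.Path(handle.name)
    return path, path.as_uri()


path, uri = write_temp_pdf(b"%PDF-1.4\n")  # placeholder bytes, not a full PDF
try:
    print(uri)  # a file:// URI of the kind new_from_file expects
finally:
    os.remove(path)  # clean up once the document is no longer needed
```

Because the file is already closed when its URI is handed over, another library can open it by name on any platform.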

Here is the test code for Python 3.4, in case anyone has an idea:

import tempfile
import urllib.parse
import urllib.request

from gi.repository import Poppler

testfile = "d:/Mes Documents/en cours/PdfBooklet3/tempfiles/preview.pdf"
document = Poppler.Document.new_from_file("file:///" + testfile, None)          # Works fine
page = document.get_page(0)
print(page)         # OK

with open(testfile, "rb") as f1:
    data1 = f1.read()

data2 = "".join(map(chr, data1))  # converts bytes to string
print(len(data1))
document = Poppler.Document.new_from_data(data2, len(data2),  None)
page = document.get_page(0)                                                     # returns None
print(page)

pdftempfile = tempfile.NamedTemporaryFile()
pdftempfile.write(data1)

file_url = urllib.parse.urljoin('file:', urllib.request.pathname2url(pdftempfile.name))
print( file_url)
pdftempfile.seek(0)
document = Poppler.Document.new_from_file(file_url, None)                       # unknown error
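
One idea, offered as a sketch rather than a tested fix: newer Poppler GLib bindings (0.82 and later, as far as I know) provide `Poppler.Document.new_from_bytes`, which takes a `GLib.Bytes` object and so avoids the `str` conversion altogether. The helper name `load_pdf_document` and the header check are my own assumptions, not part of Poppler.

```python
def load_pdf_document(data, password=None):
    """Build a Poppler.Document from in-memory PDF bytes (hypothetical helper)."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected raw PDF bytes, e.g. the .content of a requests response")
    # Cheap sanity check (an assumption, not a Poppler rule): the "%PDF-"
    # marker should appear near the start of the data.
    if b"%PDF-" not in bytes(data[:1024]):
        raise ValueError("data does not look like a PDF")

    # Imported lazily so this helper can be defined where PyGObject is missing.
    import gi
    gi.require_version("Poppler", "0.18")
    from gi.repository import GLib, Poppler

    # Assumes Poppler >= 0.82; older releases lack new_from_bytes.
    return Poppler.Document.new_from_bytes(GLib.Bytes.new(bytes(data)), password)
```

With the code from the question, this would be used as `document = load_pdf_document(pdf.content)`.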

Don't cast the bytes to a string, rather decode them. – Shmack Aug 13 '22 at 18:47
