I can use the poppler library to easily create an image from a pdf using:
pdftoppm -png myfile.pdf > myfile.png
I'm now trying to use the python-poppler library to do the same from within Python. After installing the library (sudo apt-get install python-poppler), I can load a PDF file using the following:
doc = poppler.document_new_from_file('file://'+urllib(inputF), password=None)
However, I now want to load a PDF file from binary data instead of from a file. I thought I could use the method poppler.document_new_from_data(), so I tried the following, which raises a TypeError:
>>> d = poppler.document_new_from_data(userDoc.binary)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: document_new_from_data() argument 1 must be string without null bytes, not Binary
I'm unsure what this means, though. What kind of "data" counts as a "string without null bytes, not Binary"? I tried checking the source of the method, but the source files don't contain a single .py file.
I tried converting the binary to base64, but that leads to a different error: TypeError: Required argument 'length' (pos 2) not found.
Any help would be welcome!
[EDIT] Thanks to a tip from @Vaulstein, I have now gotten a bit further:
s = binascii.a2b_base64(userDoc.binary)
r = poppler.document_new_from_data(s, len(s), password='')
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error (3): Illegal character <75> in hex string
Syntax Error (4): Illegal character <df> in hex string
Syntax Error (5): Illegal character <5d> in hex string
Syntax Error (6): Illegal character <28> in hex string
Syntax Error (7): Illegal character <6e> in hex string
Syntax Error (8): Illegal character <3f> in hex string
Syntax Error (9): Illegal character <ca> in hex string
Syntax Error (10): Illegal character <89> in hex string
Syntax Error (11): Illegal character <db> in hex string
>>> r = poppler.document_new_from_data(s, len(s), password='')
Traceback (most recent call last):
File "<input>", line 1, in <module>
GError: PDF document is damaged
But the data still doesn't seem to be in the correct encoding. Any other ideas on how I can do this?
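A possible sanity check at this point is to look at the bytes themselves before handing them to poppler. The sketch below assumes that userDoc.binary can first be turned into a plain byte string (for example with str() in Python 2 or bytes() in Python 3), which I have not confirmed for this Binary type. It relies on a PDF file normally beginning with the marker %PDF-, and reuses binascii.a2b_base64 from above. The names PDF_MAGIC, looks_like_pdf, and classify_payload are hypothetical helpers, not part of python-poppler.

```python
import binascii

# A PDF file normally starts with this marker.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(raw):
    """Return True if the byte string begins with the PDF header marker."""
    return raw[:len(PDF_MAGIC)] == PDF_MAGIC


def classify_payload(raw):
    """Guess whether raw is a PDF, a base64-encoded PDF, or something else."""
    if looks_like_pdf(raw):
        # Already raw PDF bytes: base64-decoding these would corrupt them.
        return "raw-pdf"
    try:
        decoded = binascii.a2b_base64(raw)
    except binascii.Error:
        # Not decodable as base64 at all.
        return "unknown"
    return "base64-pdf" if looks_like_pdf(decoded) else "unknown"
```

If this reports "raw-pdf", the bytes could be passed along with their length without any base64 step, and running a2b_base64 on them would be one plausible explanation for the "May not be a PDF file" warning. If it reports "unknown", the stored data is probably in some other encoding or is not a PDF at all.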