Python3: Download PDF to memory and convert first page to image

Question

I am trying to do the following:

Download a PDF file to memory
Convert the first page to an image
Use that image with tweepy

I tried the following code, but ran into an error.

from PIL import Image
from pdf2image import convert_from_path
from urllib.request import urlopen
from io import StringIO, BytesIO

url = 'http://somedomain.com/assets/applets/internet.pdf'
scrape = urlopen(url) # for external files
pdfFile = BytesIO(scrape.read())
pdfFile.seek(0)
pages = convert_from_path(pdfFile,last_page=1, dpi=100)

for page in pages:
    page.save('/home/out.jpg', 'JPEG')

Here is the error:

TypeError: Can't convert '_io.BytesIO' object to str implicitly

The generated image will later be uploaded to Twitter with tweepy. I don't need to store it on disk, which is why I am trying to do everything in memory. Can anybody help me, please?

You need to use the `convert_from_bytes` function instead of `convert_from_path` - kip, Jun 08 '18 at 15:44
@kip I changed the code to `pages = convert_from_bytes(pdfFile, last_page=1, dpi=100)` and also imported this function, but I still get an error: `TypeError: a bytes-like object is required, not '_io.BytesIO'` - Lionking, Jun 08 '18 at 21:54
Try passing a `bytes` object to that function instead of the `BytesIO` object, for example via `getvalue()`: `convert_from_bytes(pdfFile.getvalue())`. I think `scrape.read()` already returns a `bytes` object, though, so you could pass it directly: `convert_from_bytes(scrape.read())` - kip, Jun 09 '18 at 03:29
Using `scrape.read()` directly did the trick. Thanks a lot. - Lionking, Jun 09 '18 at 09:25
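
For later readers, the fix from the comments can be sketched roughly as follows. This is a minimal sketch, not a tested answer: the helper name `first_page_as_jpeg` is made up for illustration, the `convert("RGB")` step is an added precaution rather than something from the thread, and it assumes pdf2image (with poppler) is installed. The key point is that `convert_from_bytes` is given raw `bytes`, not a `BytesIO` object, and the JPEG is written into a buffer instead of a file.

```python
from io import BytesIO
from urllib.request import urlopen


def first_page_as_jpeg(pdf_bytes, dpi=100):
    """Render page 1 of a PDF (given as bytes) into an in-memory JPEG buffer.

    Hypothetical helper name, used only for this sketch.
    """
    # Third-party import (requires pdf2image and poppler); kept local so the
    # helper can be defined even on a machine where the library is missing.
    from pdf2image import convert_from_bytes

    # convert_from_bytes wants raw bytes, not a BytesIO wrapper around them.
    pages = convert_from_bytes(pdf_bytes, dpi=dpi, first_page=1, last_page=1)

    buffer = BytesIO()
    # JPEG has no alpha channel, so normalise the image mode before saving.
    pages[0].convert("RGB").save(buffer, "JPEG")
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer


def download_first_page(url):
    """Fetch a PDF over HTTP and return its first page as a JPEG buffer."""
    with urlopen(url) as response:
        # response.read() already returns bytes, so no BytesIO is needed here.
        return first_page_as_jpeg(response.read())
```

The returned buffer behaves like an open file, so nothing has to be written to disk. tweepy's `API.media_upload` is documented as accepting a file object through its `file` argument alongside a `filename`, so something like `api.media_upload("page.jpg", file=buffer)` should work, but check that against the tweepy version you have installed.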
