How to create a PDF of images stored in Google Cloud Storage?

Question

I am sorry if this is a stupid question. I am very new to GCP.

For a web app, I need to create a PDF from images stored in Cloud Storage.

First, I tried the Python package fpdf with files stored in Cloud Storage to see if this is possible. Because the images are stored online, I am using urllib2 to fetch them.

Code:

from fpdf import FPDF
import urllib2
import os

imagelist = ["https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses.jpg", "https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses2.jpg"]

pdf = FPDF()
i = 0
for image in imagelist:
    image = urllib2.urlopen(image)

    # writing image files in current folder
    with open('image'+str(i)+'.jpg','wb') as output:
        output.write(image.read())

    pdf.add_page()
    pdf.image('image'+str(i)+'.jpg', 10, 10, 100, 100) # pdf.image(image,x,y,w,h)

    # removing images
    os.remove('image'+str(i)+'.jpg')
    i += 1

# Creating PDF in current folder
pdf.output("yourfile.pdf", "F")

This works fine.

Then I tried running the same code on the local development server:

import webapp2
from fpdf import FPDF
import urllib2
import os

pdf = FPDF()

class MainPage(webapp2.RequestHandler):
    def get(self):
        imagelist = ["https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses.jpg", "https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses2.jpg"]

        pdf = FPDF()
        i = 0
        for image in imagelist:
            image = urllib2.urlopen(image)

            with open('image'+str(i)+'.jpg','wb') as output:
                output.write(image.read())

            pdf.add_page()
            pdf.image('image'+str(i)+'.jpg', 10, 10, 100, 100) # pdf.image(image,x,y,w,h)

            os.remove('image'+str(i)+'.jpg')
            i += 1

        pdf.output("yourfile.pdf", "F")

application = webapp2.WSGIApplication([('/', MainPage)],
                                      debug=True)

But I am getting this error:

WARNING  2017-12-08 19:21:56,184 sandbox.py:1082] The module _winreg is whitelisted for local dev only. If your application relies on _winreg, it is likely that it will not function properly in production.
WARNING  2017-12-08 14:21:56,190 urlfetch_stub.py:551] Stripped prohibited headers from URLFetch request: ['Host']
ERROR    2017-12-08 19:21:57,332 webapp2.py:1528] [Errno 30] Read-only file system: 'image0.jpg'
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "C:\MyMiniGCPProjects\FPDF\main.py", line 23, in get
    with open('image'+str(i)+'.jpg','wb') as output:
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\python\runtime\stubs.py", line 278, in __init__
    raise IOError(errno.EROFS, 'Read-only file system', filename)
IOError: [Errno 30] Read-only file system: 'image0.jpg'
ERROR    2017-12-08 19:21:57,339 wsgi.py:279]
Traceback (most recent call last):
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\runtime\wsgi.py", line 267, in Handle
    result = handler(dict(self._environ), self._StartResponse)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1519, in __call__
    response = self._internal_error(e)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\lib\webapp2-2.3\webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "C:\MyMiniGCPProjects\FPDF\main.py", line 23, in get
    with open('image'+str(i)+'.jpg','wb') as output:
  File "C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\google_appengine\google\appengine\tools\devappserver2\python\runtime\stubs.py", line 278, in __init__
    raise IOError(errno.EROFS, 'Read-only file system', filename)
IOError: [Errno 30] Read-only file system: 'image0.jpg'

I have not been able to find a solution that works. Is there a way to use the files directly from Cloud Storage and also save the PDF to Cloud Storage?

score 2 · Answer 1 · answered Dec 09 '17 at 05:23

You're hitting one of the sandbox restrictions. From The sandbox:

An App Engine application cannot:

write to the filesystem. Applications must use Cloud Datastore for storing persistent data. Reading from the filesystem is allowed, and all application files uploaded with the application are available.

The note about the datastore is misleading: there are several storage options, and IMHO the best one for your case is Cloud Storage (GCS).

But you can't write a file to GCS using the regular open(); you need to use the GCS client library for that. You can find an example here: Write a CSV to store in Google Cloud Storage
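
As a rough sketch of that idea, assuming the google-cloud-storage client (the one imported in another answer on this page): a bucket object exposes blob(name), and the blob exposes upload_from_string(data, content_type=...). The helper name upload_pdf_bytes, the bucket name and the object name below are made up for illustration. The PDF bytes could come from pdf.output(dest="S") in PyFPDF, which returns the document as a string instead of writing a file.

```python
def upload_pdf_bytes(bucket, object_name, pdf_bytes):
    """Upload an in-memory PDF to Cloud Storage without touching local disk.

    `bucket` is assumed to be a google.cloud.storage Bucket, or any object
    with the same blob() / upload_from_string() interface.
    `upload_pdf_bytes` itself is a hypothetical helper, not a library call.
    """
    blob = bucket.blob(object_name)
    blob.upload_from_string(pdf_bytes, content_type="application/pdf")
    return blob


# Possible wiring (assumes google-cloud-storage is installed and that
# "my-bucket" is replaced with a real bucket name):
#   from google.cloud import storage
#   bucket = storage.Client().bucket("my-bucket")
#   upload_pdf_bytes(bucket, "yourfile.pdf", pdf.output(dest="S"))
```

Passing the bucket in as a parameter keeps the helper free of credentials and makes it easy to try out with a stand-in object before pointing it at a real bucket.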

answered Dec 09 '17 at 05:23

Dan Cornilescu

Thanks. The images in `imagelist` are stored in Google Cloud Storage. I cannot use them directly in `FPDF().image()` because it requires images to be stored locally. So I am using `urllib2.urlopen(image)`, which creates an object for the image file. But because `FPDF().image()` requires images to be stored locally, I am using `with open('image'+str(i)+'.jpg','wb') as output`, which causes the error above. You are suggesting storing them in Google Cloud Storage, but the images are already in Google Cloud Storage. Any idea how to solve this problem? – Beginner Dec 09 '17 at 15:21
The doc indicates the `file` parameter can be a URL as well: `Path or URL of the image`: http://www.fpdf.org/en/doc/image.htm – Dan Cornilescu Dec 09 '17 at 17:57
Yes, it says that, but when I am using: `pdf.image("https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses.jpg", 10, 10, 100, 100)` I get error: `RuntimeError: FPDF error: Missing or incorrect image file: https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses.jpg. error: [Errno 22] invalid mode ('rb') or filename: 'https://storage.googleapis.com/seventh-terrain-179700.appspot.com/excuses.jpg' ` – Beginner Dec 10 '17 at 18:14
So, I was using `urllib2.urlopen(image)` and storing it locally as mentioned here: [link](https://stackoverflow.com/questions/3177716/fpdf-error-missing-or-incorrect-image-file) Do you know how to make it work? – Beginner Dec 10 '17 at 18:14
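
One possible way around the local write discussed in these comments, sketched under assumptions: read the HTTP response into an io.BytesIO buffer instead of a file such as image0.jpg. Whether FPDF accepts such a buffer depends on the version; the maintained fpdf2 fork documents file-like objects as valid input to image(), while the original PyFPDF used in the question expects a path. The sketch is written for Python 3 (urllib.request rather than urllib2), and fetch_image_buffer and its opener parameter are hypothetical names.

```python
import io
from urllib.request import urlopen


def fetch_image_buffer(url, opener=urlopen):
    """Download `url` and return its bytes wrapped in an in-memory buffer.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    response = opener(url)
    try:
        return io.BytesIO(response.read())
    finally:
        response.close()
```

With fpdf2, the buffer could then be passed where the question passes a file name, for example pdf.image(fetch_image_buffer(url), 10, 10, 100, 100), so nothing is written to the read-only filesystem.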

score 0 · Answer 2 · answered Mar 26 '23 at 19:39

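You can write images directly to GCS with the code below. This example reads a PDF from a bucket, converts its pages to JPEGs in memory, and uploads them, so no local files are involved: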

import io
from google.cloud import storage
from pdf2image import convert_from_bytes

storage_client = storage.Client()

def convert_pil_image_to_byte_array(img):
    # Encode the PIL image as JPEG in memory instead of writing a local file
    img_byte_array = io.BytesIO()
    img.save(img_byte_array, format='JPEG', subsampling=0, quality=100)
    return img_byte_array.getvalue()

def write_to_gcs_bucket(bucket_name, source_prefix, target_prefix):
    bucket = storage_client.get_bucket(bucket_name)
    # source_prefix is the object name of the PDF to read
    blob = bucket.get_blob(source_prefix)
    contents = blob.download_as_bytes()
    # Render the PDF pages to PIL images, starting at page 5
    images = convert_from_bytes(contents, first_page=5)
    for i, image in enumerate(images):
        object_byte = convert_pil_image_to_byte_array(image)
        file_name = 'slide' + str(i) + '.jpg'
        # Upload each page as its own JPEG object under target_prefix
        blob = bucket.blob(target_prefix + file_name)
        blob.upload_from_string(object_byte, content_type='image/jpeg')
