Can't load PDF with Wand/ImageMagick in Google Cloud Function

Question

I'm trying to load a PDF from the local file system with Wand and getting a "not authorized" error:

File "/env/local/lib/python3.7/site-packages/wand/image.py", line 4896, in read
    self.raise_exception()
File "/env/local/lib/python3.7/site-packages/wand/resource.py", line 222, in raise_exception
    raise e
wand.exceptions.PolicyError: not authorized `/tmp/tmp_iq12nws' @ error/constitute.c/ReadImage/412

The PDF file is successfully downloaded from GCS to the local 'server', but Wand refuses to load it. Loading images into OpenCV isn't an issue; the error only happens when loading PDFs with Wand/ImageMagick.

The code that downloads the PDF from GCS to the local file system and loads it into Wand/ImageMagick is below:

import tempfile
from wand.image import Image

_, temp_local_filename = tempfile.mkstemp()
gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
gcs_blob.download_to_filename(temp_local_filename)
# load the pdf into a set of images using imagemagick
with Image(filename=temp_local_filename, resolution=200) as source:
    # run through pages and save images etc.
    pass

I expected ImageMagick to be authorised to read files on the local filesystem, so the file should load without this "not authorized" error.

See https://stackoverflow.com/questions/52861946/imagemagick-not-authorized-to-convert-pdf-to-an-image/52863413#52863413 — fmw42, Apr 02 '19 at 22:43
This is a duplicate of https://stackoverflow.com/questions/53296500/google-cloud-function-with-wand-stopped-working, which has not been resolved yet. — Dustin Ingram, Apr 02 '19 at 22:55
Thanks - after doing some reading, the issue is with a security vulnerability in the Ghostscript 9.26 release, so all filetypes that ImageMagick uses Ghostscript to read have been disabled. Hopefully security fixes exist in Ghostscript 9.27 and the Google Cloud Functions team can get the patches in quickly. Not being able to manually tweak the policy.xml as a workaround in the GCF environment sucks. — timhj, Apr 02 '19 at 23:13
Does this answer your question? [convert:not authorized \`aaaa\` @ error/constitute.c/ReadImage/453](https://stackoverflow.com/questions/42928765/convertnot-authorized-aaaa-error-constitute-c-readimage-453) — kenorb, Jul 14 '20 at 16:14
@kenorb - no it's not possible to change security settings like that in Google Cloud Functions. — timhj, Jul 15 '20 at 02:29

timhj · Accepted Answer · 2019-04-03T01:26:41.427

PDF reading by ImageMagick has been disabled because of a security vulnerability in Ghostscript. The behaviour is by design: it is a security mitigation from the ImageMagick team, and it will stay in place until ImageMagick enables Ghostscript processing of PDFs again and Google Cloud Functions updates to a version of ImageMagick with PDF processing enabled.
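
For context, this kind of block is normally expressed as `coder` rules with `rights="none"` in ImageMagick's policy.xml (the file mentioned in the comments above). The following is only a rough sketch for inspecting such a policy, assuming you have the policy text available as a string; the function name `denied_coders` and the sample policy are illustrative and not taken from the GCF environment, and real policy evaluation has more rules than this handles (rule ordering, glob patterns).

```python
import xml.etree.ElementTree as ET


def denied_coders(policy_xml):
    """Return the set of coder names that the policy text sets to rights="none".

    Simplification: only exact names and brace lists such as "{PS,PDF}" are
    recognised, and later rules that re-enable a coder are not considered.
    """
    denied = set()
    root = ET.fromstring(policy_xml)
    for policy in root.iter("policy"):
        if policy.get("domain", "").lower() != "coder":
            continue
        if policy.get("rights", "").lower() != "none":
            continue
        # A pattern is either a single name ("PDF") or a brace list ("{PS,PDF}").
        pattern = policy.get("pattern", "").strip("{}")
        denied.update(
            name.strip().upper() for name in pattern.split(",") if name.strip()
        )
    return denied


# Hypothetical policy text, written for this example only.
SAMPLE_POLICY = """<policymap>
  <policy domain="resource" name="memory" value="256MiB"/>
  <policy domain="coder" rights="none" pattern="PDF"/>
  <policy domain="coder" rights="none" pattern="{PS,EPS}"/>
</policymap>"""

print(sorted(denied_coders(SAMPLE_POLICY)))
```

If `PDF` shows up in the result, Wand raises the `PolicyError` from the question no matter where the file is stored, which is why moving the file around does not help.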

There's no fix for the ImageMagick/Wand issue in GCF that I could find. As a workaround for converting PDFs to images in Google Cloud Functions, you can use this [ghostscript wrapper][2] to request the PDF-to-image conversion directly from Ghostscript and bypass ImageMagick/Wand. You can then load the resulting PNGs into ImageMagick or OpenCV without issue.

requirements.txt

google-cloud-storage
ghostscript==0.6

main.py

    import glob
    import locale
    import os
    import tempfile

    import cv2
    import ghostscript

    # create a temp file and save a local copy of the pdf from GCS
    fd, temp_local_filename = tempfile.mkstemp()
    os.close(fd)  # only the path is needed; close the descriptor mkstemp opened
    gcs_blob = STORAGE_CLIENT.bucket('XXXX').get_blob(results["storedLocation"])
    gcs_blob.download_to_filename(temp_local_filename)
    # create a temp folder to hold the rendered pages
    temp_local_dir = tempfile.mkdtemp()
    # use ghostscript to export the pdf into pages as pngs in the temp dir
    args = [
        "pdf2png", # actual value doesn't matter
        "-dSAFER",
        "-sDEVICE=pngalpha",
        # join with a path separator so the pages land inside the temp dir
        "-o", os.path.join(temp_local_dir, "page-%03d.png"),
        "-r300", temp_local_filename
        ]
    # the above arguments have to be bytes, encode them
    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]
    # run the request through ghostscript
    ghostscript.Ghostscript(*args)
    # read the files in the temp dir and process the pngs individually, in page order
    for png_file_loc in sorted(glob.glob(os.path.join(temp_local_dir, "*.png"))):
        # load each saved PNG into OpenCV and do what you want
        cv_image = cv2.imread(png_file_loc, cv2.IMREAD_UNCHANGED)
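
If you need this in more than one function, the argument building and page listing can be pulled out into small helpers. This is just a sketch under the same assumptions as the code above; the names `build_gs_args` and `list_pages` are my own, not part of the ghostscript wrapper, and the returned strings still need encoding to bytes before being passed to `ghostscript.Ghostscript`, as shown earlier.

```python
import glob
import os


def build_gs_args(pdf_path, out_dir, dpi=300):
    """Return Ghostscript arguments that render each PDF page to a PNG in out_dir."""
    return [
        "pdf2png",  # placeholder program name; the actual value doesn't matter
        "-dSAFER",
        "-sDEVICE=pngalpha",
        "-r{}".format(dpi),
        # os.path.join adds the separator, so pages are written inside out_dir
        "-o", os.path.join(out_dir, "page-%03d.png"),
        pdf_path,
    ]


def list_pages(out_dir):
    """Return the rendered page files sorted by their zero-padded page number."""
    return sorted(glob.glob(os.path.join(out_dir, "page-*.png")))
```

Sorting matters because `glob.glob` makes no promise about ordering; the zero-padded `%03d` in the output name keeps a plain string sort in page order for documents up to 999 pages.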

Hope this helps someone facing the same issue.
