
I want to convert the first page of a PDF to an image. The code below works well in my local environment (Ubuntu 18), but when I run it in a Docker environment, it fails and raises:

wand.exceptions.WandRuntimeError: MagickReadImage returns false, but did raise ImageMagick exception. This can occurs when a delegate is missing, or returns EXIT_SUCCESS without generating a raster.

Am I missing a dependency? Or something else? I don't know what it's referring to as 'delegate'.

I looked at the source code; it fails here, in wand/image.py, line 7873:

if blob is not None:
    if not isinstance(blob, abc.Iterable):
        raise TypeError('blob must be iterable, not ' +
                        repr(blob))
    if not isinstance(blob, binary_type):
        blob = b''.join(blob)
    r = library.MagickReadImageBlob(self.wand, blob, len(blob))
elif filename is not None:
    filename = encode_filename(filename)
    r = library.MagickReadImage(self.wand, filename)
if not r:
    self.raise_exception()
    msg = ('MagickReadImage returns false, but did raise ImageMagick '
           'exception. This can occurs when a delegate is missing, or '
           'returns EXIT_SUCCESS without generating a raster.')
    raise WandRuntimeError(msg)

The line r = library.MagickReadImageBlob(self.wand, blob, len(blob)) returns true in my local environment, but false in Docker. The arguments blob and len(blob) are the same in both cases.

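import io

# Assumed source of these names: PyPDF2 (< 3.0) and Wand
from PyPDF2 import PdfFileReader, PdfFileWriter
from wand.image import Image


def pdf2img(fp, page=0):
    """
    Convert one page of a PDF to a JPEG image.
    :param fp: a file-like object
    :param page: zero-based index of the page to convert
    :return: (bool, file). If False, `fp` is not a PDF; if True, `file` is a
        file-like object containing the JPEG data
    """
    try:
        reader = PdfFileReader(fp, strict=False)
    except Exception:
        # Not a readable PDF: rewind so the caller can reuse the stream
        fp.seek(0)
        return False, None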
    else:
        bytes_in = io.BytesIO()
        bytes_out = io.BytesIO()
        writer = PdfFileWriter()

        writer.addPage(reader.getPage(page))
        writer.write(bytes_in)
        bytes_in.seek(0)

        im = Image(file=bytes_in, resolution=120)
        im.format = 'jpeg'
        im.save(file=bytes_out)
        bytes_out.seek(0)
        return True, bytes_out

Milo
hstk
  • Most likely you are missing the Ghostscript delegate that ImageMagick relies on. ImageMagick uses Ghostscript to rasterize PDF files. There are many posts on this forum indicating that Docker images often fail to install it. Search this forum to find those posts and solutions. Once you get Ghostscript installed, if it still does not work, you may need to edit the policy.xml file to permit PDF files to be used in ImageMagick. See https://stackoverflow.com/questions/52861946/imagemagick-not-authorized-to-convert-pdf-to-an-image/52863413#52863413 – fmw42 Jul 30 '19 at 16:29

1 Answer


I don't know what it's referring to as 'delegate'.

With ImageMagick, a 'delegate' is any shared library, utility, or external program that does the actual encoding and decoding of a file type. When reading, that means turning a file format into a raster.

Am I missing a dependency?

Most likely. For PDF, you need Ghostscript installed on the Docker instance.
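
A quick way to confirm this from Python inside the container is to look for a Ghostscript executable on the PATH. This is only a minimal sketch: the helper name find_ghostscript is made up for illustration, and the Windows executable names in the default list are an assumption about typical installs.

```python
import shutil


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c")):
    """Return the path of the first Ghostscript executable on PATH, or None.

    `candidates` is a hypothetical default list: "gs" is the usual name on
    Linux; the other two are assumed names for Windows installs.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    gs_path = find_ghostscript()
    if gs_path is None:
        print("Ghostscript not found on PATH; PDF reads will likely fail")
    else:
        print("Ghostscript found at", gs_path)
```

If this reports nothing found inside the container but finds gs on the host, the missing delegate is the likely cause.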

Or something else?

Possibly, but it's hard to determine without an error message. The "WandRuntimeError" exception is a catch-all. It exists because a raster could not be generated from the PDF, and both Wand & ImageMagick cannot determine why. Usually there would be an exception if the delegate failed, a security policy message, or an OS error.
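
If a security policy is the suspect, the policy file can be inspected for coder rules with rights="none". The sketch below parses a simplified sample string, not a real file. The function name blocked_coders is hypothetical, and the location of policy.xml varies by install (for example /etc/ImageMagick-6/policy.xml on some Debian/Ubuntu systems, which is an assumption to verify on your image).

```python
import xml.etree.ElementTree as ET


def blocked_coders(policy_xml):
    """Return the set of coder patterns that have rights="none".

    Handles single patterns ("PDF") and brace lists ("{PS,PDF}").
    """
    root = ET.fromstring(policy_xml)
    blocked = set()
    for policy in root.iter("policy"):
        is_coder = policy.get("domain") == "coder"
        no_rights = policy.get("rights", "").lower() == "none"
        if is_coder and no_rights:
            pattern = policy.get("pattern", "")
            for name in pattern.strip("{}").split(","):
                if name.strip():
                    blocked.add(name.strip().upper())
    return blocked


# Simplified sample for illustration; a real policy.xml has more entries.
SAMPLE = """<policymap>
  <policy domain="coder" rights="none" pattern="PDF" />
  <policy domain="coder" rights="read|write" pattern="PNG" />
</policymap>"""

if __name__ == "__main__":
    print("PDF blocked:", "PDF" in blocked_coders(SAMPLE))
```

To use it on a real system, read the contents of the policy file into a string and pass that in place of SAMPLE.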

The best thing would be to run a few gs commands to see if Ghostscript is working correctly:

gs -sDEVICE=pngalpha -o page-%03d.png -r120 input.pdf

If the above works, then try again with just ImageMagick:

convert -density 120 input.pdf page-%03d.png
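
The same two checks can be scripted, which is handy when running them inside a container. This is a rough sketch using only the standard library. The helper names run_check and check_pdf_pipeline are made up for illustration, and input.pdf is the same placeholder file name as in the commands above.

```python
import shutil
import subprocess


def run_check(cmd):
    """Run cmd and return (ok, output).

    ok is False if the program is not on PATH or exits with a non-zero code.
    """
    if shutil.which(cmd[0]) is None:
        return False, f"{cmd[0]} not found on PATH"
    proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def check_pdf_pipeline(pdf_path="input.pdf"):
    """Try Ghostscript first, then ImageMagick; stop at the first failure."""
    steps = [
        ["gs", "-sDEVICE=pngalpha", "-o", "page-%03d.png", "-r120", pdf_path],
        ["convert", "-density", "120", pdf_path, "page-%03d.png"],
    ]
    results = []
    for cmd in steps:
        ok, output = run_check(cmd)
        results.append((cmd[0], ok, output))
        if not ok:
            break
    return results


if __name__ == "__main__":
    for name, ok, output in check_pdf_pipeline():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
        if output:
            print(output)
```

If the gs step fails, the problem is with Ghostscript itself; if only the convert step fails, look at ImageMagick's delegate configuration or security policy.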
emcconville
  • Oh, the Docker image doesn't include `ghostscript` by default, so it failed. Now that I've installed it, it works well! – hstk Jul 31 '19 at 02:40