I have a low-resolution black & white image, screenBWsmall.png.
I use the Python Imaging Library to convert it to PDF:
#!python
from PIL import Image
im = Image.open('screenBWsmall.png')
im.save('screenBWsmall.pdf')
The PDF file is huge compared to the one generated by ImageMagick's convert, issued from the Bash command line:
convert screenBWsmall.png screenBWsmall_IM.pdf
The file sizes are:
11093 screenBWsmall.png
1050994 screenBWsmall.pdf
16999 screenBWsmall_IM.pdf
While I'm puzzled by this, it is even more puzzling considering that the larger file, screenBWsmall.pdf, uses 1 bit per pixel (bits per component, or bpc) compared to 8 bpc for the smaller file, screenBWsmall_IM.pdf:
$ pdfimages.exe -list screenBWsmall.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 960 540 gray 1 1 image no 1 0 72 72 1025K 1621%
$ pdfimages.exe -list screenBWsmall_IM.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 960 540 gray 1 8 image no 8 0 72 72 14.9K 2.9%
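As a rough sanity check on the two listings, the sketch below recomputes the ratio column. It assumes that pdfimages reports ratio as the embedded stream size divided by the raw uncompressed size (width × height × comp × bpc / 8) and that the K in the size column means 1024 bytes. Both are assumptions, although they reproduce the listed percentages to within rounding. The helper names are made up for illustration.

```python
def raw_size_bytes(width, height, components, bpc):
    """Uncompressed image size in bytes (assumes rows need no padding)."""
    return width * height * components * bpc // 8


def ratio_percent(stream_kib, width, height, components, bpc):
    """Stream size as a percentage of the raw size (assumed meaning of 'ratio')."""
    raw = raw_size_bytes(width, height, components, bpc)
    return 100.0 * stream_kib * 1024 / raw


# (file, size in K from the listing, width, height, comp, bpc)
rows = [
    ("screenBWsmall.pdf", 1025, 960, 540, 1, 1),
    ("screenBWsmall_IM.pdf", 14.9, 960, 540, 1, 8),
]

for name, size_k, w, h, comp, bpc in rows:
    raw = raw_size_bytes(w, h, comp, bpc)
    pct = ratio_percent(size_k, w, h, comp, bpc)
    print(f"{name}: raw {raw} bytes, stream is about {pct:.1f}% of raw")
```

Under those assumptions, the 1-bpc image from PIL occupies roughly 16 times its raw 64800 bytes, so it appears to be stored expanded rather than compressed, while the 8-bpc image from convert is compressed to about 3% of its raw 518400 bytes.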
The Image.save documentation doesn't give much information with which I can speculate on the reason for the large file size.
Why does PIL create such a large PDF file?
Is there any way to have it create a file as small as the one from ImageMagick's convert? I want to do it in Python because I will be performing more complex steps with many files.
My Python version is:
Python 3.8.8 (default, Mar 4 2021, 21:24:42)
[GCC 10.2.0] on cygwin
Investigations with ImageMagick's convert
Thanks to fmw42's suggestions, I systematically experimented with 3 ways to shrink and combine 2 JPG images into 1 PDF file. In order of decreasing file size, the 3 methods are as follows.
Method #1: Use Python's PIL to generate IMG_077x_PIL.pdf (see jpg2pdf.py below). In the process of doing so, save shrunken versions of both images to separate PNG files for Method #3.
Method #2: Use ImageMagick's convert to generate IMG_077x_IMcvt.pdf:
convert -sample 50% -type Bilevel +dither IMG_077[45].JPG IMG_077x_IMcvt.pdf
Method #3: Apply convert to the shrunken PNG files from PIL to generate IMG_077x_PIL+IMcvt.pdf:
convert IMG_077[45]small.png IMG_077x_PIL+IMcvt.pdf
Output PDF file sizes (in the same order as the methods above):
12350481 IMG_077x_PIL.pdf
1234076 IMG_077x_IMcvt.pdf
149782 IMG_077x_PIL+IMcvt.pdf
The 2 input JPG file sizes are a few MB each:
2526685 IMG_0774.JPG
2699515 IMG_0775.JPG
The 2 intermediate PNG file sizes used in Methods #1 and #3 are a few dozen KB each:
67283 IMG_0775small.png
61968 IMG_0774small.png
Observations:
Method #1: Great for shrinking the images down, but really bad in generating an enormous PDF file that is two orders of magnitude larger than it has to be.
Method #2: Middle of the road, most convenient, but PDF file size is an order of magnitude larger than it has to be.
Method #3: Requires Python with PIL as well as convert. It is the least convenient, but the most byte-efficient. The resulting PDF is only slightly larger than the sum of the two PNG images.
I wish there was a way to make Methods #1 and/or #2 as good as Method #3.
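Until then, one way to keep Method #3 inside a single Python script is to call convert through the standard subprocess module. This is a minimal sketch, not a fix for PIL's PDF writer. It assumes an ImageMagick convert executable is on the PATH, and the function names build_convert_cmd and pngs_to_pdf are hypothetical.

```python
import shutil
import subprocess


def build_convert_cmd(png_paths, pdf_path, convert_exe="convert"):
    """Build the argument list for: convert <png>... <pdf>"""
    return [convert_exe, *png_paths, pdf_path]


def pngs_to_pdf(png_paths, pdf_path, convert_exe="convert"):
    """Combine PNG files into one PDF by running ImageMagick's convert."""
    if shutil.which(convert_exe) is None:
        raise FileNotFoundError(f"{convert_exe!r} was not found on PATH")
    cmd = build_convert_cmd(png_paths, pdf_path, convert_exe)
    # check=True raises CalledProcessError if convert exits with an error
    subprocess.run(cmd, check=True)
    return cmd
```

For the files above, the call would be pngs_to_pdf(['IMG_0774small.png', 'IMG_0775small.png'], 'IMG_077x_PIL+IMcvt.pdf'), issued after jpg2pdf.py has written the two PNG files.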
Characteristics of the output PDF files
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
$pdfimages -list IMG_077x_PIL.pdf
1 0 image 1512 2016 gray 1 1 image no 1 0 72 72 6030K 1621%
2 1 image 1512 2016 gray 1 1 image no 4 0 72 72 6030K 1621%
$pdfimages -list IMG_077x_IMcvt.pdf
1 0 image 2016 1512 gray 1 8 jpeg no 8 0 72 72 576K 19%
2 1 image 2016 1512 gray 1 8 jpeg no 22 0 72 72 626K 21%
$pdfimages -list IMG_077x_PIL+IMcvt.pdf
1 0 image 1512 2016 gray 1 8 image no 8 0 72 72 68.9K 2.3%
2 1 image 1512 2016 gray 1 8 image no 22 0 72 72 74.6K 2.5%
jpg2pdf.py
#!python
# jpg2pdf.py
#-----------
# Use PIL to subsample, rotate, and convert 2 JPGs to B&W.
# Save each to small PNGs.
# Combine both into a PDF.
import os
from PIL import Image
ims = [] # Stores the 2 images
fns=('IMG_0774.JPG','IMG_0775.JPG') # Filenames of the 2 images
for fn in fns:
    # Read, resize, rotate, convert to B&W, add to list `ims`
    im = Image.open(fn)
    im = im.resize((im.width//2, im.height//2))
    im = im.rotate(-90,expand=True)
    im = im.convert(mode="1", dither=Image.NONE)
    ims.append(im)
    # Write IMG_077[45]small.png
    fnBase = os.path.splitext(fn)[0]
    im.save( fnBase+'small.png' )

# Write both to a single PDF
ims[0].save( 'IMG_077x_PIL.pdf' , save_all=True , append_images=ims[1:] )
A test input file
This dummy JPEG image file should be save-able as both IMG_0774.JPG and IMG_0775.JPG. Methods #1 through #3 should then work exactly as described with the code posted above. Using this JPG image, I confirmed that the 3 output file sizes are almost the same as reported in my question. Unfortunately, at just over 2 MB, it can't be uploaded to this question.
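If the dummy file is not at hand, a synthetic stand-in can be generated with PIL. The sketch below is only an approximation: the default size of 4032 x 3024 is inferred from the pdfimages listings (half of it, rotated, gives 1512 x 2016), the drawn pattern of black bars on white is arbitrary, and the function name make_dummy_jpgs is hypothetical. The absolute file sizes will therefore differ from the figures reported above.

```python
import os
from PIL import Image, ImageDraw


def make_dummy_jpgs(out_dir=".", size=(4032, 3024),
                    names=("IMG_0774.JPG", "IMG_0775.JPG")):
    """Write document-like test JPGs (black bars on white); return their paths."""
    width, height = size
    bar = max(1, height // 100)   # thickness of each black bar
    margin = width // 10          # left margin; also scales the bar lengths
    paths = []
    for index, name in enumerate(names):
        im = Image.new("RGB", size, "white")
        draw = ImageDraw.Draw(im)
        for y in range(2 * bar, height - 2 * bar, 3 * bar):
            # Vary the bar length so the rows look a little like lines of text
            right = width - margin - ((y // bar + index) % 5) * margin // 2
            draw.rectangle([margin, y, right, y + bar], fill="black")
        path = os.path.join(out_dir, name)
        im.save(path, quality=90)
        paths.append(path)
    return paths
```

Calling make_dummy_jpgs() in an empty directory writes both input files, after which jpg2pdf.py and the two convert commands can be run unchanged.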