I am using PyPDF2 to take an input PDF of any paper size and convert it to an A4 PDF, with the input page scaled to fit and centred on the output page.

Here's an example of an input (convert to pdf with imagemagick convert image.png input.pdf), which can be of any dimensions: input

And the expected output is: output

I'm not a developer and my knowledge of Python is basic. I have been trying to figure this out from the documentation, but haven't had much success.

My latest attempt is as follows:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject
from pypdf import PaperSize

pdf_reader = PdfReader("input.pdf")
page = pdf_reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height


# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
print(A4_h, h, A4_w, w)
scale_factor = min(A4_h / h, A4_w / w)
print(scale_factor)

transform = Transformation().scale(scale_factor, scale_factor).translate(0, A4_h / 3)
print(transform.ctm)

# page.scale_by(scale_factor)
page.add_transformation(transform)

# merge the pages to fit inside A4

# prepare A4 blank page
page_A4 = PageObject.create_blank_page(width=A4_w, height=A4_h)
page_A4.merge_page(page)
print(page_A4.mediabox)

writer.add_page(page_A4)
writer.write("output.pdf")

Which gives this output:

[image: actual output]

I could be completely off track with my approach, and it may be an inefficient way of doing it.

I was hoping I would have a simple function in the package where I can define the output paper size and the scaling factor, similar to this.

Martin Thoma
Zain Khaishagi
  • What do you want to achieve? Could it be that the only thing missing is changing the size of the mediabox? You can simply assign the expected size to the mediabox attribute. – Martin Thoma Jan 28 '23 at 10:25
  • @MartinThoma Thank you for personally replying to this. I know you're the maintainer, so I appreciate it. What I want to achieve is that the output should look like the "Expected Output" image that I've provided in my question. Basically, I want the input PDF to be scaled and centered on an A4 page as the output. – Zain Khaishagi Jan 28 '23 at 21:39
  • Gotcha - I didn't read that properly before. I'll check again tomorrow what the issue could be :-) – Martin Thoma Jan 28 '23 at 22:16
  • Thanks, looking forward to your reply. – Zain Khaishagi Jan 29 '23 at 10:16

1 Answer

You almost got it!

The transformations are applied only to the content, but not to the boxes (mediabox/trimbox/cropbox/artbox/bleedbox).

You need to adjust the cropbox:

from pypdf.generic import RectangleObject
page.cropbox = RectangleObject((0, 0, A4_w, A4_h))
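
To illustrate why the box matters, here is a minimal sketch in plain Python (no pypdf needed). The 300 x 200 pt input size is a made-up assumption, A4 is taken as 595 x 842 pt, and `transform_point` is a hypothetical helper that mimics what a scale followed by a translate does to a single point. It shows that the content moves while the original box stays where it was, which is presumably why the merged content ends up clipped.

```python
# Hypothetical numbers: a 300 x 200 pt input page scaled to fit A4 (595 x 842 pt).
A4_W, A4_H = 595, 842
src_w, src_h = 300.0, 200.0

scale = min(A4_W / src_w, A4_H / src_h)
tx, ty = 0.0, A4_H / 3  # same offsets as the script in the question


def transform_point(x, y):
    """Apply scale-then-translate, i.e. the matrix (scale, 0, 0, scale, tx, ty)."""
    return (scale * x + tx, scale * y + ty)


# Where the corners of the content end up after the transformation.
content_ll = transform_point(0.0, 0.0)
content_ur = transform_point(src_w, src_h)

# The page boxes are not touched by the transformation.
box_ll, box_ur = (0.0, 0.0), (src_w, src_h)

print("content:", content_ll, content_ur)
print("box:    ", box_ll, box_ur)
print("content fits in old box:",
      content_ur[0] <= box_ur[0] and content_ur[1] <= box_ur[1])
```

With these assumed numbers the transformed content reaches well beyond the original 300 x 200 box, so the box has to be enlarged to A4 for the whole content to stay visible.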

Full script

from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize
from pypdf.generic import RectangleObject

reader = PdfReader("input.pdf")
page = reader.pages[0]
writer = PdfWriter()

A4_w = PaperSize.A4.width
A4_h = PaperSize.A4.height

# resize page to fit *inside* A4
h = float(page.mediabox.height)
w = float(page.mediabox.width)
scale_factor = min(A4_h/h, A4_w/w)

transform = Transformation().scale(scale_factor,scale_factor).translate(0, A4_h/3)
page.add_transformation(transform)

page.cropbox = RectangleObject((0, 0, A4_w, A4_h))

# merge the pages to fit inside A4

# prepare A4 blank page
page_A4 = PageObject.create_blank_page(width = A4_w, height = A4_h)
page.mediabox = page_A4.mediabox
page_A4.merge_page(page)

writer.add_page(page_A4)
writer.write('output.pdf')
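
Note that the fixed offset `A4_h/3` only happens to look centred for some input sizes. As a rough sketch (not part of pypdf), the offsets for an arbitrary input could be computed as below. `fit_and_center` is a hypothetical helper name, the 300 x 200 pt input is a made-up example, and the sketch assumes the lower-left corner of the input page's mediabox is at (0, 0).

```python
def fit_and_center(src_w, src_h, dst_w, dst_h):
    """Return (scale, tx, ty) that fits a source page inside a target page, centred.

    Assumes the source page's lower-left corner is at (0, 0).
    """
    # Largest uniform scale at which both dimensions still fit.
    scale = min(dst_w / src_w, dst_h / src_h)
    # Split the leftover space evenly on both sides of each axis.
    tx = (dst_w - src_w * scale) / 2
    ty = (dst_h - src_h * scale) / 2
    return scale, tx, ty


# Hypothetical 300 x 200 pt landscape input placed on A4 (595 x 842 pt).
scale, tx, ty = fit_and_center(300, 200, 595, 842)
print(scale, tx, ty)
```

The returned values would then replace the fixed ones in the script, for example `Transformation().scale(scale, scale).translate(tx, ty)`.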
Martin Thoma
  • Awesome. Thank you so much. I am getting the output I wanted. Can you also tell me if this is the correct way to do it, or if there is a simpler way to centre and fit input.pdf on an A4 page? I wonder if the approach I have taken is overcomplicated and there may be a simpler, more direct method. – Zain Khaishagi Jan 29 '23 at 21:04
  • Your code was good - that is the best way to do it :-) Please also note that I used `pypdf` instead of `PyPDF2`. We moved the project to `pypdf` as people often got confused with the capitalization (`pyPdf` vs `pyPdf2` vs `PyPDF2` vs `pypdf2` ...) – Martin Thoma Jan 30 '23 at 06:17
  • Yes, I noticed that when I was reading the documentation. I am using this on AWS Lambda and I was able to find a layer for PyPDF2. Do you know if there is a layer available for pypdf as well? – Zain Khaishagi Jan 30 '23 at 09:54
  • Oh, interesting - I haven't seen [AWS lambda layers](https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html) before. I guess you mean https://github.com/kuharan/Lambda-Layers/blob/master/3.8/pypdf2-layer.zip ? – Martin Thoma Jan 30 '23 at 10:08
  • Hey Martin, Thanks for your help here. This script works fine but I may have found a bug. I've created a new question if you can take a look please: https://stackoverflow.com/q/75317584/11501160 – Zain Khaishagi Feb 02 '23 at 00:35