I have the following code that crops part of a PDF file and then saves the output as a PDF:

from PyPDF2 import PdfFileWriter, PdfFileReader

with open("Sample.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()

    numPages = input1.getNumPages()
    print("Document Has %s Pages." % numPages)

    for i in range(1):
        page = input1.getPage(i)
        print(page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y())
        page.trimBox.lowerLeft = (280, 280)
        page.trimBox.upperRight = (220, 200)
        page.cropBox.lowerLeft = (100, 720)
        page.cropBox.upperRight = (220, 800)
        output.addPage(page)

    with open("Output.pdf", "wb") as out_f:
        output.write(out_f)

How can I save the result as an image instead of a PDF? I found the code below, but the output is not high quality. How can I improve the quality of the image output?

import fitz

pdffile = "Output.pdf"
doc = fitz.open(pdffile)
page = doc.loadPage(0)
pix = page.getPixmap()
output = "Output.png"  # writePNG produces PNG data, so use a .png extension
pix.writePNG(output)
YasserKhalil

2 Answers

You could use the pdf2image library for this. Add the following code at the end of your script:

from pdf2image import convert_from_path
images = convert_from_path('Output.pdf')
for i in range(len(images)):
    images[i].save('Output'+ str(i) +'.jpg', 'JPEG')

Then, if you wish, you can use the os library to delete the PDF you created, so you don't have to delete it by hand:

import os
os.remove("Output.pdf")
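
A hedged variation on the conversion above: Poppler's renderer uses the media box by default, so a crop box set in the PDF may be ignored unless pdf2image is told to honor it. The sketch below passes `use_cropbox=True` and a higher `dpi` to `convert_from_path`. It assumes Poppler is installed and that Output.pdf carries the crop box written by the question's script; `page_filename` and `pdf_to_jpegs` are hypothetical helper names, not part of pdf2image.

```python
def page_filename(stem, index, ext="jpg"):
    """Build an output name such as 'Output0.jpg' (hypothetical helper)."""
    return f"{stem}{index}.{ext}"


def pdf_to_jpegs(pdf_path, stem="Output", dpi=300):
    """Render each page of pdf_path to a JPEG, honoring the crop box.

    Assumes pdf2image and Poppler are installed; the import is kept local
    so the helper above can be used without them.
    """
    from pdf2image import convert_from_path

    # use_cropbox=True asks Poppler to render the crop box, not the media box.
    images = convert_from_path(pdf_path, dpi=dpi, use_cropbox=True)
    names = []
    for i, image in enumerate(images):
        name = page_filename(stem, i)
        image.save(name, "JPEG")
        names.append(name)
    return names
```

Calling `pdf_to_jpegs("Output.pdf")` would then write one JPEG per page. Raising `dpi` increases the pixel count of each image at the cost of larger files.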
mrtechtroid
  • Thanks a lot. I already tested that, but I get the full page as it was before cropping, even though Output.pdf is cropped to a specific part. – YasserKhalil Feb 09 '21 at 02:55
  • Here's a sample PDF to test with: https://excel-egy.com/Up/do.php?id=511 – YasserKhalil Feb 09 '21 at 02:56
  • The code you are using crops the PDF before it is converted to the output. If you would like the output as full pages without any cropping, you may need to remove the lines that set `trimBox` or `cropBox`. Alternatively, use only the code above: you don't need to create a new Output.pdf for it to work, so change `images = convert_from_path('Output.pdf')` to `images = convert_from_path('Sample.pdf')`. – mrtechtroid Feb 09 '21 at 03:04
  • I don't want the full page. The desired output image is the cropped part only. The fitz package does that correctly (which is odd, in fact). – YasserKhalil Feb 09 '21 at 03:06
  • For that to happen, you need to append the two snippets from my answer to the end of the original code you posted. – mrtechtroid Feb 09 '21 at 03:08
  • Thanks a lot for your great support. I still get the full page even after putting all the lines together. I simply need an image of the cropped part only. Can you test on your side? – YasserKhalil Feb 09 '21 at 03:14
  • Is there a way to export to an image directly, without exporting the cropped part to PDF first? – YasserKhalil Feb 09 '21 at 03:23
  • Also when trying to remove the output.pdf file I encountered `PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'Output.pdf'` – YasserKhalil Feb 09 '21 at 03:28
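
On the question of exporting the cropped region directly: one possible approach is to skip the intermediate PDF and pass a clip rectangle to PyMuPDF when rendering. The sketch below is an assumption-laden illustration, not a tested solution for the sample file. It assumes a recent PyMuPDF (where the methods are named `get_pixmap` and `save`; older releases, as used in this thread, call them `getPixmap` and `writePNG`), an unrotated page whose media box starts at the origin, and no crop box already set on the page. PDF boxes measure y from the bottom of the page while PyMuPDF measures y from the top, so the coordinates are flipped first. `pdf_box_to_fitz_rect` and `render_crop` are hypothetical helper names.

```python
def pdf_box_to_fitz_rect(page_height, lower_left, upper_right):
    """Convert a PDF box (origin bottom-left) to (x0, y0, x1, y1) with a
    top-left origin, which is the convention PyMuPDF uses."""
    x0, y_bottom = lower_left
    x1, y_top = upper_right
    return (x0, page_height - y_top, x1, page_height - y_bottom)


def render_crop(pdf_path, lower_left, upper_right, out_path,
                zoom=2, page_number=0):
    """Render only the given region of one page to an image file."""
    import fitz  # PyMuPDF; imported here so the helper above has no dependency

    doc = fitz.open(pdf_path)
    try:
        page = doc[page_number]
        coords = pdf_box_to_fitz_rect(page.rect.height, lower_left, upper_right)
        clip = fitz.Rect(*coords)
        # The matrix scales the rendering; clip limits it to the region.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=clip)
        pix.save(out_path)  # output format is inferred from the extension
    finally:
        doc.close()  # release the file handle once rendering is done
```

With the crop box from the question, a call might look like `render_crop("Sample.pdf", (100, 720), (220, 800), "Output.png")`. Closing the document explicitly releases the handle PyMuPDF holds on the file, which can matter on Windows if the file is deleted afterwards.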

This solves the problem, but I welcome any further ideas and improvements:

import fitz

pdffile = "Output.pdf"
doc = fitz.open(pdffile)

zoom = 2    # zoom factor
mat = fitz.Matrix(zoom, zoom)

page = doc.loadPage(0)
pix = page.getPixmap(matrix = mat)
output = "Output.png"  # writePNG produces PNG data, so use a .png extension
pix.writePNG(output)
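
To make the zoom factor less of a magic number: PDF coordinates are in points, with 72 points to the inch, so rendering at zoom 1 corresponds to 72 DPI and zoom 2 to 144 DPI. The small sketch below shows that arithmetic. `zoom_for_dpi` and `pixel_size` are hypothetical helper names, and the pixel size is an estimate, since the renderer may round slightly differently.

```python
def zoom_for_dpi(dpi, base_dpi=72):
    """Return the zoom factor that yields the requested DPI.

    PDF user space is 72 points per inch, so zoom 1 renders at 72 DPI.
    """
    return dpi / base_dpi


def pixel_size(width_pt, height_pt, zoom):
    """Estimate the rendered image size in pixels for a region in points."""
    return (round(width_pt * zoom), round(height_pt * zoom))


# The crop box in the question spans 120 x 80 points
# (220 - 100 wide, 800 - 720 high).
print(zoom_for_dpi(144))        # zoom factor for 144 DPI
print(pixel_size(120, 80, 2))   # estimated pixel size at zoom 2
```

For print-quality output, something like `zoom = zoom_for_dpi(300)` could be passed to `fitz.Matrix(zoom, zoom)` in place of the fixed value 2.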
YasserKhalil