
I am trying to extract and save a specific portion of a PDF page, given by coordinates (x, y, w, h). I am using the following code, which seems to work:

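Imports System.IO
Imports iTextSharp.text.pdf

' Rect is assumed to be in pixels of a 300 dpi rendering; "/ 300 * 72" converts
' to PDF points, and y is flipped because PDF user space starts at the bottom-left.
Function CroppedPdf(Source As Byte(), PageNumber As Integer,
        Rect As System.Drawing.Rectangle) As MemoryStream
    Dim reader As New PdfReader(Source)
    Dim h = reader.GetPageSize(PageNumber).Height
    ' iTextSharp's Rectangle constructor takes (llx, lly, urx, ury)
    Dim document = New iTextSharp.text.Document(New iTextSharp.text.Rectangle(
            Rect.Left / 300 * 72, h - (Rect.Bottom / 300 * 72),
            Rect.Right / 300 * 72, h - (Rect.Top / 300 * 72)))
    document.SetMargins(0, 0, 0, 0)
    Dim destination = New MemoryStream
    Dim writer = PdfWriter.GetInstance(document, destination)
    writer.CloseStream = False ' keep the MemoryStream usable after document.Close()
    document.Open()
    Dim cb = writer.DirectContent
    document.NewPage()
    ' The whole source page is drawn at its original position; only the
    ' page size of the new document limits what is visible.
    Dim page = writer.GetImportedPage(reader, PageNumber)
    cb.AddTemplate(page, 0, 0)
    document.Close()
    reader.Close()
    destination.Position = 0
    Return destination
End Function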
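The coordinate arithmetic used for the crop box can be checked in isolation. The sketch below assumes, as the `/ 300 * 72` factor suggests, that the rectangle is measured in pixels on a 300 dpi rendering with a top-left origin, and that the page's MediaBox starts at (0, 0) and is not rotated. `PxToPt` and `ToPdfBox` are hypothetical helper names; the result is returned in the (llx, lly, urx, ury) order that iTextSharp's `Rectangle` constructor expects.

```vbnet
Imports System
Imports System.Drawing

Module CropMath
    ' Assumption: pixel coordinates come from a 300 dpi rendering of the page.
    Const RenderDpi As Single = 300.0F
    Const PointsPerInch As Single = 72.0F

    ' Converts a length in rendered pixels to PDF points (1/72 inch).
    Function PxToPt(px As Integer) As Single
        Return px / RenderDpi * PointsPerInch
    End Function

    ' Returns {llx, lly, urx, ury} in PDF user space for a pixel rectangle whose
    ' origin is the top-left corner of the page. PDF user space has its origin at
    ' the bottom-left, so y values are flipped using the page height in points.
    ' Assumes an unrotated page whose MediaBox starts at (0, 0).
    Function ToPdfBox(rect As Rectangle, pageHeightPt As Single) As Single()
        Dim llx = PxToPt(rect.Left)
        Dim lly = pageHeightPt - PxToPt(rect.Bottom)
        Dim urx = PxToPt(rect.Right)
        Dim ury = pageHeightPt - PxToPt(rect.Top)
        Return New Single() {llx, lly, urx, ury}
    End Function

    Sub Main()
        ' A 600 x 300 px region at (300, 150) on a page 792 pt tall (US Letter).
        Dim box = ToPdfBox(New Rectangle(300, 150, 600, 300), 792.0F)
        Console.WriteLine(String.Join(", ", box)) ' prints 72, 684, 216, 756
    End Sub
End Module
```

Note that this only computes the visible area; as described below, applying it as a page size does not remove any content outside that area.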

The problem is that the resulting PDF is only visually cropped. When I run text extraction on it, I get back the text of the entire original page. Furthermore, when I split a page into 10 pieces, the full page content is stored 10 times, with the copies differing only in the visible area. How can I truly crop the PDF, so that only the portion I am interested in is stored?

jvdhooft
Yisroel M. Olewski
  • Truly cropping a PDF can be hard work and is not explicitly supported by iText yet, even though you certainly can implement it using iText as a low-level framework. – mkl Aug 29 '13 at 11:02
  • Can you please elaborate? I.e., how would I use it as a "low-level framework"? Or, alternatively, could you refer me to a (preferably free or low-cost) library that would do it? Thanks. – Yisroel M. Olewski Aug 29 '13 at 21:13
  • I don't know whether there is a free or low-cost library that has that functionality out of the box. Actually, libraries supporting true redaction might, as the requirements are similar. – mkl Aug 30 '13 at 05:06
  • Thanks. I did some research on PDF redaction. It doesn't really look like what I need. Maybe I'll try to find a PDF "printer" that can print only the relevant portions. – Yisroel M. Olewski Sep 01 '13 at 09:30
