I have a PDF with a CropBox size of 6" wide x 9" high. I need to add it to a standard letter-sized PDF. If I change the CropBox size, then the cropmarks become visible. So ideally what I'd like to do is crop out just the visible portion of the page, then pad the sides so that the total height and width is letter-sized.

Is this possible using PDFBox or another Java class?

Jordan Reiter
  • In addition to changing the crop box, you can prepend the content streams of each page with a clipping path along the current crop box border. – mkl May 20 '13 at 10:00
  • @mkl which class would I be using here? Is that something under PDPage? – Jordan Reiter May 20 '13 at 19:28
  • *grin* That's why I made it a comment, not an answer. I know that prepending a clipping path would be a solution PDF-wise but I'm not too knowledgeable concerning PDFBox and, therefore, cannot easily say how to do that in PDFBox. – mkl May 20 '13 at 20:39
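
The clipping-path idea from the comments can be sketched at the PDF level without committing to any particular PDFBox call. The following is only an illustration: `ClipPathSketch`, `clipPrefix` and `clipSuffix` are hypothetical names, not library API, and the 432 x 648 point rectangle assumes the 6" x 9" crop box sits at the origin. How the text gets prepended to a page's content stream depends on the library and version in use.

```java
import java.util.Locale;

// Hypothetical helper, not part of PDFBox: it only builds the operator text.
final class ClipPathSketch {

    // Returns content-stream operators that restrict painting to the given rectangle.
    // Values are in PDF user space units (points), origin at the lower-left corner.
    static String clipPrefix(float llx, float lly, float width, float height) {
        // q   : save the graphics state
        // re  : append a rectangle to the current path
        // W n : make the path the clipping path without painting it
        return String.format(Locale.ROOT, "q %.2f %.2f %.2f %.2f re W n\n",
                llx, lly, width, height);
    }

    // Operator to append after the page content so the saved state is restored.
    static String clipSuffix() {
        return "\nQ\n";
    }

    public static void main(String[] args) {
        // Assumed crop box: 6" x 9" at the origin, i.e. 432 x 648 points.
        System.out.print(clipPrefix(0f, 0f, 432f, 648f));
        System.out.print(clipSuffix());
    }
}
```

With such a prefix in place, content outside the rectangle (crop marks, for instance) should stay unpainted even if the crop box is enlarged afterwards.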

3 Answers

Have you found an answer to your problem? I have been facing the same scenario this week.

I have a standard letter-size (8.5" x 11") PDF A, containing a header, a footer, and a form. I have no control over that PDF's generation, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a box (any type of box works) and then export it as a new PDF page. The problem is that my new box has a certain size (let's say 6" x 7"), and after thorough research in the docs I was unable to find a way to embed it into an 8.5" x 11" PDF B; the output PDF was the same size as my box. Every attempt led either to a blank PDF file of the right size or to a PDF containing my form but with the wrong dimensions.

I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the main topic. I simply kept the original PDF A and blanked out all the unwanted parts: I created rectangles, filled them with white, and covered up the sections I wanted to hide. The result is a PDF file of the right dimensions, containing only my form. Hooray! Technically, the header and footer are still present in the page; there was no way to actually remove them, I was only able to hide them. This makes no difference to the end user as long as you're not hiding sensitive data.
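
As a rough sketch of where such cover-up rectangles go: in default PDF user space the origin is the lower-left corner, so a footer band starts at y = 0 and a header band starts at the page height minus the band height. The class and method names are hypothetical, and the 612 x 792 point page is an assumed US Letter size; a real page should be asked for its own dimensions.

```java
import java.util.Arrays;

// Hypothetical helper: computes {x, y, width, height} for white cover-up bands.
final class CoverBandsSketch {

    // Band along the bottom edge; y = 0 is the bottom in default PDF user space.
    static float[] footerBand(float pageWidth, float bandHeight) {
        return new float[] {0f, 0f, pageWidth, bandHeight};
    }

    // Band along the top edge; its lower edge sits bandHeight below the page top.
    static float[] headerBand(float pageWidth, float pageHeight, float bandHeight) {
        return new float[] {0f, pageHeight - bandHeight, pageWidth, bandHeight};
    }

    public static void main(String[] args) {
        // Assumed page size: US Letter, 612 x 792 points, with 30-point bands.
        System.out.println(Arrays.toString(footerBand(612f, 30f)));
        System.out.println(Arrays.toString(headerBand(612f, 792f, 30f)));
    }
}
```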

I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer to my question online, so here's me giving back to the community and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer!

Here is my code, by the way:

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;

import java.awt.Color;
import java.io.*;
import java.util.List;

// This code doesn't actually extract PDF elements per se.
// It fills 2 rectangles in white to hide the header and the footer of our PDF page.
public class ex {

    // Arbitrary values obtained in a very obscure way
    static int PAGE_WIDTH = 615;
    static int PAGE_HEIGHT = 815;

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {

        File inputFile = new File("C:\\input.pdf");
        File outputFile = new File("C:\\output.pdf");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();

        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        PDPageContentStream pageCS = null;

        // Let's paint over the header and footer in white!
        for (PDPage page : pages) {
            pageCS = new PDPageContentStream(inputDoc, page, true, false);
            pageCS.setNonStrokingColor(Color.white);
            // Footer band: in default PDF user space y = 0 is the bottom edge of the page
            pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
            // Header band: the top 30 units below PAGE_HEIGHT
            pageCS.fillRect(0, PAGE_HEIGHT-30, PAGE_WIDTH, 30);
            pageCS.close();
            outputDoc.addPage(page);
        }

        // Save to file
        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        inputDoc.close();
        outputDoc.close();
    }
}
John Pink

I have adapted John's answer a little bit; maybe this will help someone.

I have changed the loop to create a new rectangle, with the wanted dimensions. Then the rectangle is set to the page and afterwards added to the new document. I used this snippet to crop a black border out of a long scanned document.

Notice that this will change the size of the pages.
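
For comparison, the padding that a letter-sized page would need around such a cropped box is plain arithmetic. The sketch below is separate from the snippet that follows: `LetterPaddingSketch` and `padding` are hypothetical names, and it assumes a US Letter page of 612 x 792 points with the box centered on it.

```java
import java.util.Arrays;

// Hypothetical helper: how much margin is needed to center a box on a Letter page.
final class LetterPaddingSketch {
    static final float LETTER_WIDTH = 612f;   // assumed: 8.5 in at 72 points per inch
    static final float LETTER_HEIGHT = 792f;  // assumed: 11 in at 72 points per inch

    // Returns {horizontal padding per side, vertical padding per side} in points.
    static float[] padding(float boxWidth, float boxHeight) {
        if (boxWidth > LETTER_WIDTH || boxHeight > LETTER_HEIGHT) {
            throw new IllegalArgumentException("box does not fit on a Letter page");
        }
        return new float[] {
            (LETTER_WIDTH - boxWidth) / 2f,
            (LETTER_HEIGHT - boxHeight) / 2f
        };
    }

    public static void main(String[] args) {
        // The 500 x 680 rectangle used in this answer.
        System.out.println(Arrays.toString(padding(500f, 680f)));
        // A 6" x 9" box, i.e. 432 x 648 points.
        System.out.println(Arrays.toString(padding(432f, 648f)));
    }
}
```

Keep in mind that enlarging a page box by this padding can reveal content that was hidden outside the old box, which is the crop mark problem from the question.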

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;


import java.io.File;
import java.io.IOException;
import java.util.List;

public class Main {


    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {

        File inputFile = new File("/path/to/your/file");
        File outputFile = new File("/path/to/your/file");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();

        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        // Shrink each page's media box and crop box to the wanted rectangle
        for (PDPage page : pages) {
            PDRectangle rectangle=new PDRectangle();
            rectangle.setLowerLeftX(0);
            rectangle.setLowerLeftY(0);
            rectangle.setUpperRightX(500);
            rectangle.setUpperRightY(680);

            page.setMediaBox(rectangle);
            page.setCropBox(rectangle);
            outputDoc.addPage(page);
        }

        // Save to file
//        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        inputDoc.close();
        outputDoc.close();
    }
}
Zelle

Other than passing a rectangle to the PDPage constructor, you can do this to set the page box to any size (per the PDF specification, the CropBox defaults to the MediaBox when it is not set explicitly):

PDRectangle box = new PDRectangle(pageWidth, pageHeight);
page.setMediaBox(box); // MediaBox > BleedBox > TrimBox/CropBox
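
The width and height above are in PDF user space units, which by default are points (1/72 inch). A small conversion sketch, with `PageUnitsSketch` and `inchesToPoints` as hypothetical names:

```java
// Hypothetical helper: converts inches to default PDF user space units (points).
final class PageUnitsSketch {
    static final float POINTS_PER_INCH = 72f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // US Letter, 8.5" x 11"
        System.out.println(inchesToPoints(8.5f) + " x " + inchesToPoints(11f));
        // The 6" x 9" crop box from the question
        System.out.println(inchesToPoints(6f) + " x " + inchesToPoints(9f));
    }
}
```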
Baked Inhalf
  • how does your answer relate to the question? In particular, how does that make the *cropmarks* invisible? – mkl Dec 12 '17 at 21:25
  • how does your comment relate to my answer? this question came up when I was searching for a similar problem. future visitors might find it useful. I don't have cropmarks, at least not with sdk 2.x of pdfbox – Baked Inhalf Dec 12 '17 at 22:20
  • *"I don't have cropmarks, at least not with sdk 2.x of pdfbox"* - that merely means that you don't have PDFs with crop marks. Enlarging boxes, in particular media and crop, can make stuff visible which before was hidden outside the box. And that was the problem of the OP. In your use cases you merely never seem to have had to deal with such PDFs. – mkl Dec 12 '17 at 23:13
  • OP was asked 4 years ago! My contribution is for new visitors who might have cropbox issues as it's closely related. End of discussion :) – Baked Inhalf Dec 13 '17 at 08:47
  • You seem to misunderstand the stack overflow format. It is a question and answer format. Related stuff might be posted as a comment or as a *PS* to an on-topic answer. – mkl Dec 13 '17 at 10:48
  • You seem to misunderstand the end of discussion? – Baked Inhalf Dec 13 '17 at 11:28
  • *"You seem to misunderstand the end of discussion"* - that appears off-topic. – mkl Dec 13 '17 at 12:24