Have you found an answer to your problem? I have been facing the same scenario this week.
I have a standard letter-size (8.5" x 11") PDF A, containing a header, a footer, and a form. I have no control over how that PDF is generated, so the header and footer are a bit dirty and I need to remove them. My first approach was to extract the form into a box (any type of box works) and export it as a new PDF page. The problem is that my new box has its own size (let's say 6" x 7"), and after thorough research in the docs, I was unable to find a way to embed it into an 8.5" x 11" PDF B; the output PDF was always the same size as my box. Every attempt led either to a blank PDF file of the right size, or to a PDF containing my form but with the wrong dimensions.
I then had no choice but to use another approach. It isn't very clean, but hey, when working with PDFs, black magic and workarounds are the norm. I simply kept the original PDF A and blanked out all the unwanted parts: I created rectangles, filled them with white, and placed them over the sections I wanted to hide. The result is a PDF file of the right dimensions, showing only my form. Hooray! Technically, the header and footer are still present in the page; I found no way to actually remove them and was only able to hide them. This makes no difference to the end user as long as you're not hiding sensitive data, since the covered content is still stored in the file.
I realize your question was submitted 2 years ago, but I had a very hard time finding a proper answer online, so here's me giving back to the community and hoping I can help future developers save some time. If you actually found a way to extract a box and embed it in a standard-size page, please post your answer!
Here is my code, by the way:
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;

import java.awt.Color;
import java.io.*;
import java.util.List;

// This code doesn't actually extract PDF elements per se.
// It fills 2 rectangles in white to hide the header and the footer of each PDF page.
// The imports above are from the PDFBox 1.x API.
public class ex {

    // Arbitrary values obtained in a very obscure way.
    // For reference, a letter-size page is 612 x 792 points (72 points per inch).
    static int PAGE_WIDTH = 615;
    static int PAGE_HEIGHT = 815;

    @SuppressWarnings("unchecked")
    public static void main(String[] args) throws IOException, COSVisitorException {
        File inputFile = new File("C:\\input.pdf");
        File outputFile = new File("C:\\output.pdf");

        PDDocument inputDoc = PDDocument.load(inputFile);
        PDDocument outputDoc = new PDDocument();
        List<PDPage> pages = inputDoc.getDocumentCatalog().getAllPages();

        // Let's paint our pages white!
        for (PDPage page : pages) {
            // appendContent = true draws on top of the existing content; compress = false
            PDPageContentStream pageCS = new PDPageContentStream(inputDoc, page, true, false);
            pageCS.setNonStrokingColor(Color.white);

            // Note: in default PDF user space the origin is the bottom-left corner.
            // Because this stream is appended to the page, any transform left in effect
            // by the existing content may change which edge y = 0 ends up on.

            // Top rectangle
            pageCS.fillRect(0, 0, PAGE_WIDTH, 30);
            // Bottom rectangle
            pageCS.fillRect(0, PAGE_HEIGHT - 30, PAGE_WIDTH, 30);

            pageCS.close();
            outputDoc.addPage(page);
        }

        // Save to file
        outputFile.delete();
        outputDoc.save(outputFile);

        // Wait until the end to close all documents, or else you get an error
        // (the output document reuses the page objects of the input document)
        inputDoc.close();
        outputDoc.close();
    }
}
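
For anyone who would rather not hard-code the page dimensions, here is a minimal sketch of the arithmetic behind the two cover rectangles. It is plain Java with no PDFBox calls, and it assumes default PDF user space (origin at the bottom-left corner, 72 points per inch), which may not hold if the page's existing content leaves a transform in effect. The class name `CoverBands` and its helper methods are hypothetical names made up for this illustration; in PDFBox 1.x the real width and height could probably be read from `page.findMediaBox()` instead of being converted from inches.

```java
import java.util.Arrays;

// Hypothetical helper, not part of PDFBox: computes cover rectangles as {x, y, width, height}.
final class CoverBands {

    // Default PDF user space unit: 1 point = 1/72 inch
    static final float POINTS_PER_INCH = 72f;

    static float inchesToPoints(float inches) {
        return inches * POINTS_PER_INCH;
    }

    // Band along the bottom edge, assuming the origin is the bottom-left corner
    static float[] bottomBand(float pageWidth, float bandHeight) {
        return new float[] {0f, 0f, pageWidth, bandHeight};
    }

    // Band along the top edge: its lower-left corner sits bandHeight below the top
    static float[] topBand(float pageWidth, float pageHeight, float bandHeight) {
        return new float[] {0f, pageHeight - bandHeight, pageWidth, bandHeight};
    }

    public static void main(String[] args) {
        float width = inchesToPoints(8.5f);  // letter width
        float height = inchesToPoints(11f);  // letter height

        System.out.println(Arrays.toString(bottomBand(width, 30f)));      // [0.0, 0.0, 612.0, 30.0]
        System.out.println(Arrays.toString(topBand(width, height, 30f))); // [0.0, 762.0, 612.0, 30.0]
    }
}
```

The four values of each array map directly onto the `fillRect(x, y, width, height)` arguments used in the code above. The 30-point band height is just the value from that code; adjust it to the actual header and footer heights of your file.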