This question essentially is about a dense merging of multiple PDF pages from one or more PDFs.
Usually merge methods for pdfs only merge on a page-basis, i.e. they take the pages from the documents to merge and create a new document with all those pages. Often a more dense merge (putting the contents of multiple pages on a single result page) is not feasible due to headers, footers, background graphics and other artifacts which would have to be recognized and ignored in this context. For pages like yours a dense merge is feasible, merely not provided as a single utility method yet.
One can implement such a utility class like this:
public class PdfDenseMergeTool {
public PdfDenseMergeTool(PDRectangle size, float top, float bottom, float gap)
{
this.pageSize = size;
this.topMargin = top;
this.bottomMargin = bottom;
this.gap = gap;
}
public void merge(OutputStream outputStream, Iterable<PDDocument> inputs) throws IOException
{
try
{
openDocument();
for (PDDocument input: inputs)
{
merge(input);
}
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
document.save(outputStream);
}
finally
{
closeDocument();
}
}
void openDocument() throws IOException
{
document = new PDDocument();
newPage();
}
void closeDocument() throws IOException
{
try
{
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
document.close();
}
finally
{
this.document = null;
this.yPosition = 0;
}
}
void newPage() throws IOException
{
if (currentContents != null) {
currentContents.close();
currentContents = null;
}
currentPage = new PDPage(pageSize);
document.addPage(currentPage);
yPosition = pageSize.getUpperRightY() - topMargin + gap;
currentContents = new PDPageContentStream(document, currentPage);
}
void merge(PDDocument input) throws IOException
{
for (PDPage page : input.getPages())
{
merge(input, page);
}
}
void merge(PDDocument sourceDoc, PDPage page) throws IOException
{
PDRectangle pageSizeToImport = page.getCropBox();
BoundingBoxFinder boundingBoxFinder = new BoundingBoxFinder(page);
boundingBoxFinder.processPage(page);
Rectangle2D boundingBoxToImport = boundingBoxFinder.getBoundingBox();
double heightToImport = boundingBoxToImport.getHeight();
float maxHeight = pageSize.getHeight() - topMargin - bottomMargin;
if (heightToImport > maxHeight)
{
throw new IllegalArgumentException(String.format("Page %s content too large; height: %s, limit: %s.", page, heightToImport, maxHeight));
}
if (gap + heightToImport > yPosition - (pageSize.getLowerLeftY() + bottomMargin))
{
newPage();
}
yPosition -= heightToImport + gap;
LayerUtility layerUtility = new LayerUtility(document);
PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, page);
currentContents.saveGraphicsState();
Matrix matrix = Matrix.getTranslateInstance(0, (float)(yPosition - (boundingBoxToImport.getMinY() - pageSizeToImport.getLowerLeftY())));
currentContents.transform(matrix);
currentContents.drawForm(form);
currentContents.restoreGraphicsState();
}
PDDocument document = null;
PDPage currentPage = null;
PDPageContentStream currentContents = null;
float yPosition = 0;
final PDRectangle pageSize;
final float topMargin;
final float bottomMargin;
final float gap;
}
(PdfDenseMergeTool utility class)
It uses the BoundingBoxFinder
class from this answer to an older question.
You can use the PdfDenseMergeTool
like this:
PDDocument document1 = ...;
PDDocument document2 = ...;
PDDocument document3 = ...;
PDDocument document4 = ...;
PDDocument document5 = ...;
PdfDenseMergeTool tool = new PdfDenseMergeTool(PDRectangle.A4, 30, 30, 10);
tool.merge(new FileOutputStream("Merge with Text.pdf"),
Arrays.asList(document1, document2, document3, document4, document5,
document1, document2, document3, document4, document5,
document1, document2, document3, document4, document5));
To merge the five source documents three times in a row.
In case of my test documents (each source documents containing three lines of text) I get this result:
Page 1:

Page 2:

This utility class essentially is a port of the PdfDenseMergeTool
for iText in this answer.
It has been tested with the current PDFBox 3.0.0 development branch SNAPSHOT.