I've been trying to re-implement the concatenate example from OpenPDF 1.2.4 and 1.2.11 in Scala:
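    import java.io.ByteArrayOutputStream
    import com.lowagie.text.Document
    import com.lowagie.text.pdf.{PdfCopy, PdfReader}

    // `log` and `getPageSize` are defined elsewhere (not shown).
    def mergePdfs(docs: Seq[Array[Byte]]): Array[Byte] = {
      log.debug(s"merging ${docs.size} PDFs")
      val output = new ByteArrayOutputStream()
      val document = new Document()
      val copy = new PdfCopy(document, output)
      // Use the page size of the first input document, if there is one.
      getPageSize(docs.headOption) foreach document.setPageSize
      document.open()
      docs foreach { doc =>
        val reader = new PdfReader(doc)
        // PDF page numbers are 1-based.
        1 to reader.getNumberOfPages foreach { pageNum =>
          copy.addPage(copy.getImportedPage(reader, pageNum))
        }
      }
      document.close()
      output.toByteArray
    }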
Here is an example output document. I generated it from two copies of this and then three copies of this.
I am seeing two issues:
- The document is corrupt (it only opens in Firefox), partly due to a line of cruft between the header and the first object. Deleting the offending line does not fix the document error in client code (thanks @mkl!).
- Some pages (usually one, but it is non-deterministic) appear blank. I have not seen a pattern in which pages are affected. Additionally, each page's text appears twice in the file. For example, in the output above:
$ strings out.pdf | grep "A Simple PDF File" | wc -l | tr -d ' '
6
In one case I used vim to delete the first content stream and that caused the text to appear on the first page.
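The same duplicate-text check can be run from Scala directly on the bytes returned by `mergePdfs`, which makes it easier to repeat across runs. This is only a minimal sketch: `countOccurrences` is a hypothetical helper, not part of OpenPDF, and it counts raw byte occurrences rather than matching lines as `strings | grep | wc -l` does. Like the shell check, it only sees text that is stored uncompressed.

```scala
import java.nio.charset.StandardCharsets
import java.util.Arrays

// Hypothetical helper (not an OpenPDF API): counts how many times an
// ASCII marker occurs in the raw bytes of a PDF. Text inside compressed
// content streams is not visible to this check.
def countOccurrences(pdf: Array[Byte], marker: String): Int = {
  val needle = marker.getBytes(StandardCharsets.ISO_8859_1)
  if (needle.isEmpty) 0
  else pdf.sliding(needle.length).count(window => Arrays.equals(window, needle))
}
```

Usage would be something like `countOccurrences(mergePdfs(docs), "A Simple PDF File")`, compared against the number of pages that are expected to contain that text.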
Am I misusing the API in some way?