I have a successfully implemented a pdf merge solution using PDFBox using InputStreams
. However, when I try to merge a document that is of a very large size I receive the following error:
Caused by: java.io.IOException: Missing root object specification in trailer.
at org.apache.pdfbox.pdfparser.COSParser.parseTrailerValuesDynamically(COSParser.java:2832) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:220) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1144) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1060) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:379) ~[pdfbox-2.0.11.jar:2.0.11]
at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:280) ~[pdfbox-2.0.11.jar:2.0.11]
Of more importance (I think) are these statements that occur just before the error:
FINE (pdfparser.COSParser) [] - Missing end of file marker '%%EOF'
FINE (pdfparser.COSParser) [] - Set missing offset 388 for object 2 0 R
It seems to me that it can't find the '%%EOF'
marker in very large files. Now I know that it is indeed there because I can look at the source (unfortunately I can't provide the file itself).
Doing some searching online I found that there is a setEOFLookupRange()
method on the COSParser
class. I'm wondering if perhaps the lookup range is too small and that is why it can't find the '%%EOF'
marker. The problem is...I'm not using the COSParser
object at all in my code; I'm only using the PDFMergerUtility
class. The PDFMergerUtility
seems to be using the COSParser
under the hood.
So my questions are
- Is my hypothesis about the
EOFLookupRange
correct? - If so, how can I set that range only having the
PDFMergerUtility
in my code and not theCOSParser
object?
Many thanks for your time!
UPDATED with code below
private boolean getCoolDocuments(final String slateId, final String filePathAndName)
throws IOException {
boolean status = false;
InputStream pdfStream = null;
HttpURLConnection connection = null;
final PDFMergerUtility merger = new PDFMergerUtility();
final ByteArrayOutputStream mergedPdfOutputStream = new ByteArrayOutputStream();
try {
final List<SlateDocument> parsedSlateDocuments = this.getSpecificDocumentsFromSlate(slateId);
if (!parsedSlateDocuments.isEmpty()) {
// iterate through each document, adding each pdf stream to the merger utility
int numberOfDocuments = 0;
for (final SlateDocument slateDocument : parsedSlateDocuments) {
final String url = this.getBaseURL() + "/slate/" + slateId + "/documents/"
+ slateDocument.getDocumentId();
/* code for RequestResponseUtil.initializeRequest(...) below */
connection = RequestResponseUtil.initializeRequest(url, "GET", this.getAuthenticationHeader(),
true, MediaType.APPLICATION_PDF_VALUE);
if (RequestResponseUtil.isSuccessful(connection.getResponseCode())) {
pdfStream = connection.getInputStream();
}
else {
/* do various things */
}
merger.addSource(pdfStream);
numberOfDocuments++;
}
merger.setDestinationStream(mergedPdfOutputStream);
// merge the all the pdf streams together
merger.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
status = true;
}
else {
LOG.severe("An error occurred while parsing the slated documents; no documents remain after parsing!");
}
}
finally {
RequestResponseUtil.close(pdfStream);
this.disconnect(connection);
}
return status;
}
public static HttpURLConnection initializeRequest(final String url, final String method,
final String httpAuthHeader, final boolean multiPartFormData, final String reponseType) {
HttpURLConnection conn = null;
try {
conn = (HttpURLConnection) new URL(url).openConnection();
conn.setRequestMethod(method);
conn.setRequestProperty("X-Slater-Authentication", httpAuthHeader);
conn.setRequestProperty("Accept", reponseType);
if (multiPartFormData) {
conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=BOUNDARY");
conn.setDoOutput(true);
}
else {
conn.setRequestProperty("Content-Type", "application/xml");
}
}
catch (final MalformedURLException e) {
throw new CustomException(e);
}
catch (final IOException e) {
throw new CustomException(e);
}
return conn;
}