I'm receiving misnamed documents with a .pdf extension, and I know for a fact that these are supposed to be at least readable as PDFs. Some of these are .p7m files from which I have to extract the contained document: Apache Tika infers their MIME type as either application/x-dbf or application/pkcs7-signature, and a standard application for p7m processing can easily extract the contained PDF. These documents then have to be processed automatically through OCR.
I have tried to extract the content in two ways:
- using Bouncy Castle's CMSSignedData (source: both a colleague and SO);
- using openssl smime -decrypt -verify -inform DER -in path/to/infile.p7m -noverify -out path/to/outfile.pdf on the command line (source).
A PDF extracted with the former causes the subsequent automatic processing to halt (although the document is readable with a PDF viewer), while the same PDF extracted with the latter works perfectly fine (it's viewable, and the OCR works on it). Therefore I need to translate the command-line invocation into Java.
I am not familiar with encryption and only understand it at a surface level. From my bare-bones understanding it seems I should use CMSEnvelopedData, but the example code in the Javadoc requires a private key, which I don't have. I have tried searching for "extract DER encrypted no private key" and the like, with no real result (at least not one I could understand).
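For reference, here is a minimal sketch of a stopgap: running the same openssl command from Java through ProcessBuilder instead of translating it. It assumes openssl is installed and on the PATH. The class name P7mExtractor and the methods buildCommand and extract are hypothetical names I picked for illustration. This is not the pure-Java solution I am after, only a way to reproduce the working command from code.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: wraps the openssl command quoted above.
class P7mExtractor {

    // Builds the same argument list as the command line in the question.
    static List<String> buildCommand(Path in, Path out) {
        return List.of(
                "openssl", "smime", "-decrypt", "-verify",
                "-inform", "DER",
                "-in", in.toString(),
                "-noverify",
                "-out", out.toString());
    }

    // Runs openssl (assumed to be on the PATH) and fails on a non-zero exit status.
    static void extract(Path in, Path out) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(in, out))
                .inheritIO() // let openssl's messages appear on our console
                .start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("openssl exited with status " + exit);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: P7mExtractor <infile.p7m> <outfile.pdf>");
            return;
        }
        extract(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Keeping the argument list in its own method makes it possible to check the command without actually launching openssl.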