I have a bunch of .p7m files (used to digitally sign other files, usually PDFs) and I would like help finding a way to extract their content. I already know how to iterate over the files in a folder with Python; I only need help with the extraction step.

I tried PyPDF2.PdfFileReader.decrypt(), but I get an "EOF marker not found" error, apparently because PyPDF2 cannot handle encrypted files. I saw that somebody used the mime library, but honestly that is well above my level.

Thank you

Vastem
  • Why not use PyMuPDF? It supports all PDF encryption levels, including AES-256, which PyPDF2 does not. Check out this page to join the PyMuPDF discussion channels: https://github.com/pymupdf/PyMuPDF/discussions/1766 – Jorj McKie Jan 18 '23 at 11:11
  • First of all, thanks! I checked the documentation; for encrypted PDFs I found only an "authenticate(password)" method, which requires a password. The p7m files I'm working with do not have any password, as far as I know. – Vastem Jan 18 '23 at 11:28

1 Answer

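If you are interested only in the content of the file (and not in the digital signature), you can simply use PyPDF2 in the standard way.
In my case, this worked to extract the text of the first page:

from PyPDF2 import PdfReader
# `path` is the folder that contains the signed file; the raw string keeps the backslash literal
file = PdfReader(path + r"\File_name.pdf.p7m")
page1 = file.pages[0].extract_text()  # pages are zero-indexed, so [0] is the first page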
a-caputo
  • In the end, the code below worked for me (found by googling around, I do not remember where); it extracts any kind of file:

    from OpenSSL import crypto
    from OpenSSL._util import ffi as _ffi, lib as _lib

    with open(name_of_p7m_file, 'rb') as f:
        p7data = f.read()
    p7 = crypto.load_pkcs7_data(crypto.FILETYPE_ASN1, p7data)
    bio_out = crypto._new_mem_buf()
    # Write the signed payload to bio_out without checking certificates or signatures
    res = _lib.PKCS7_verify(p7._pkcs7, _ffi.NULL, _ffi.NULL, _ffi.NULL, bio_out, _lib.PKCS7_NOVERIFY | _lib.PKCS7_NOSIGS)
    if res == 1:
        databytes = crypto._bio_to_string(bio_out)
        # Strip the trailing ".p7m" to get the output file name
        with open(name_of_p7m_file[:-4], "wb") as binary_file:
            binary_file.write(databytes)

    – Vastem Apr 19 '23 at 18:18
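
The pyOpenSSL snippet in the comment above calls underscore-prefixed (private) helpers, so it may stop working when the library is upgraded. As a rough alternative sketch, the same extraction can be delegated to the `openssl` command-line tool through `subprocess`. This assumes that `openssl` is installed and on the PATH and that the .p7m files are binary (DER) encoded; the function names `output_path_for`, `build_openssl_command`, and `extract_folder` are made up for illustration and are not part of any library.

```python
import subprocess
from pathlib import Path
from typing import List


def output_path_for(p7m_path: Path) -> Path:
    """Drop the trailing .p7m suffix, e.g. 'doc.pdf.p7m' -> 'doc.pdf'."""
    return p7m_path.with_suffix("")


def build_openssl_command(p7m_path: Path, out_path: Path) -> List[str]:
    """Build the openssl call that writes the signed payload to out_path."""
    return [
        "openssl", "smime", "-verify",
        "-noverify",        # skip validation of the signer's certificate chain
        "-inform", "DER",   # assumption: the .p7m file is binary (DER) encoded
        "-in", str(p7m_path),
        "-out", str(out_path),
    ]


def extract_folder(folder: str) -> List[Path]:
    """Run openssl on every *.p7m file in folder; return the files written."""
    extracted = []
    for p7m_path in sorted(Path(folder).glob("*.p7m")):
        out_path = output_path_for(p7m_path)
        result = subprocess.run(
            build_openssl_command(p7m_path, out_path),
            capture_output=True,
        )
        if result.returncode == 0:  # openssl exits with 0 on success
            extracted.append(out_path)
    return extracted
```

Calling `extract_folder("some_folder")` would then leave, for example, `File_name.pdf` next to `File_name.pdf.p7m`. Because `-noverify` skips the certificate check, this only recovers the content and says nothing about whether the signer can be trusted.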