
How can I change the metadata fields CreationDate and ModificationDate when I create a PDF with ReportLab?

djvg
Diamantino S

2 Answers


Take a look at where ReportLab sets the modification and creation dates (the format() method of PDFInfo in reportlab/pdfbase/pdfdoc.py):

D['ModDate'] = D["CreationDate"] = \
             Date(ts=document._timeStamp,dateFormatter=self._dateFormatter)
# ...
return PDFDictionary(D).format(document)

Basically, the metadata is a dictionary serialized into the PDF byte string alongside the file contents (the document), and the trailer at the end of the file points to it.

Inside ReportLab, the workflow you ask about could be:

  • create a canvas
  • draw something on it
  • get the document from the canvas
  • create a PDFDictionary with artificial ModDate and CreationDate values
  • format the document with that PDFDictionary
  • save to file
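To make the "PDFDictionary with artificial dates" step concrete without touching ReportLab internals (its PDFDictionary.format() needs the document object, as in the snippet above), here is a plain-Python sketch of what such a metadata dictionary looks like once serialized. info_dictionary is a hypothetical helper, not ReportLab's PDFDictionary, and it only escapes backslashes and parentheses, so treat it as an illustration rather than a complete PDF string encoder.

```python
from typing import Mapping


def info_dictionary(entries: Mapping[str, str]) -> bytes:
    """Serialize string metadata as a PDF dictionary: << /Key (value) ... >>.

    Hypothetical helper, not ReportLab's PDFDictionary. Only backslashes and
    parentheses are escaped, which is enough for ASCII values such as dates.
    """
    parts = []
    for key, value in entries.items():
        # Escape \ always, and ( ) too, so unbalanced parentheses stay valid.
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        parts.append(f"/{key} ({escaped})")
    return ("<< " + " ".join(parts) + " >>").encode("latin-1")


# Artificial dates in the D:YYYYMMDDHHmmSSOHH'mm' form.
artificial = "D:19700101010203+01'00'"
print(info_dictionary({"Author": "djvg", "CreationDate": artificial, "ModDate": artificial}))
```

In a real file this dictionary is stored as an indirect object referenced by /Info in the trailer; the sketch only produces the dictionary text itself.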

The question "Change metadata of pdf file with pypdf" pursues a similar goal.

Evgeny

The ReportLab Canvas (version 3.5 at the time of writing) provides public methods, such as Canvas.setAuthor(), to set the /Author, /Title, and other metadata fields (called "Internal File Annotations" in the docs, section 4.5).

However, there is no public method for overriding the /CreationDate or /ModDate.

If you only need to change the formatting of the dates, you can simply use the Canvas.setDateFormatter() method.

The methods described above modify a PDFInfo object (see the source), but that object belongs to the canvas's private PDFDocument, accessible as Canvas._doc.info.

If you really do need to override the dates, you could either hack into the private parts of the canvas, or simply search the bytes of the resulting file object for /CreationDate (...) and /ModDate (...) and replace the values between the parentheses.

Here's a quick-and-dirty example that does just that:

import io
import re
from reportlab.pdfgen import canvas

# write a pdf in a file-like object
file_like_obj = io.BytesIO()
p = canvas.Canvas(file_like_obj)
# set some metadata
p.setAuthor('djvg')
# ... add some content here ...
p.save()

# replace the value between the parentheses of /CreationDate and /ModDate
pdf_bytes = file_like_obj.getvalue()
# full D:YYYYMMDDHHmmSSOHH'mm' form
new_date = b"(D:19700101010203+01'00')"
# \( and \) match literal parentheses; [^)]* matches the old date value
pdf_bytes = re.sub(rb'/CreationDate \([^)]*\)', b'/CreationDate ' + new_date, pdf_bytes)
pdf_bytes = re.sub(rb'/ModDate \([^)]*\)', b'/ModDate ' + new_date, pdf_bytes)

# write to actual file
with open('test.pdf', 'wb') as pdf:
    pdf.write(pdf_bytes)

The example above only illustrates the principle. A more robust version could use fancier regular expressions, for example with lookaround.
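The lookaround idea can be sketched with the standard library alone. In this hypothetical helper, replace_pdf_dates, the lookbehind and lookahead keep /CreationDate, /ModDate and the parentheses out of the match, so only the date value is swapped. It also assumes you want to keep the xref table valid: the xref stores byte offsets, so a replacement that changes the value's length would shift every later object. The sketch therefore refuses to change lengths instead of rewriting the xref.

```python
import re

# Text between the parentheses after /CreationDate or /ModDate.
# Lookbehind/lookahead keep the key and the parentheses out of the match.
DATE_VALUE = re.compile(rb"(?<=/CreationDate \()[^)]*(?=\))|(?<=/ModDate \()[^)]*(?=\))")


def replace_pdf_dates(pdf_bytes: bytes, new_date: bytes) -> bytes:
    """Replace every /CreationDate and /ModDate value with new_date.

    Hypothetical helper. Raises ValueError if a replacement would change
    the value's length, since that would shift the xref byte offsets.
    """
    def swap(match: re.Match) -> bytes:
        if len(match.group()) != len(new_date):
            raise ValueError(f"length mismatch: {match.group()!r} vs {new_date!r}")
        return new_date

    return DATE_VALUE.sub(swap, pdf_bytes)


sample = b"<< /Author (djvg) /CreationDate (D:20200101000000+00'00') /ModDate (D:20200101000000+00'00') >>"
print(replace_pdf_dates(sample, b"D:19700101010203+01'00'"))
```

Most PDF readers can repair a file whose xref offsets are slightly off, so the length check is a conservative choice rather than a hard requirement.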

From the PDF specification (ISO 32000-1):

Date values used in a PDF shall conform to a standard date format, which closely follows that of the international standard ASN.1 (Abstract Syntax Notation One), defined in ISO/IEC 8824. A date shall be a text string of the form

(D:YYYYMMDDHHmmSSOHH'mm)
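Here O is the relationship of local time to Universal Time: "+", "-" or "Z". To build such a value from a Python datetime, here is a minimal sketch; pdf_date is a hypothetical helper, not a ReportLab API. It assumes a timezone-aware datetime, always writes the offset as a sign plus HH'mm (never Z), and appends a trailing apostrophe after mm, as Adobe's older PDF Reference did and many PDF writers still do. Drop that apostrophe if you want exactly the form quoted above.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as D:YYYYMMDDHHmmSSOHH'mm'.

    Hypothetical helper: the offset is always written as +HH'mm' or -HH'mm'.
    """
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("dt must be timezone-aware so the UT offset is known")
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return dt.strftime("D:%Y%m%d%H%M%S") + f"{sign}{hours:02d}'{mins:02d}'"


print(pdf_date(datetime(1970, 1, 1, 1, 2, 3, tzinfo=timezone(timedelta(hours=1)))))
# D:19700101010203+01'00'
```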

djvg