12

I'd like to create or modify the title of a PDF document using pypdf. It seems that the title is read-only. Is there a way to get read/write access to this metadata?

If so, a piece of code would be appreciated.

Thanks

MPelletier
Baudouin Tamines

1 Answer

9

You can manipulate the title with pyPDF (sort of). I came across this post on the reportlab-users mailing list:

http://two.pairlist.net/pipermail/reportlab-users/2009-November/009033.html

You can also use pypdf. https://pypi.org/project/pypdf/

This won't let you edit the metadata per se, but will let you read one or more pdf file(s) and spit them back out, possibly with new metadata.

Here's the relevant code:

from pyPdf import PdfFileWriter, PdfFileReader
from pyPdf.generic import NameObject, createStringObject

OUTPUT = 'output.pdf'
INPUTS = ['test1.pdf', 'test2.pdf', 'test3.pdf']

# The writer has to exist before its info dictionary can be touched.
output = PdfFileWriter()

# There is no interface through pyPDF with which to set this other than getting
# your hands dirty like so:
infoDict = output._info.getObject()
infoDict.update({
    NameObject('/Title'): createStringObject(u'title'),
    NameObject('/Author'): createStringObject(u'author'),
    NameObject('/Subject'): createStringObject(u'subject'),
    NameObject('/Creator'): createStringObject(u'a script')
})

# pyPdf expects open file objects rather than filenames. Keep them open until
# the output has been written, because page contents are resolved at write time.
streams = [open(name, 'rb') for name in INPUTS]
for stream in streams:
    reader = PdfFileReader(stream)
    for page in range(reader.getNumPages()):
        output.addPage(reader.getPage(page))

outputStream = open(OUTPUT, 'wb')
output.write(outputStream)
outputStream.close()

# Close the inputs only after the output is complete.
for stream in streams:
    stream.close()
Martin Thoma
Mark Lavin
  • When constructing a PdfFileReader, you need to pass a file-like object, not a string/filename (at least with pyPdf 1.13) – Joe Germuska Nov 18 '13 at 16:24
  • 6
    [PyPDF2](http://mstamy2.github.io/PyPDF2/) (which seems to have replaced pyPDF) has a native method that does this for you: `output.addMetadata({'/Title': 'title'})` – gellej Jul 01 '14 at 15:49