Change metadata of pdf file with pypdf2

Question

I want to add a metadata key-value pair to the metadata of a pdf file.

I found an answer that is several years old, but I think it is way too complicated. I guess there is an easier way today: https://stackoverflow.com/a/3257340/633961

I am not married to PyPDF2. If there is an easier way with another library, I will go that way.

score 25 · Answer 1 · edited Apr 15 '22 at 10:52

I was surprised to see there is no code sample for PyPDF2 when the question explicitly asks for PyPDF2, so here it is:

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("source.pdf")
writer = PdfFileWriter()

writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
writer.addMetadata(metadata)

# Write your custom metadata here:
writer.addMetadata({"/Some": "Example"})

with open("result.pdf", "wb") as fp:
    writer.write(fp)
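
For readers on a current install: PyPDF2 3.0 removed the camelCase names used above, and the project continues under the name pypdf. The following is a minimal sketch of the same copy-then-add approach with the snake_case API. The function names `as_pdf_names` and `copy_with_metadata` are made up for illustration, and the sketch assumes `pip install pypdf`. It has not been checked against the blank-page reports in the comments.

```python
def as_pdf_names(entries):
    """Return a copy of `entries` whose keys are PDF names with a leading slash."""
    return {
        (key if key.startswith("/") else "/" + key): str(value)
        for key, value in entries.items()
    }


def copy_with_metadata(src, dst, extra):
    """Copy the pages and info dictionary of `src` to `dst`, then add `extra`."""
    # Imported lazily so the helper above can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter  # third-party, assumed installed

    reader = PdfReader(src)
    writer = PdfWriter()
    writer.append_pages_from_reader(reader)
    if reader.metadata is not None:  # None when the source has no info dictionary
        writer.add_metadata(reader.metadata)
    writer.add_metadata(as_pdf_names(extra))
    with open(dst, "wb") as fp:
        writer.write(fp)
```

A call would look like `copy_with_metadata("source.pdf", "result.pdf", {"Some": "Example"})`. Normalizing the keys first avoids passing a key without the leading slash that PDF names require.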

edited Apr 15 '22 at 10:52 by Martin Thoma
answered Mar 01 '18 at 15:54 by Cyril N.
This doesn't work for me. `appendPagesFromReader` just adds the correct number of blank pages. – benwiggy Apr 21 '19 at 16:38
Doesn't work for me. The metadata is transferred correctly to `writer` but `Some: Example` isn't added. – Tom Russell Mar 24 '23 at 04:37

score 13 · Accepted Answer · edited Oct 24 '17 at 13:02

You can do that using pdfrw

pip install pdfrw

Then run

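from pdfrw import PdfDict, PdfReader, PdfWriter

trailer = PdfReader("myfile.pdf")
# Some PDFs have no Info dictionary at all; create one before writing to it.
if trailer.Info is None:
    trailer.Info = PdfDict()
trailer.Info.WhoAmI = "Tarun Lalwani"  # stored under the custom key /WhoAmI
PdfWriter("edited.pdf", trailer=trailer).write()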

And then check the PDF Custom Properties

edited Oct 24 '17 at 13:02 by guettli
answered Oct 24 '17 at 12:36 by Tarun Lalwani

Yes, it worked. In my case I needed to add a key which is not a valid python name, but it worked like this: `setattr(reader.Info, 'original-files', value)`. Thank you – guettli Oct 30 '17 at 09:51
Apparently using `pdfrw` results in the metadata being inaccessible using std methods. I don't have an Apple device to check custom metadata. – Tom Russell Mar 24 '23 at 06:30

score 9 · Answer 3 · edited Jun 20 '20 at 09:12

The correct way to edit PDF metadata in Python

There are several ways to edit PDF metadata in Python, but one way is better than the others.

I will start with approaches that seem right but have side effects. If you are short on time, skip to the end of this article and just use the correct way.

Weakness: the pdfrw package is not maintained.

from pdfrw import PdfReader, PdfWriter, PdfDict

if __name__ == '__main__':
    pdf_reader = PdfReader('old.pdf')
    metadata = PdfDict(Author='Someone', Title='PDF in Python')
    pdf_reader.Info.update(metadata)
    PdfWriter().write('new.pdf', pdf_reader)

pdfrw can do this quite easily, without losing non-display information such as bookmarks.

PyPDF2 supports more PDF features than pdfrw, including decryption and more types of decompression.

Weakness: with PdfFileWriter, the output PDF does not preserve outlines (bookmarks).

import pprint

from PyPDF2 import PdfFileReader, PdfFileWriter

if __name__ == '__main__':
    # The context managers close both files even if an error occurs.
    with open('old.pdf', 'rb') as file_in:
        pdf_reader = PdfFileReader(file_in)
        metadata = pdf_reader.getDocumentInfo()
        pprint.pprint(metadata)

        pdf_writer = PdfFileWriter()
        pdf_writer.appendPagesFromReader(pdf_reader)
        pdf_writer.addMetadata({
            '/Author': 'Someone',
            '/Title': 'PDF in Python'
        })
        # Write while file_in is still open: page content is read lazily.
        with open('new.pdf', 'wb') as file_out:
            pdf_writer.write(file_out)

Create a new PDF with PdfFileWriter, copy the old contents with appendPagesFromReader(), then call addMetadata().

It seems that we cannot modify the PDF metadata directly, so we add all pages and the metadata to a writer and then write the result out to a new file.

The correct way to edit PDF metadata in Python.

import pprint

from PyPDF2 import PdfFileReader, PdfFileMerger

if __name__ == '__main__':
    # The context managers close both files even if an error occurs.
    with open('old.pdf', 'rb') as file_in:
        pdf_reader = PdfFileReader(file_in)
        metadata = pdf_reader.getDocumentInfo()
        pprint.pprint(metadata)

        pdf_merger = PdfFileMerger()
        pdf_merger.append(file_in)
        pdf_merger.addMetadata({
            '/Author': 'Someone',
            '/Title': 'PDF in Python'
        })
        # Write to a different file while file_in is still open.
        with open('new.pdf', 'wb') as file_out:
            pdf_merger.write(file_out)

Use PdfFileMerger and add the pages through append(), which imports the source document's bookmarks by default.

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.
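
In the successor library pypdf, the merging methods live on `PdfWriter` and the parameter is called `import_outline` rather than `import_bookmarks`. The sketch below shows the same idea under that assumption. The function name `set_metadata_keep_outline` is hypothetical, pypdf is assumed to be installed, and, unlike the example above, the source document's existing metadata is copied over as well.

```python
def set_metadata_keep_outline(src, dst, metadata):
    """Write `src` to `dst` with extra metadata, importing the outline too."""
    # Assumption: pypdf is installed (pip install pypdf); imported lazily here.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    # append() copies the pages and, with import_outline=True (the default),
    # the source document's bookmarks as well.
    writer.append(reader, import_outline=True)
    if reader.metadata is not None:  # None when the source has no info dictionary
        writer.add_metadata(reader.metadata)
    writer.add_metadata(metadata)
    with open(dst, "wb") as fp:
        writer.write(fp)
```

For example, `set_metadata_keep_outline('old.pdf', 'new.pdf', {'/Author': 'Someone', '/Title': 'PDF in Python'})` writes the result to a separate file and leaves the input untouched.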

When using PyPDF2, my bookmarks become offset, and my TOC loses all its links. – Zug_Bug Sep 14 '20 at 17:21
I tested that correct way with a number of PDFs of different making (Acrobat, Libre etc.): In not one case did the output file contain the pdf's body content. It was empty. But even worse, the pdf_merger section destroyed the input file, leaving it as a 0 kb zombie. — Helen, Feb 04 '21 at 19:05
I tried several (or all) of these methods and found the first (`pdfrw`) to be the one that works. It's fast and it even allows for the creation of custom metadata. You might need to watch the capitalization of `key` values. – Tom Russell Apr 13 '23 at 04:55

score 7 · Answer 4 · edited Apr 15 '22 at 10:50

Building on what Cyril N. stated, the code works fine, but it creates a lot of "trash" files since now you have the original file and the file with the metadata.

I changed the code a bit since I will run this on hundreds of files a day, and don't want to deal with the additional clean-up:

from PyPDF2 import PdfFileReader, PdfFileWriter

reader = PdfFileReader("your_original.pdf")
writer = PdfFileWriter()

writer.appendPagesFromReader(reader)
metadata = reader.getDocumentInfo()
writer.addMetadata(metadata)

# Write your custom metadata here:
writer.addMetadata({"/Title": "this"})

with open("your_original.pdf", "ab") as fout:
    # ab is append binary; if you do wb, the file will append blank pages
    writer.write(fout)

If you do want a new file, just use a different name for the PDF in the open() call and keep ab. If you use wb, the output contains only blank pages, as many as the original file has.
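
Note that opening the original with ab adds the new bytes after the old ones, so the file grows on every run. One alternative that leaves no extra file behind and does not grow the original is to write to a temporary file and swap it in only after the write succeeded. This is a sketch of that general pattern, not the approach of the answer above; `replace_atomically` is a hypothetical helper name and uses only the standard library.

```python
import os
import tempfile


def replace_atomically(path, write_to):
    """Write new content for `path` via a temporary file, then swap it in.

    `write_to` is any callable that takes a binary file object and writes
    the complete new document to it (for example `writer.write`).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            write_to(tmp)
        # The original is only replaced after the write finished without error.
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the writer from the code above, the last step would become `replace_atomically("your_original.pdf", writer.write)`. If writing fails halfway, the original file is still intact, which also guards against ending up with an empty input file.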
