pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files.
It is built as a PDF toolkit and is capable of:
- extracting document information (title, author, ...),
- splitting documents page by page,
- merging documents page by page,
- cropping pages,
- merging multiple pages into a single page,
- encrypting and decrypting PDF files.
Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory streams (such as io.BytesIO objects, since PDF data is binary) rather than on files, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.
pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.
Relationship to PyPDF2
PyPDF2 was a fork of pyPdf.
It received many updates in 2022 but has since been deprecated in favor of pypdf.
pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0. Only the package name was changed to pypdf.
See: https://pypdf.readthedocs.io/en/latest/meta/history.html