
I'm working on a Python script that processes PDF files. Some of them are encrypted with restrictions that allow only printing, and I have to remove these restrictions manually before I can process the files.

So far I have been using QPDF to remove the restrictions from individual PDF files before running the script. The command is pretty simple; in the command prompt: `qpdf --decrypt input.pdf output.pdf`

My question: rather than doing this step manually, is it possible to run the QPDF executable from within my Python script? I haven't been able to find any Python modules specifically for controlling QPDF, so I'm not holding out much hope.

Matt
  • Use the [subprocess](https://docs.python.org/3/library/subprocess.html) module to run any external program, e.g. `subprocess.run(["qpdf", "--decrypt", "input.pdf", "output.pdf"])` – furas Nov 15 '16 at 21:02
  • Possible duplicate of [Calling an external command in Python](http://stackoverflow.com/questions/89228/calling-an-external-command-in-python) – mmmmmm Nov 15 '16 at 22:58
  • Python wrapper for qpdf: https://github.com/pikepdf/pikepdf – balki Dec 20 '18 at 19:12

2 Answers


Thanks to furas for pointing me in the right direction.

This is how I did it in Windows 10:

  1. Download QPDF, extract the folder and save it somewhere on your PC. I put the folder in C:\qpdf-5.1.2. Inside the folder is bin\qpdf.exe.
  2. Add C:\qpdf-5.1.2\bin to the PATH environment variable. In Windows 10, go to System Properties > Advanced > Environment Variables. With PATH highlighted, click Edit, then click New and paste in the path to the bin directory from step 1.

Once that is set up, you can reference 'qpdf' in the command prompt and in Python.

```python
import subprocess

# "qpdf" is resolved through PATH, so the bin directory must be on it.
subprocess.run(["qpdf", "--decrypt", "C:/qpdf-5.1.2/bin/input.pdf", "C:/qpdf-5.1.2/bin/output.pdf"])
```
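If you need this for more than one file, something along these lines might work. It is only a rough sketch: the function names (`build_qpdf_command`, `decrypt_with_qpdf`, `decrypt_folder`) are made up for illustration and are not part of QPDF or the code above. It assumes qpdf is on PATH, and it relies on qpdf's documented exit codes, where 0 means success and 3 means the output was written but warnings were reported.

```python
import shutil
import subprocess
from pathlib import Path


def build_qpdf_command(src, dst, qpdf="qpdf"):
    """Return the argument list for `qpdf --decrypt src dst`."""
    return [qpdf, "--decrypt", str(src), str(dst)]


def decrypt_with_qpdf(src, dst):
    """Run qpdf on one file and return its exit code (0 or 3)."""
    qpdf = shutil.which("qpdf")  # full path to the executable, or None
    if qpdf is None:
        raise FileNotFoundError("qpdf was not found on PATH")
    result = subprocess.run(
        build_qpdf_command(src, dst, qpdf),
        capture_output=True,
        text=True,
    )
    # Assumption based on the qpdf docs: 0 = success, 3 = warnings only.
    if result.returncode not in (0, 3):
        raise RuntimeError(
            f"qpdf failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result.returncode


def decrypt_folder(in_dir, out_dir):
    """Decrypt every *.pdf directly inside in_dir into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        decrypt_with_qpdf(pdf, out / pdf.name)
```

Calling `decrypt_folder("encrypted", "decrypted")` (both folder names are placeholders) would then write a decrypted copy of each PDF into the second folder and leave the originals untouched.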
Matt

Use the pikepdf library, which is based on QPDF and is referenced in the QPDF manual.

Install it with `pip install pikepdf` (use pip or pip3, depending on your system's defaults).

```python
import pikepdf

# Open with the password, then save an unencrypted copy.
with pikepdf.Pdf.open('input.pdf', password='passwd') as pdf:
    pdf.save('output.pdf')
```

If the password is just an empty string, you can omit the password parameter; the output PDF will still be saved without a password.
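To plug this into a script that processes many files, a small wrapper along these lines might help. It is a sketch under a few assumptions: `decrypted_name` and `remove_restrictions` are hypothetical helper names, the output folder is different from the input folder, and a wrong password is reported by returning None instead of raising. `pikepdf.open` and `pikepdf.PasswordError` are part of pikepdf.

```python
from pathlib import Path


def decrypted_name(src, out_dir):
    """Build the output path: same file name, placed inside out_dir."""
    return Path(out_dir) / Path(src).name


def remove_restrictions(src, out_dir, password=""):
    """Save an unencrypted copy of src in out_dir.

    Returns the new path, or None if the password was rejected.
    """
    # Imported here so the path helper above works without pikepdf installed.
    import pikepdf  # third-party: pip install pikepdf

    dst = decrypted_name(src, out_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        with pikepdf.open(src, password=password) as pdf:
            pdf.save(dst)  # saved without encryption by default
    except pikepdf.PasswordError:
        return None
    return dst
```

For print-only PDFs like the ones in the question, the default empty password should normally be enough, so `remove_restrictions("input.pdf", "decrypted")` (placeholder names) would be the typical call.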

Nikhil VJ
  • Thanks for sharing. When I asked the question in 2016, I don't think pikepdf existed. I have changed this to the accepted answer for anybody searching for this problem. – Matt Dec 20 '21 at 18:03