
We have an automated program that processes PDF documents. It fails on certain PDFs because they are malformed. Acrobat opens these same files without complaint, apparently because it takes extra measures to repair malformed PDFs. So at the moment someone has to manually open each failing file in Acrobat and save it again to get a clean copy. Is there a way to do this programmatically in Python or PowerShell? Has anyone done this?

Thanks!

Rainmaker
  • Manual supervision would still be needed, and in this case, you could simply set up an Action (aka batch process) within Acrobat. Things would be simpler if the problematic files can be collected in a single folder. – Max Wyss Sep 02 '17 at 08:36
  • They do go to an "error" folder. Is there a way to run a Python or PowerShell script on an hourly basis that programmatically does the "open and save" in Acrobat? – Rainmaker Sep 02 '17 at 20:48
  • You would need a PDF SDK that offers a set of repair features similar to Adobe Acrobat's. There are at least a few out there that perform the same fix-ups and can be called from Python or PowerShell with a little programming effort and some money for a license. There might be free/open source solutions as well. – Brandon Haugen Sep 05 '17 at 13:22

1 Answer


You might try this link.

You can run a macro from PowerShell. You can also set up a scheduled task in Task Scheduler (TASKSCHD.MSC) to run your PowerShell script at pretty much any interval you like. This particular example has a msgbox for the folder path, then loops through all PDF files in that folder, flattening and saving each one. Flattening might not be required, but it might help with a malformed PDF.

** This relies on Acrobat and uses the JavaScript API through the Excel ... I'm not sure if LibreOffice Draw has a JavaScript API like Acrobat's. I'm not aware of any open source alternatives that have that sort of functionality. If anyone is, please let me know.
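For anyone who wants to try the "open and save" loop without Acrobat, here is a minimal Python sketch. It assumes the open-source `qpdf` command-line tool is installed and on the PATH; qpdf is not mentioned anywhere in this thread, and it will not necessarily repair everything Acrobat can. The function names and the folder layout are hypothetical. The sketch rewrites every PDF found in the error folder into a second folder and reports which files qpdf managed to write.

```python
import subprocess
from pathlib import Path


def build_repair_command(src, dst):
    # Assumption: qpdf is installed. Invoked as "qpdf infile outfile", it
    # rewrites the PDF and tries to recover from damage while reading it.
    return ["qpdf", str(src), str(dst)]


def repair_folder(error_dir, fixed_dir, run=subprocess.run):
    """Rewrite every *.pdf in error_dir into fixed_dir.

    Returns a dict mapping file name -> True if the tool reported that it
    produced an output file. `run` is injectable so the loop can be
    exercised without qpdf being installed.
    """
    error_dir, fixed_dir = Path(error_dir), Path(fixed_dir)
    fixed_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(error_dir.glob("*.pdf")):
        dst = fixed_dir / src.name
        proc = run(build_repair_command(src, dst))
        # qpdf exit codes: 0 = clean, 3 = warnings (it recovered and still
        # wrote the output), 2 = errors.
        results[src.name] = proc.returncode in (0, 3)
    return results
```

Calling `repair_folder(r"C:\pdf\error", r"C:\pdf\fixed")` (both paths are placeholders) from a script run hourly by Task Scheduler, as described above, would cover the scheduling part. Files that come back as `False` would still need a manual pass through Acrobat.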