
I have a large set of PDFs created on different devices and in different applications. I just need to know whether a PDF is flat (flattened) or not. I'd prefer solutions that can be implemented in Python or Node.js, but any POSIX CLI tool would also be helpful.

I would appreciate any suggestions, even ones that only work most of the time.

Update

Since the comments asked how I define a flat PDF, here are two definitions:

  1. Definition 1: a PDF is flat if it only has one layer.
  2. Definition 2: a PDF is flat if it doesn't have any interactive elements.

Any solution that solves the problem either for definition 1 or 2 is fine.
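As a rough first pass over both definitions, the raw bytes of a file can be searched for the PDF name tokens that usually accompany layers (`/OCProperties`, `/OCG`) or interactive elements (`/AcroForm`, `/Annots`, `/Widget`, `/JavaScript`). This is only a heuristic sketch: the marker lists and the function names `scan_pdf_bytes` and `scan_pdf_file` are choices made for this example, and the scan can miss keys stored in compressed object streams or encrypted files, and can also report a match for an empty `/Annots` array.

```python
from pathlib import Path

# Name tokens assumed to indicate layers (definition 1) or
# interactive elements (definition 2). The lists are illustrative.
LAYER_MARKERS = (b"/OCProperties", b"/OCG")
INTERACTIVE_MARKERS = (b"/AcroForm", b"/Annots", b"/Widget", b"/JavaScript")


def scan_pdf_bytes(data: bytes) -> dict:
    """Report which groups of markers occur anywhere in the raw bytes."""
    return {
        "layers": any(marker in data for marker in LAYER_MARKERS),
        "interactive": any(marker in data for marker in INTERACTIVE_MARKERS),
    }


def scan_pdf_file(path) -> dict:
    """Convenience wrapper that reads the file and scans its bytes."""
    return scan_pdf_bytes(Path(path).read_bytes())
```

A file for which both values are `False` is probably flat under either definition; a `True` value is better treated as "needs a closer look with a real PDF parser" than as proof.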

Reza
  • Does "flat" mean that every PDF has a single page? – Xiaomin Wu Jul 27 '23 at 09:38
  • @KJ. Good question. I'll update my question with the definition I have for a flat PDF. – Reza Jul 27 '23 at 12:18
  • That hyperlink behaviour is most likely not a PDF attribute but a feature of the PDF reader (Mac Preview turns it into a link, while Firefox shows plain text). Regarding your second comment, I think PDF/A-1b (not PDF-1.4) more or less satisfies the second definition I provided. Such a criterion is strict, but I'm not aiming for that level of strictness because the majority of my PDFs won't pass a PDF/A compliance check anyway. – Reza Jul 27 '23 at 13:49
  • I understand that my question and my definitions of a flat PDF are a bit vague, but I can't help it. I'd welcome any solution based on your own take on a "flat PDF", as long as you explain the assumptions, drawbacks, and limitations. – Reza Jul 27 '23 at 13:52

1 Answer


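Use the PyPDF2 library and check whether any page carries annotations:

import PyPDF2

# Path to the PDF to check (placeholder, replace with your own file)
file = "example.pdf"

reader = PyPDF2.PdfReader(file)
# page.annotations is None when the page has no /Annots entry
has_annotations = any(page.annotations for page in reader.pages)

if has_annotations:
    print("pdf is not flattened")
else:
    print("pdf is flattened")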
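Annotations cover only part of definition 2, and say nothing about definition 1. A slightly broader check, sketched below, also looks at the document catalog for an `/AcroForm` entry with fields (form widgets) and an `/OCProperties` entry (optional content, i.e. layers). The helper name `flatness_report` is made up for this example, and it is written against plain mappings so it can be tried without a PDF at hand. The commented lines at the end show how it might be fed from PyPDF2, assuming `reader.trailer["/Root"]` returns the catalog dictionary and `example.pdf` is a placeholder path.

```python
def flatness_report(catalog, pages):
    """Return which non-flat features are present.

    catalog: mapping for the document catalog (the /Root dictionary)
    pages:   iterable of mappings, one per page dictionary
    """
    # Definition 1: layers are declared via /OCProperties in the catalog.
    has_layers = "/OCProperties" in catalog

    # Definition 2a: form fields are listed under /AcroForm -> /Fields.
    acro_form = catalog["/AcroForm"] if "/AcroForm" in catalog else None
    has_form_fields = (
        acro_form is not None
        and "/Fields" in acro_form
        and len(acro_form["/Fields"]) > 0
    )

    # Definition 2b: per-page annotations (links, comments, widgets).
    has_annotations = any(
        "/Annots" in page and len(page["/Annots"]) > 0 for page in pages
    )

    return {
        "layers": has_layers,
        "form_fields": has_form_fields,
        "annotations": has_annotations,
    }


# Possible use with PyPDF2 (left commented so the sketch runs without it):
# import PyPDF2
# reader = PyPDF2.PdfReader("example.pdf")  # placeholder path
# print(flatness_report(reader.trailer["/Root"], reader.pages))
```

A PDF would count as flat under definition 1 when `layers` is `False`, and under definition 2 when both `form_fields` and `annotations` are `False`. Link annotations also count here, so a scanned document with one clickable URL would be reported as not flat.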
Atom
  • Thanks, this is interesting; I'm going to play with it a bit. Just one correction: I think in the latest PyPDF2 it's `page.annotations`, since `page.annots` raises `AttributeError: 'PageObject' object has no attribute 'annots'`. – Reza Jul 27 '23 at 12:35