
I'm using `/usr/bin/pdftk filename.pdf dump_data_fields output - flatten` to get the FDF fields in a PDF, but the output seems to include invisible FDF fields as well.

https://docdro.id/nriB59b is a one-page PDF without any text but with a number of these invisible FDF fields. pdftk's output can be seen at https://pastebin.com/ag6vweNP.

How can I exclude invisible FDF fields?

I'm currently using pdftk but I'm open to using other tools as well.

Thanks!

neubert
  • @Amessihel - done. – neubert Sep 20 '19 at 21:23
  • @Amessihel - the PDF was actually sent to me by a coworker of mine to look into. They were adding FDF fields to it but saw that `dump_data_fields` was returning fields that they weren't seeing in the PDF. Upon receipt of the PDF I deleted all but one page with Adobe Acrobat Pro and then selected the content of the lone page and deleted it. I then saved the PDF and was able to reproduce the issue. It's possible the original PDF had these hidden fields as well idk – neubert Sep 22 '19 at 14:28
  • What do you mean by *fdf fields*? If a pdf contains fields, it contains acroform fields or xfa fields. – mkl Sep 23 '19 at 19:01
  • Those fields are regular PDF fields with merged widget annotations, even including appearance streams; they merely are not referenced from any page of your document. They are neither invisible by flag (*Invisible*, *Hidden*, and *NoView* are all not set) nor by size (**Rect** is not 0x0). – mkl Sep 24 '19 at 09:30
  • [Maybe another way to achieve your goal if you're into code.](https://stackoverflow.com/q/56162692/4375327) – Amessihel Sep 24 '19 at 16:23

1 Answer


My guess is that you have to inspect the PDF yourself to detect whether or not a field is invisible. On the other hand, it can be very tricky to tell whether a field is invisible unless a flag says so.

For example, and I don't know whether this is even possible, let's say a field lies outside the page or is covered by other content. Is it visible or not?
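
For the simpler case where visibility is controlled by a flag or by the widget's size, a check could look roughly like the sketch below. The bit positions are the *Invisible*, *Hidden*, and *NoView* annotation flags from the PDF specification; the function name and the way the values are passed in are only assumptions for illustration, not part of pdftk or qpdf.

```python
# Annotation flag bits from the PDF specification (the /F entry of an annotation).
INVISIBLE = 1 << 0  # bit 1
HIDDEN = 1 << 1     # bit 2
NO_VIEW = 1 << 5    # bit 6


def hidden_by_flags_or_size(flags, rect):
    """Return True if a widget is hidden by a flag or has a zero-area /Rect.

    flags: integer value of the widget's /F entry (0 if absent).
    rect:  [x1, y1, x2, y2] taken from the widget's /Rect entry.
    (Hypothetical helper; how you obtain these values is up to your tooling.)
    """
    if flags & (INVISIBLE | HIDDEN | NO_VIEW):
        return True
    x1, y1, x2, y2 = rect
    return x1 == x2 or y1 == y2


print(hidden_by_flags_or_size(4, [10, 10, 110, 30]))  # only the Print flag: False
print(hidden_by_flags_or_size(2, [10, 10, 110, 30]))  # Hidden flag set: True
```

This would not catch fields that are off-page, covered by other content, or not attached to any page at all.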

By the way, you can use qpdf to inspect the content of a PDF file. The following command decompresses your PDF so that it becomes human-readable.

`qpdf --qdf --object-streams=disable orig.pdf uncompressed-qpdf.pdf`

If you prefer a JSON representation:

`qpdf --json your_pdf.pdf > your_pdf.json`

If you go for the latter, you can parse the JSON output with jq.
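
If you would rather script the inspection, here is a minimal Python sketch of one thing to look for: widget annotations that no page lists in its `/Annots` array, which is what the comments under this post suggest is happening in the sample file. It assumes a flat mapping from object id to a dictionary of PDF keys, loosely modeled on qpdf's JSON output. The real layout differs between qpdf versions, so the `objects` structure and the function name `unreferenced_widgets` are assumptions you would need to adapt.

```python
def unreferenced_widgets(objects):
    """Return ids of widget annotations that no page lists in /Annots.

    objects: mapping of object id (e.g. "7 0 R") to a dict of PDF keys.
    This layout is an assumption modeled on qpdf's JSON output.
    """
    on_pages = set()
    for obj in objects.values():
        if isinstance(obj, dict) and obj.get("/Type") == "/Page":
            annots = obj.get("/Annots", [])
            # /Annots may be an indirect reference to the array itself.
            if isinstance(annots, str):
                annots = objects.get(annots, [])
            if isinstance(annots, list):
                # Keep only indirect references; skip inline dictionaries.
                on_pages.update(a for a in annots if isinstance(a, str))
    return sorted(
        obj_id
        for obj_id, obj in objects.items()
        if isinstance(obj, dict)
        and obj.get("/Subtype") == "/Widget"
        and obj_id not in on_pages
    )


# Hand-written sample data, not taken from the PDF in the question.
sample = {
    "1 0 R": {"/Type": "/Page", "/Annots": ["3 0 R"]},
    "3 0 R": {"/Subtype": "/Widget", "/T": "shown"},
    "4 0 R": {"/Subtype": "/Widget", "/T": "orphan"},
}
print(unreferenced_widgets(sample))  # ['4 0 R']
```

The field names reported by pdftk could then be matched against the `/T` entries of the objects this returns.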

Then use the PDF specification to interpret what you find. I also suggest these steps:

  • produce a PDF with a given field visible
  • produce another copy of this PDF with the same field hidden
  • uncompress both of them and compare them with diff.
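
The comparison step can also be scripted. Below is a small sketch using Python's standard `difflib`. The two strings are tiny made-up stand-ins for the uncompressed files, and the function name `diff_qdf` is hypothetical.

```python
import difflib


def diff_qdf(visible_text, hidden_text):
    """Return unified-diff lines between two uncompressed PDFs given as text."""
    return list(
        difflib.unified_diff(
            visible_text.splitlines(),
            hidden_text.splitlines(),
            fromfile="visible.pdf",
            tofile="hidden.pdf",
            lineterm="",
        )
    )


# Made-up stand-ins for the two files. Real input could be read with
# open(path, encoding="latin-1") so that binary bytes do not fail to decode.
visible = "<<\n/FT /Tx\n/F 4\n>>"
hidden = "<<\n/FT /Tx\n/F 6\n>>"
for line in diff_qdf(visible, hidden):
    print(line)
```

In this toy input the only changed line is the `/F` entry, which is the kind of difference you would then look up in the specification.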
Amessihel
  • I gather that by *invisible* the OP is not asking whether the fields are covered by anything, merely whether they are not visible due to certain field or annotation flags or to zero-sized rectangles. – mkl Sep 23 '19 at 19:04
  • @mkl, that would be great. I asked because after some beginner-level inspection I was unable to figure out how these fields were invisible. – Amessihel Sep 23 '19 at 19:12
  • They merely are not referenced from the page of the document. Other than that they are not marked invisible in any way. – mkl Sep 24 '19 at 09:34
  • That is the abstract **AcroForm** definition. Yes, the fields are defined in the PDF and they are collected in the **AcroForm** dictionary, they merely are not attached to a specific document page. – mkl Sep 24 '19 at 17:03