Extract PDF form field names from a PDF form

Question

I'm using pdftk to fill in a PDF form with an XFDF file. However, for this project I do not know in advance what fields will be present, so I need to analyse the PDF itself to see what fields need to be filled in, present an interface to the user accordingly, and then generate an XFDF file from that to fill in the PDF form.

How do I get the field names? Preferably command-line, .NET or PHP solutions.

Christopher, if you've found a solution I encourage you to post it and mark it as an answer so others may benefit from it in the future. Or you may choose the `delete` link to delete your question. — Ahmad Mageed, Jan 24 '10 at 17:26

score 62 · Answer 1 · answered Sep 16 '10 at 19:19

Easy! You are already using pdftk:

# pdftk input.pdf dump_data_fields

It outputs each field's name, its type, some of its properties (such as the options of a dropdown list or the text alignment), and even the tooltip text, which I found extremely useful.

The only thing I'm missing is field coordinates...
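To drive a user interface from that output, it helps to turn the text into structured records. Here is a minimal sketch, assuming the dump consists of `Key: value` lines grouped into records separated by `---` lines; the helper name `parse_dump_data_fields` and the sample data are made up for illustration.

```ruby
# Parse the text printed by `pdftk form.pdf dump_data_fields` into an array
# of hashes, one per field. Keys that repeat within one record (for example
# FieldStateOption) are collected into an array.
def parse_dump_data_fields(text)
  records = text.split(/^---\s*$/).map do |record|
    field = {}
    record.each_line do |line|
      key, sep, value = line.chomp.partition(':')
      next if sep.empty?                 # blank or malformed line
      value = value.sub(/\A /, '')       # drop the single space after the colon
      if field.key?(key)
        field[key] = Array(field[key]) << value
      else
        field[key] = value
      end
    end
    field
  end
  records.reject(&:empty?)
end

# Hypothetical sample resembling a dump, for illustration only.
sample = <<~DUMP
  ---
  FieldType: Text
  FieldName: applicant.name
  FieldNameAlt: Full legal name
  ---
  FieldType: Choice
  FieldName: country
  FieldStateOption: DE
  FieldStateOption: FR
DUMP

parse_dump_data_fields(sample).each do |f|
  puts "#{f['FieldName']} (#{f['FieldType']})"
end
```

Splitting on the first colon only keeps values that themselves contain colons intact.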

TEHEK

This should be the selected answer. Alternatively, if you have Adobe Professional, you can click Forms > Manage Form Data > Export Data to export the data to an FDF file. Then open the FDF file and get the field names associated with the values populated. – Furbeenator Nov 13 '13 at 19:19
Awesome, it helped me very much (lost a day looking for solution) – Epsiloncool Jul 14 '14 at 14:02
where does this command go? Is it available on the Free version of pdftk? – Shiva Naru May 20 '15 at 22:53

score 18 · Answer 2 · answered Sep 22 '15 at 20:28

This worked for me:

 pdftk 1.pdf dump_data_fields output test2.txt

If the file is encrypted with a password, this is how you can read from it:

 pdftk 1.pdf input_pw YOUR_PASSWORD_GOES_HERE dump_data_fields output test2.txt

This took me 2 hours to get right, so hopefully I save you some time :)
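If you call pdftk from a program rather than by hand, the two variants above can share one helper. This is only a sketch: the method names `dump_fields_command` and `dump_fields` are invented here, and it assumes `pdftk` is on the PATH. Passing the arguments as an array means no shell is involved, so file names and passwords need no quoting.

```ruby
require 'open3'

# Build the argument list for
#   pdftk <file> [input_pw <password>] dump_data_fields
# Without an `output <file>` clause pdftk prints the dump to stdout.
def dump_fields_command(pdf_path, password: nil)
  cmd = ['pdftk', pdf_path]
  cmd += ['input_pw', password] if password
  cmd << 'dump_data_fields'
end

# Run pdftk and return the dump as a string; raise if pdftk reports an error.
def dump_fields(pdf_path, password: nil)
  stdout, stderr, status =
    Open3.capture3(*dump_fields_command(pdf_path, password: password))
  raise "pdftk failed: #{stderr.strip}" unless status.success?
  stdout
end
```

For example, `dump_fields('1.pdf', password: 'YOUR_PASSWORD_GOES_HERE')` would run the second command shown above and hand back the text instead of writing test2.txt.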

Dev_Corps

Note that `output test2.txt` is optional. Without it it just prints to stdout. – Nakilon Nov 27 '21 at 10:34

score 9 · Answer 3 · answered Mar 05 '21 at 02:06

Considering pdftk is abandonware, you can use the qpdf library to dump the metadata in JSON format and use jq to keep only the form-related data:

qpdf input.pdf --json | jq '.acroform.fields'

qpdf is a lightweight cross-platform FOSS library and jq is a filtering program for JSON (like grep is a filtering program for lines). If you'd rather not use jq or JSON, you can always dump using qpdf input.pdf then look for the metadata yourself in the dump.
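If you would rather process the JSON in a program than with jq, something like the following could work. It is a sketch under assumptions: the key names `fullname`, `fieldtype` and `value` inside `acroform.fields` are what I expect from qpdf's JSON output, so check them against the output of your qpdf version; the method name `qpdf_form_fields` and the sample document are made up.

```ruby
require 'json'

# Given the JSON text printed by `qpdf input.pdf --json`, return one hash per
# form field. Key names are assumed from qpdf's JSON output and may differ
# between qpdf versions.
def qpdf_form_fields(json_text)
  doc = JSON.parse(json_text)
  fields = doc.dig('acroform', 'fields') || []
  fields.map do |f|
    { name: f['fullname'], type: f['fieldtype'], value: f['value'] }
  end
end

# Hypothetical, heavily trimmed sample for illustration only.
sample_json = <<~JSON
  {
    "acroform": {
      "fields": [
        { "fullname": "applicant.name", "fieldtype": "/Tx", "value": "" },
        { "fullname": "country", "fieldtype": "/Ch", "value": "DE" }
      ]
    }
  }
JSON

qpdf_form_fields(sample_json).each { |f| puts "#{f[:name]} #{f[:type]}" }
```

A PDF without a form simply yields an empty list, because `dig` returns nil when the keys are missing.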

edited Aug 13 '21 at 16:21

answered Mar 05 '21 at 02:06

hyiltiz

This one should be marked as accepted answer nowadays. – rilaby Feb 11 '22 at 10:39

score 1 · Answer 4 · answered Jun 27 '12 at 00:20

A very late answer from me, and my solution is not PHP, but I hope it comes in handy for anyone looking for a Ruby solution.

First, use pdftk to dump all the field names, then clean up the dump text to get a readable list:

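require 'shellwords'

# Returns the unique field names reported by `pdftk dump_data_fields`.
def extract_fields(filename)
  # Escape the filename so paths with spaces or shell metacharacters are safe.
  field_output = `pdftk #{Shellwords.escape(filename)} dump_data_fields 2>&1`
  # Records are separated by lines containing only "---".
  @fields = field_output.split(/^---\n/).map do |field_text|
    # (.+) rather than (\w+): field names may contain spaces, dots or brackets.
    field_text[/^FieldName: (.+)$/, 1]
  end.compact.uniq
end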

Second, we can use any XML library to construct our XFDF:

# code borrowed from the `nguyen` gem [https://github.com/joneslee85/nguyen]
require 'nokogiri'

# generate XFDF content
def to_xfdf(fields = {}, options = {})
  builder = Nokogiri::XML::Builder.new(:encoding => 'UTF-8') do |xml|
    xml.xfdf('xmlns' => 'http://ns.adobe.com/xfdf/', 'xml:space' => 'preserve') {
      xml.f(:href => options[:file]) if options[:file]
      xml.ids(:original => options[:id], :modified => options[:id]) if options[:id]
      xml.fields {
        fields.each do |field, value|
          xml.field(:name => field) {
            if value.is_a? Array
              value.each { |item| xml.value(item.to_s) }
            else
              xml.value(value.to_s)
            end
          }
        end
      }
    }
  end
  builder.to_xml
end

# write XFDF content to path
def save_to(path, fields = {}, options = {})
  File.write(path, to_xfdf(fields, options))
end

Voilà, that's the main logic. I highly recommend giving the nguyen gem (https://github.com/joneslee85/nguyen) a try if you are looking for a lightweight Ruby library.

score 1 · Answer 5 · answered Apr 30 '10 at 16:36

I used the following code, using ABCpdf from WebSupergoo, but I imagine most libraries have comparable classes:

protected void Button1_Click(object sender, EventArgs e)
    {
        Doc thedoc = new Doc();
        string saveFile = "~/docs/f1_filled.pdf";
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        thedoc.Read(Server.MapPath("~/docs/F1_2010.pdf"));
        foreach (Field fld in thedoc.Form.Fields)
        {
            if (!(fld.Page == null))
            {
                sb.AppendFormat("Field: {0}, Type: {1},page: {4},x: {2},y: {3}\n", fld.Name, fld.FieldType.ToString(), fld.Rect.Left, fld.Rect.Top, fld.Page.PageNumber);
            }
            else
            {
                sb.AppendFormat("Field: {0}, Type: {1},page: {4},x: {2},y: {3}\n", fld.Name, fld.FieldType.ToString(), fld.Rect.Left, fld.Rect.Top, "None");
            }
            if (fld.FieldType == FieldType.Text)
            {
                fld.Value = fld.Name;
            }

        }

        this.TextBox1.Text = sb.ToString();
        this.TextBox1.Visible = true;
        thedoc.Save(Server.MapPath(saveFile));
        Response.Redirect(saveFile);
    }

This does two things: 1) it populates a textbox with an inventory of all form fields, showing each field's name, type, page number and position on the page (0,0 is the lower left, by the way); 2) it fills every text field with its own field name in an output file. Print the output file and all of your text fields will be labelled.

score 0 · Answer 6 · answered Sep 27 '16 at 09:33

C# / iTextSharp

    // Writes a copy of the PDF in which every form field is filled with "<index> : <field name>".
    public static void TracePdfFields(string pdfFilePath)
    {
        PdfReader pdfReader = new PdfReader(pdfFilePath);
        MemoryStream pdfStream = new MemoryStream();
        PdfStamper pdfStamper = new PdfStamper(pdfReader, pdfStream, '\0', true);

        int i = 1;
        foreach (var f in pdfStamper.AcroFields.Fields)
        {
            pdfStamper.AcroFields.SetField(f.Key, string.Format("{0} : {1}", i, f.Key));
            i++;
            //DoTrace("Field = [{0}] | Value = [{1}]", f.Key, f.Value.ToString());
        }
        pdfStamper.FormFlattening = false;
        pdfStamper.Writer.CloseStream = false;
        pdfStamper.Close();

        FileStream fs = File.OpenWrite(string.Format(@"{0}/{1}-TracePdfFields_{2}.pdf", 
            ConfigManager.GetInstance().LogConfig.Dir, 
            new FileInfo(pdfFilePath).Name, 
            DateTime.Now.Ticks));

        fs.Write(pdfStream.ToArray(), 0, (int)pdfStream.Length);
        fs.Flush();
        fs.Close();
    }

score -4 · Accepted Answer · answered Jan 24 '10 at 17:41

I can get my client to export the XFDF file (which contains field names) using Acrobat along with the PDF, which avoids this problem completely.
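Once you have such an exported XFDF file, the field names can be read back out of it programmatically. The sketch below assumes the usual XFDF layout, a root `xfdf` element containing `fields`, with nested `field` elements for hierarchical names, and it assumes the `rexml` gem that ships with standard Ruby installs is available. The method names and the sample document are made up.

```ruby
require 'rexml/document'

# Child elements of `node` with the given local name (namespace ignored).
def child_elements(node, name)
  node.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Return the fully qualified field names found in an XFDF document.
# Nested <field> elements are joined with "." (parent.child).
def xfdf_field_names(xfdf_text)
  root = REXML::Document.new(xfdf_text).root
  names = []
  walk = lambda do |parent, prefix|
    child_elements(parent, 'field').each do |el|
      full = [prefix, el.attributes['name']].compact.join('.')
      nested = child_elements(el, 'field')
      if nested.empty?
        names << full              # terminal field: record its full name
      else
        walk.call(el, full)        # descend into the field hierarchy
      end
    end
  end
  child_elements(root, 'fields').each { |fields| walk.call(fields, nil) }
  names
end

# Hypothetical sample for illustration only.
sample_xfdf = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
    <fields>
      <field name="applicant">
        <field name="name"><value>Ada</value></field>
      </field>
      <field name="country"><value>DE</value></field>
    </fields>
  </xfdf>
XML

puts xfdf_field_names(sample_xfdf)
```

Matching on the local element name avoids having to deal with the XFDF default namespace in XPath expressions.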

Christopher Done

Do you mean Acrobat Reader or some related Acrobat product? – Derek Mahar Mar 29 '19 at 17:08
@christopher-done Please tell the name of your client, and how to generate the XFDF File – Asturio Jun 18 '19 at 10:13
