
I'm using PDFBox's PDPage::convertToImage to display PDF pages in Java. I'm trying to create clickable areas on the page image based on COSObjects in the page (namely, AcroForm fields). The problem is that the PDF seems to use a completely different coordinate system:

System.out.println(field.getDictionary().getItem(COSName.RECT));

yields

COSArray{[COSFloat{149.04}, COSFloat{678.24}, COSInt{252}, COSFloat{697.68}]}

If I were to estimate the actual dimensions of the field's rectangle on the image, it would be 40,40,50,10 (x,y,width,height). There's no obvious correlation between the two and I can't seem to find any information about this with Google.

How can I determine the pixel position of a PDPage's COSObjects?

roundar

1 Answer


The PDF coordinate system is not that different from the coordinate system used in images. The only differences are:

  • the y-axis points up, not down
  • the scale is most likely different.

You can convert from PDF coordinates to image coordinates using these formulas:

x_image = x_pdf * width_image / width_page
y_image = (height_page - y_pdf) * height_image / height_page

To get the page size, simply use the mediabox size of the page that contains the annotation:

PDRectangle pageBounds = page.getMediaBox();

You may have missed the correlation between the array from the PDF and your image coordinate estimates because a rectangle in PDF is represented as the array [x_left, y_bottom, x_right, y_top], not as (x, y, width, height).
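As a rough illustration of the formulas and the rectangle layout above, here is a minimal sketch in plain Java. The class `PdfRectMapper` and its method are hypothetical helpers, not part of PDFBox. The sketch assumes the page's media box starts at (0, 0) and that the page is not rotated; the 612 x 792 page size and 1224 x 1584 image size in `main` are made-up example values.

```java
/** Hypothetical helper for illustration; not part of PDFBox. */
final class PdfRectMapper {

    /**
     * Maps a PDF rectangle [xLeft, yBottom, xRight, yTop] (origin bottom-left,
     * y pointing up) to image pixels {x, y, width, height} (origin top-left,
     * y pointing down). Assumes the media box origin is (0, 0) and no rotation.
     */
    static double[] toImageRect(double[] pdfRect,
                                double pageWidth, double pageHeight,
                                double imageWidth, double imageHeight) {
        double scaleX = imageWidth / pageWidth;
        double scaleY = imageHeight / pageHeight;

        // Normalize, in case the two corners are stored in the opposite order.
        double xLeft = Math.min(pdfRect[0], pdfRect[2]);
        double xRight = Math.max(pdfRect[0], pdfRect[2]);
        double yBottom = Math.min(pdfRect[1], pdfRect[3]);
        double yTop = Math.max(pdfRect[1], pdfRect[3]);

        double x = xLeft * scaleX;
        // The top edge in PDF space becomes the y of the image rectangle.
        double y = (pageHeight - yTop) * scaleY;
        double width = (xRight - xLeft) * scaleX;
        double height = (yTop - yBottom) * scaleY;
        return new double[] {x, y, width, height};
    }

    public static void main(String[] args) {
        // Rectangle values taken from the question's COSArray.
        double[] rect = {149.04, 678.24, 252, 697.68};
        // Assumed page and image sizes, for demonstration only.
        double[] r = toImageRect(rect, 612, 792, 1224, 1584);
        System.out.println(java.util.Arrays.toString(r));
    }
}
```

The resulting array can be fed to a hit test such as `java.awt.Rectangle2D.Double.contains(px, py)` to decide whether a click landed on the field.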

Fortunately, PDFBox provides classes that operate at a higher level than the COS structure. Use this to your advantage: for example, call getRectangle() on the PDAnnotation to get a PDRectangle instead of reading the COSArray from the field's dictionary.

fabian
  • Thank you! I don't see a clear way to get a field's corresponding annotation. Is the best way ([derived from this answer](http://stackoverflow.com/a/22132921/1907998)) to iterate through all fields and annotations, mapping them by their dictionary? Also, according to that answer it seems there can be a field without an annotation. Is it better to work with the low level `COSObject` to avoid these problems? – roundar Aug 21 '15 at 02:09
  • I've also noticed that if I get the field's width/height with `int width = x_right - x_left;`/`int height = y_top - y_bottom;` I get exactly half of the width and height respectively. – roundar Aug 21 '15 at 06:00
  • @roundar: The visual representations of fields are the annotation widgets, which you can access from the pages' annotation lists. There may be other annotations, but you can check whether the Subtype is `Widget`. `PDField`s also have a `getWidget()` method, but I don't recommend using it unless you're sure what the field type is, since it will try to locate the widget even for fields without a visual representation, which would require a tree traversal through the fields. PS: does your last comment mean my formulas don't work, or was that just an observation? – fabian Aug 21 '15 at 07:16
  • Excellent, thank you. The formulas do work. I just observed that when you take the differences of the respective x and y components of the `COSArray` (or `PDRectangle`), you have to multiply them by two to get the accurate width and height of the actual field on the image, for a reason I can't see intuitively. If you know why that is, it may be worth adding to your answer for posterity. – roundar Aug 21 '15 at 15:01
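One possible explanation for the factor of two, offered as an assumption rather than something confirmed in this thread: PDF user space defaults to 72 units per inch, so an image rendered at 144 dpi has two pixels per PDF unit. The no-argument `convertToImage()` in PDFBox 1.8 appears to render at twice the default resolution, but that should be checked against the version in use. The sketch below uses a hypothetical `RenderScale` class (not part of PDFBox) to show the arithmetic; deriving the scale from the actual image and page sizes avoids relying on any particular dpi.

```java
/** Hypothetical helper for illustration; not part of PDFBox. */
final class RenderScale {

    /** Default PDF user space resolution: 72 units per inch. */
    static final double USER_SPACE_DPI = 72.0;

    /** Pixels per PDF unit for an image rendered at the given dpi. */
    static double pixelsPerUnit(double renderDpi) {
        return renderDpi / USER_SPACE_DPI;
    }

    /** Pixels per PDF unit derived from the measured image and page widths. */
    static double pixelsPerUnit(double imageWidth, double pageWidth) {
        return imageWidth / pageWidth;
    }

    /** Converts a length in PDF units (e.g. xRight - xLeft) to pixels. */
    static double toPixels(double pdfLength, double scale) {
        return pdfLength * scale;
    }
}
```

With the rectangle from the question, `RenderScale.toPixels(252 - 149.04, RenderScale.pixelsPerUnit(144))` gives a width of roughly 206 pixels, twice the 102.96 units obtained by subtracting the raw PDF coordinates.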