
I have some PDF documents whose main content is vector graphics (path-drawing operators, not bitmap images), like the following.

IMPORTANT NOTE: These are the only kinds of operators in the PDF. It does not contain text, images, or any other type of object (I reviewed all the content using the PDFBox Debugger).

q
  0.75 0 0 -0.75 36.12 573.96 cm
  0 0 0 rg
  0 0 m
  2.24 0 l
  2.24 5.92 l
  3.04 5.92 l
  3.04 0 l
  5.28 0 l
  5.28 -0.8 l
  0 -0.8 l
  0 0 l
  h
  f
Q
q
  0.75 0 0 -0.75 43.800003 572.04 cm
  0 0 0 rg
  0 0 m
  0 -1.44 -0.96 -1.76 -1.76 -1.76 c
  -2.56 -1.76 -3.04 -1.28 -3.2 -0.96 c
  -3.2 -0.96 l
  -3.2 -3.36 l
  -4 -3.36 l
  -4 3.36 l
  -3.2 3.36 l
  -3.2 0.64 l
  -3.2 -0.64 -2.56 -0.96 -1.92 -0.96 c
  -1.12 -0.96 -0.8 -0.64 -0.8 0.16 c
  -0.8 3.36 l
  0 3.36 l
  0 0 l
  h
  f
Q

.
.
.

Each block that starts with "q" and ends with "Q" appears to draw one small shape (a character, in the case of my document).
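A short sketch of why each q/Q block yields its own box: the "cm" operator at the top of the block sets a matrix [a b c d e f] that maps every path point (x, y) to (a*x + c*y + e, b*x + d*y + f), and "Q" restores the previous state afterwards. So a block's box on the page is the bounds of its path after that mapping. The JDK-only code below (GlyphBounds and its methods are hypothetical names, not PDFBox API) does this for the first block shown above, assuming the matrix in effect before "q" is the identity, as it is at the start of a page's content stream:

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.Locale;

// Hypothetical helper: page-space bounding box of one q/Q block.
// The path is rebuilt by hand from the first block in the question; no PDFBox involved.
class GlyphBounds {

    // Path of the first block in its own (pre-cm) coordinates.
    static Path2D firstBlockPath() {
        double[][] pts = {
            {0, 0}, {2.24, 0}, {2.24, 5.92}, {3.04, 5.92}, {3.04, 0},
            {5.28, 0}, {5.28, -0.8}, {0, -0.8}, {0, 0}
        };
        Path2D.Double path = new Path2D.Double();
        path.moveTo(pts[0][0], pts[0][1]);          // "0 0 m"
        for (int i = 1; i < pts.length; i++) {
            path.lineTo(pts[i][0], pts[i][1]);      // each "x y l"
        }
        path.closePath();                            // "h"
        return path;
    }

    // "a b c d e f cm" maps (x, y) to (a*x + c*y + e, b*x + d*y + f);
    // AffineTransform's six-argument constructor takes the values in the same order.
    static AffineTransform cm(double a, double b, double c, double d, double e, double f) {
        return new AffineTransform(a, b, c, d, e, f);
    }

    // Bounds of the path after applying the block's matrix (PDF user space, y grows upward).
    static Rectangle2D pageBounds(Path2D localPath, AffineTransform matrix) {
        return matrix.createTransformedShape(localPath).getBounds2D();
    }

    public static void main(String[] args) {
        Rectangle2D box = pageBounds(firstBlockPath(), cm(0.75, 0, 0, -0.75, 36.12, 573.96));
        System.out.printf(Locale.ROOT, "x=%.2f y=%.2f w=%.2f h=%.2f%n",
                box.getX(), box.getY(), box.getWidth(), box.getHeight());
    }
}
```

It prints x=36.12 y=569.52 w=3.96 h=5.04 in PDF user-space points. The -0.75 in the cm flips the y axis, which is why the path coordinates inside each block look upside down.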

This is how it looks in Adobe Acrobat: [Screenshot taken from Adobe Acrobat]

I need to determine the bounding box (X-Y coordinates, width and height) of all of these shapes together, as if they were a single object, like below: [Bounding box representation from Adobe Acrobat]

As mentioned above, I determined that each "character" is a block delimited by the "q" and "Q" operators in the PDF content.

I wonder if we can get the dimensions of that overall bounding box using Java and PDFBox, just as Adobe Acrobat does.
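Combining the per-character boxes into the single box described above is then a plain rectangle union. A JDK-only sketch; the class name CombinedBounds is hypothetical, and the two input rectangles are the page-space boxes of the two sample blocks, obtained by applying each block's cm matrix to its path points:

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: one box enclosing all per-character boxes.
class CombinedBounds {

    // Smallest rectangle containing every box, or null for an empty list.
    static Rectangle2D union(List<? extends Rectangle2D> boxes) {
        Rectangle2D result = null;
        for (Rectangle2D box : boxes) {
            // getBounds2D() copies the first box and createUnion() always returns a new
            // rectangle, so the caller's rectangles are never modified.
            result = (result == null) ? box.getBounds2D() : result.createUnion(box);
        }
        return result;
    }

    public static void main(String[] args) {
        // Page-space boxes of the two sample blocks (x, y, width, height).
        Rectangle2D first = new Rectangle2D.Double(36.12, 569.52, 3.96, 5.04);
        Rectangle2D second = new Rectangle2D.Double(40.800003, 569.52, 3.0, 5.04);
        Rectangle2D all = union(List.of(first, second));
        System.out.printf(Locale.ROOT, "x=%.2f y=%.2f w=%.2f h=%.2f%n",
                all.getX(), all.getY(), all.getWidth(), all.getHeight());
    }
}
```

This prints x=36.12 y=569.52 w=7.68 h=5.04: the two characters share the same vertical extent, and the combined width runs from the left edge of the first to the right edge of the second.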

Alf
  • https://stackoverflow.com/questions/38931422/ might help a bit, although it's not a duplicate. But you could try to catch these shapes to get the bounding box. – Tilman Hausherr Jul 15 '22 at 09:52
  • Hi Tilman, thanks for the suggestion. I tried the approach from the linked question. For a moment I thought it was working because it printed each path instruction as it iterated, but in the end it never printed any Rectangle2D value. I was hoping execution would eventually reach this method, but it never did: `public void strokePath() throws IOException { /* do stuff */ System.out.println(linePath.getBounds2D()); linePath.reset(); }` Is there anything I might be missing? – Alf Jul 15 '22 at 18:34
  • fillPath() might be better – Tilman Hausherr Jul 16 '22 at 13:18
  • Thanks @TilmanHausherr, "fillPath()" did the job. – Alf Jul 22 '22 at 17:38

1 Answer


Following the same approach that is posted here:

pdfbox 2.0.2 > Calling of PageDrawer.processPage method caught exceptions

That answer puts the logic in the "strokePath()" method, but the paths in my document are filled (each block ends with the "f" operator) rather than stroked, so strokePath() is never called. As @TilmanHausherr suggested, I put my logic in "fillPath()" instead.

Be aware that the class you define must extend PDFGraphicsStreamEngine.
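A minimal sketch of that fillPath() logic, written against the JDK only so it runs without PDFBox on the classpath. The class name FillBoundsCollector and the demo() helper are hypothetical; the callback names and parameter lists mirror PDFGraphicsStreamEngine's abstract methods, but this class does not extend it. In a real subclass the same bodies go into @Override methods (next to the other abstract methods such as appendRectangle, drawImage, clip, getCurrentPoint, fillAndStrokePath and shadingFill), and PDFBox passes coordinates already transformed by the current transformation matrix, which is why demo() applies the cm by hand:

```java
import java.awt.geom.GeneralPath;
import java.awt.geom.Rectangle2D;
import java.util.Locale;

// Hypothetical stand-in for a PDFGraphicsStreamEngine subclass: same callback
// names, but plain methods so the logic can be tried without PDFBox.
class FillBoundsCollector {
    private final GeneralPath linePath = new GeneralPath();
    private Rectangle2D combined;   // union of every filled path so far (null until the first fill)

    void moveTo(float x, float y) { linePath.moveTo(x, y); }
    void lineTo(float x, float y) { linePath.lineTo(x, y); }
    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) {
        linePath.curveTo(x1, y1, x2, y2, x3, y3);
    }
    void closePath() { linePath.closePath(); }

    // Reached for "f": record the path's bounds, then start a fresh path.
    void fillPath(int windingRule) {
        Rectangle2D bounds = linePath.getBounds2D();
        combined = (combined == null) ? bounds : combined.createUnion(bounds);
        linePath.reset();
    }

    // Stroked ("S") or discarded ("n") paths do not count here; just reset.
    void strokePath() { linePath.reset(); }
    void endPath() { linePath.reset(); }

    Rectangle2D getCombinedBounds() { return combined; }

    // Feeds the first sample block, applying its cm (0.75 0 0 -0.75 36.12 573.96)
    // by hand, the way PDFBox transforms points before each callback.
    static FillBoundsCollector demo() {
        float[][] local = {
            {0, 0}, {2.24f, 0}, {2.24f, 5.92f}, {3.04f, 5.92f},
            {3.04f, 0}, {5.28f, 0}, {5.28f, -0.8f}, {0, -0.8f}
        };
        FillBoundsCollector c = new FillBoundsCollector();
        for (int i = 0; i < local.length; i++) {
            float x = 0.75f * local[i][0] + 36.12f;
            float y = -0.75f * local[i][1] + 573.96f;
            if (i == 0) c.moveTo(x, y); else c.lineTo(x, y);
        }
        c.closePath();
        c.fillPath(GeneralPath.WIND_NON_ZERO);   // the "f" operator
        return c;
    }

    public static void main(String[] args) {
        Rectangle2D box = demo().getCombinedBounds();
        System.out.printf(Locale.ROOT, "x=%.2f y=%.2f w=%.2f h=%.2f%n",
                box.getX(), box.getY(), box.getWidth(), box.getHeight());
    }
}
```

Only fillPath() grows the combined box, so after the whole page is processed getCombinedBounds() is the single box around every filled shape. Depending on the JDK version, getBounds2D() may include Bézier control points, so boxes of curved glyphs can come out slightly larger than the drawn outline.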

Alf