Hi, I'm new to PDFBox and I want to highlight certain characters in PDF files. I can already get the coordinates of a character, and now I want to highlight it.

I found this question, which shows the steps to highlight: highlight text using pdfbox when it's location in the pdf is known

My question is about these 2 steps: markup.setRectangle(); markup.setQuadPoints();

I've tried to understand QuadPoints and PDRectangle but failed.

For example, if I write code like this:

    PDRectangle position = new PDRectangle(50, 50);
    markup.setRectangle(position);
    float[] p = new float[8];
    p[0] = 100; p[1] = 100; p[2] = 200; p[3] = 100;
    p[4] = 100; p[5] = 500; p[6] = 200; p[7] = 500;
    markup.setQuadPoints(p);

I get nothing. But if I set LowerLeftX, LowerLeftY, UpperRightX and UpperRightY on the PDRectangle, I do get highlighted text, although the coordinates are not what I expected.

Could anyone explain the difference between these two classes? Since I already have 4 points in the QuadPoints, why do I still have to set the position of the rectangle? How are the two related?

Thanks!

Yi Zhu

1 Answer

    List<PDAnnotation> annotations = document.getPage(pageNumber - 1).getAnnotations();
    PDAnnotationTextMarkup markup = new PDAnnotationTextMarkup(PDAnnotationTextMarkup.SUB_TYPE_HIGHLIGHT);
    // Set the annotation rectangle; here a full page size (can be A4, LETTER, etc.)
    markup.setRectangle(PDRectangle.LETTER);
    // Set the 4 corners as quad points in the order
    // (left,top, right,top, left,bottom, right,bottom).
    // pageLength - y flips the y values (pageLength is presumably the page height);
    // the extra 2.0f raises the top edge slightly.
    float[] quads = new float[8];
    quads[0] = quadValues[0];
    quads[1] = pageLength - quadValues[1] + 2.0f;
    quads[2] = quadValues[2];
    quads[3] = pageLength - quadValues[3] + 2.0f;
    quads[4] = quadValues[4];
    quads[5] = pageLength - quadValues[5];
    quads[6] = quadValues[6];
    quads[7] = pageLength - quadValues[7];
    markup.setQuadPoints(quads);
    // Attach the annotation to the page
    annotations.add(markup);
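The `pageLength - y` arithmetic above can be isolated into a small helper. This is only a sketch under assumptions: it treats the input coordinates as measured from the top of the page, treats `pageLength` as the page height, and assumes the page origin is at the bottom left. The class `TopLeftToPdf` and the method `flipY` are hypothetical names, not part of PDFBox, and the extra `2.0f` offset from the answer is left out.

```java
import java.util.Arrays;

final class TopLeftToPdf {
    /**
     * Copies an x,y,x,y,... array and replaces every y value with
     * pageHeight - y. The x values are left untouched.
     */
    static float[] flipY(float[] topLeftQuads, float pageHeight) {
        float[] out = new float[topLeftQuads.length];
        for (int i = 0; i < topLeftQuads.length; i++) {
            // Even indices hold x, odd indices hold y.
            out[i] = (i % 2 == 0) ? topLeftQuads[i] : pageHeight - topLeftQuads[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // A box 100..200 wide, 100..120 from the top, on a 792pt-high page.
        float[] in = {100, 100, 200, 100, 100, 120, 200, 120};
        System.out.println(Arrays.toString(flipY(in, 792f)));
    }
}
```

The resulting array keeps the same corner order as the input, so it could be passed to `markup.setQuadPoints(...)` as in the snippet above.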
  • Setting the rectangle to a page size is usually overkill (it makes the annotation hard to select in PDF viewers) and sometimes even wrong (it implicitly assumes the origin 0,0 to be at the bottom left of the page, which is not always the case). More appropriate in production would be the bounding box of all quad points plus a bit. – mkl Jul 16 '18 at 15:56
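The suggestion in the comment, using the bounding box of all quad points plus a small margin as the annotation rectangle, could be sketched roughly like this. The class `QuadBounds`, the method `boundingBox` and the `padding` parameter are hypothetical names chosen for illustration, not part of PDFBox. The sketch only does the arithmetic on a plain `float[]`.

```java
import java.util.Arrays;

final class QuadBounds {
    /**
     * Returns {lowerLeftX, lowerLeftY, upperRightX, upperRightY} enclosing
     * every point of an x,y,x,y,... quad array, grown by padding on each side.
     */
    static float[] boundingBox(float[] quads, float padding) {
        if (quads.length == 0 || quads.length % 2 != 0) {
            throw new IllegalArgumentException("expected a non-empty x,y,x,y,... array");
        }
        float minX = quads[0], maxX = quads[0];
        float minY = quads[1], maxY = quads[1];
        for (int i = 2; i < quads.length; i += 2) {
            minX = Math.min(minX, quads[i]);
            maxX = Math.max(maxX, quads[i]);
            minY = Math.min(minY, quads[i + 1]);
            maxY = Math.max(maxY, quads[i + 1]);
        }
        return new float[] {minX - padding, minY - padding, maxX + padding, maxY + padding};
    }

    public static void main(String[] args) {
        // The quad points from the question, with a 2pt margin.
        float[] p = {100, 100, 200, 100, 100, 500, 200, 500};
        System.out.println(Arrays.toString(boundingBox(p, 2f)));
        // prints [98.0, 98.0, 202.0, 502.0]
    }
}
```

The four values map onto the `LowerLeftX`, `LowerLeftY`, `UpperRightX` and `UpperRightY` setters of `PDRectangle` mentioned in the question. For the question's quad points the box spans 98..202 by 98..502, whereas `new PDRectangle(50, 50)` only describes a 50 by 50 rectangle starting at the origin, which does not contain those points.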