
When I extract text using iTextSharp, I get the x and y coordinates of each piece of text. If I use these two coordinates to position the text when converting from PDF to HTML, the text ends up in a different position. To get the x and y coordinates I used:

  • Vector curBaseline = renderInfo.GetBaseline().GetStartPoint();

  • float x = curBaseline[Vector.I1];

  • float y = curBaseline[Vector.I2];

    For example, when I extract text using the above method I get, say, x = 42 and y = 659. When I then write the span as

    "<span style=\"left:{0}px;bottom:{1}px;position:relative;\">", curBaseline[Vector.I1], curBaseline[Vector.I2]);

    the position changes. How can I position the text the same way as in the PDF?

pdp
  • If I recall correctly, PDF uses a coordinate system that starts at the bottom-left corner of the page, not at the top. So every coordinate is wrong when you use it directly in HTML. You will have to convert the values. – Christian Sauer Mar 20 '13 at 10:49
  • Yes, you are right. How do I convert the values? Thank you. – pdp Mar 20 '13 at 11:13
  • Find the height of the document and subtract the `y` value from it. Also, either use the top of the text instead of the baseline or just account for the font's size. – Chris Haas Mar 20 '13 at 13:38
  • I got the height and subtracted `y` from it as you said, and it was helpful: `height = reader_FirstPdf.GetPageSizeWithRotation(i).Height;` But if the PDF contains something like Kd, superscripts are rendered as subscripts and subscripts as superscripts. How can I solve this problem? Thank you. – pdp Mar 21 '13 at 12:56
  • How can I extract multiple pages? They overlap and look messy. – pdp Mar 22 '13 at 05:51

1 Answer


Posted as answer...

If I recall correctly, PDF uses a coordinate system that starts at the bottom-left corner of the page, not at the top. So every coordinate is wrong when you use it directly in HTML. You will have to convert the values.

Your PDF document should have something like document.actualheight; simply subtract your y value from that.
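
The subtraction can be sketched roughly as follows. This is a minimal illustration rather than iTextSharp code: `PdfToHtmlPosition` and `ToSpan` are hypothetical names, the page height of 792 is an assumed US Letter page, one PDF unit is treated as one CSS pixel, and the span is assumed to sit inside a `position:relative` container the size of the page. Per the comments on the question, the height could come from `GetPageSizeWithRotation(i).Height`, and passing the top of the text rather than the baseline avoids a vertical offset of roughly one font height.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class PdfToHtmlPosition
{
    // Hypothetical helper: builds an absolutely positioned span from PDF coordinates.
    // pageHeight, x and textTopY are in PDF units, with (0,0) assumed at the bottom left.
    // textTopY should be the top of the text (not the baseline), since CSS "top"
    // positions the top edge of the element.
    public static string ToSpan(float pageHeight, float x, float textTopY, string text)
    {
        // Flip the y axis: PDF measures upwards from the bottom, CSS "top" downwards.
        float top = pageHeight - textTopY;
        return string.Format(
            CultureInfo.InvariantCulture, // avoid decimal commas in CSS values
            "<span style=\"position:absolute;left:{0}px;top:{1}px;\">{2}</span>",
            x, top, WebUtility.HtmlEncode(text));
    }

    public static void Main()
    {
        // Values from the question (x = 42, y = 659) on an assumed 792-unit-high page.
        Console.WriteLine(ToSpan(792f, 42f, 659f, "Hello"));
        // prints: <span style="position:absolute;left:42px;top:133px;">Hello</span>
    }
}
```

Using `top` with the flipped value keeps all spans measured from the same edge as the rest of the HTML layout; keeping `bottom` with the unflipped value would also work, but only inside a container whose height matches the page.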

Christian Sauer
  • Actually, the PDF-generating software can put (0,0) anywhere, either on-page or off-page. Furthermore, the coordinates are given in user space units, which by default are 1/72 inch but can be configured to be different. That being said, most PDFs have (0,0) at the bottom left of the page and use the default units. – mkl Mar 20 '13 at 14:11