How to find x,y location of a text in pdf

Question

Is there any tool to find the x,y location of text content in a PDF file?

Vitaliy Shibaev · Answer 1 · 2021-03-15T12:50:31.350

4

The Docotic.Pdf library can do this. See the C# sample below:

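using System;
using BitMiracle.Docotic.Pdf;

// Print the position and content of each text chunk on the first page.
using (PdfDocument doc = new PdfDocument("your_pdf.pdf"))
{
    foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
        Console.WriteLine(textData.Position + " " + textData.Text);
}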

edited Mar 15 '21 at 12:50

answered Jan 20 '11 at 16:46

Vitaliy Shibaev


score 1 · Answer 2 · answered Jan 19 '11 at 20:32


Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object.

If you locate the text objects within the results list, you will notice there is a position value (in points) within the Text Properties -> * Font section.
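Since the position is reported in points, it may help to convert it to other units. The sketch below is only an illustration: it relies on 1 point being 1/72 inch, and it assumes the reported y value is measured from the bottom of the page (as in default PDF user space), which I have not verified for Preflight's output. The names `PdfUnits`, `PointsToMillimeters`, and `FlipY` are made up for this example, and the coordinates are hypothetical.

```csharp
using System;

static class PdfUnits
{
    const double PointsPerInch = 72.0;      // 1 pt = 1/72 inch
    const double MillimetersPerInch = 25.4;

    // Convert a length in points to millimeters.
    public static double PointsToMillimeters(double points) =>
        points / PointsPerInch * MillimetersPerInch;

    // Convert a y coordinate measured from the bottom edge of the page
    // into one measured from the top edge (both in points).
    // Assumption: the input really is bottom-origin.
    public static double FlipY(double yPoints, double pageHeightPoints) =>
        pageHeightPoints - yPoints;
}

static class UnitsDemo
{
    static void Main()
    {
        // Hypothetical values: a text position on a 612 x 792 pt (US Letter) page.
        double x = 72.0, y = 720.0, pageHeight = 792.0;

        double xMm = PdfUnits.PointsToMillimeters(x);
        double yFromTopMm = PdfUnits.PointsToMillimeters(PdfUnits.FlipY(y, pageHeight));

        Console.WriteLine("x = " + xMm + " mm, y from top = " + yFromTopMm + " mm");
    }
}
```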


Orbling


Is it possible to find the x,y position and the height and width of each word? – raki Jan 19 '11 at 21:28
@raki: The size is listed right below the position, but it applies only to a text block, which can hold any arbitrary run of text. Getting individual word sizes would require calculating them from the font metrics. What is the purpose of what you are doing? There may be a better approach. – Orbling Jan 19 '11 at 22:04
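To illustrate the font-metrics calculation mentioned in the comment above, here is a minimal sketch. It assumes a simple font whose glyph advance widths are given in 1/1000 of a text-space unit, so a word's width is roughly the sum of its glyph widths divided by 1000 and multiplied by the font size, plus any character spacing, all scaled by the horizontal scaling factor. The width table, the ascent and descent values, and the names `WordMetrics`, `WordWidth`, and `TextHeight` are sample values and hypothetical helpers, not part of any tool named in this thread. In a real document the widths come from the font's own width table.

```csharp
using System;
using System.Collections.Generic;

static class WordMetrics
{
    // Sample advance widths in 1/1000 text-space units, hard-coded for
    // illustration only. Real values must be read from the font in the PDF.
    static readonly Dictionary<char, double> GlyphWidths = new Dictionary<char, double>
    {
        ['H'] = 722, ['e'] = 556, ['l'] = 222, ['o'] = 556
    };

    // Advance width of a word in points. charSpacing is added after every
    // glyph (including the last one), so this is the distance the text
    // position moves, which can slightly exceed the visible ink width.
    public static double WordWidth(string word, double fontSize,
                                   double charSpacing = 0.0, double horizontalScale = 1.0)
    {
        double width = 0.0;
        foreach (char c in word)
            width += GlyphWidths[c] / 1000.0 * fontSize + charSpacing;
        return width * horizontalScale;
    }

    // Approximate text height in points from the font's ascent and descent
    // (both in 1/1000 units; descent is normally negative).
    public static double TextHeight(double ascent, double descent, double fontSize) =>
        (ascent - descent) / 1000.0 * fontSize;
}

static class MetricsDemo
{
    static void Main()
    {
        double fontSize = 12.0; // hypothetical font size in points
        Console.WriteLine("width  = " + WordMetrics.WordWidth("Hello", fontSize));
        Console.WriteLine("height = " + WordMetrics.TextHeight(718, -207, fontSize));
    }
}
```

The x,y of each word would then follow by adding the widths of the preceding words and spaces to the position of the text block, which is why a library that already tracks this is usually the easier route.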

score 1 · Answer 3 · answered Jan 23 '11 at 02:16

TET, the Text Extraction Toolkit from the PDFlib family of products, can do that. TET has a command-line interface, and it is the most powerful text extraction tool I'm aware of. (It can even handle ligatures.)

Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.
