
Here's the line from Tesseract 4 output (.hocr file):

```html
<span class='ocr_line' 
      id='line_1_1' 
      title="bbox 36 92 580 122; baseline 0 -6; x_size 30; x_descenders 6; x_ascenders 6">
```

What do the `x_descenders` and `x_ascenders` properties mean?

I know that in typography a descender is "the portion of a letter that extends below the baseline of a font". But sometimes `x_descenders` is not an integer but a float, as in `x_descenders 5.2608695`. What would that mean?

Then I wonder how to interpret decimal parts of
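Whatever the exact semantics turn out to be, these values are easier to study when they are read as floats rather than integers. Below is a minimal sketch in Python (standard library only) that collects the `title` properties of every `ocr_line` element in an hOCR file and parses numeric values with `float()`, so a value such as `5.2608695` is kept intact. It assumes lines are marked with the `ocr_line` class, as in the snippet above; `parse_title` and `LineCollector` are hypothetical helper names, not part of Tesseract or any hOCR library.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title attribute into {property: [values]}.

    Properties are separated by ';'; each one is a name followed by
    whitespace-separated values. Numeric values are parsed as floats,
    so fractional values such as 5.2608695 are preserved.
    """
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if not tokens:
            continue
        name, raw_values = tokens[0], tokens[1:]
        values = []
        for raw in raw_values:
            try:
                values.append(float(raw))
            except ValueError:
                values.append(raw)  # keep non-numeric values as plain text
        props[name] = values
    return props


class LineCollector(HTMLParser):
    """Collect (id, parsed title) pairs for every element with class ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        title = attrs.get("title")
        if "ocr_line" in classes and title:
            self.lines.append((attrs.get("id"), parse_title(title)))


if __name__ == "__main__":
    snippet = """<span class='ocr_line'
      id='line_1_1'
      title="bbox 36 92 580 122; baseline 0 -6; x_size 30; x_descenders 6; x_ascenders 6">"""
    collector = LineCollector()
    collector.feed(snippet)
    collector.close()
    for line_id, props in collector.lines:
        print(line_id, props["x_size"], props["x_descenders"], props["x_ascenders"])
```

To run it on a real file, feed the whole file instead of the snippet, e.g. `collector.feed(open("out.hocr", encoding="utf-8").read())`, where `out.hocr` is a placeholder name. Parsing with `html.parser` rather than an XML parser is a deliberate choice here: it tolerates the single-quoted, multi-line attributes shown above and does not require handling the XHTML namespace.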

dzieciou
  • Sorry, my bad. I thought `x_ascenders` was the integer variable `asscount` from the for loop. It appears to be the float variable `ascenders`. – Grada Gukovic Dec 11 '19 at 08:27
  • @GradaGukovic No problem. I will need to dig through https://github.com/tesseract-ocr/tesseract/search?p=2&q=descenders&unscoped_q=descenders. Still, I wonder how this information is used later to understand and extract information from .hocr files. I guess it isn't. – dzieciou Dec 11 '19 at 08:30
  • Hello, did you find any useful information, or whether these values are of any use? – SajanGohil Nov 13 '20 at 11:31
