I would like to use Tesseract for serial number recognition, where I only want to recognize single characters: no words, no dictionary. I would therefore like to restrict recognition to one of the fonts Tesseract has already been trained on, to achieve better results.

These are the fonts Tesseract was trained on:

Andale_Mono.ttf
Arial_Black.ttf
Arial_Bold.ttf
Arial.ttf
Comic_Sans_MS_Bold.ttf
Comic_Sans_MS.ttf
Courier_New_Bold.ttf
Courier_New.ttf
Georgia_Bold.ttf
Georgia.ttf
Gottf
Impact.ttf
Times_New_Roman_Bold.ttf
Times_New_Roman.ttf
Trebuchet_MS_Bold.ttf
Trebuchet_MS.ttf
Verdana_Bold.ttf
Verdana.ttf

Because the trained fonts differ in design, some characters are hard to tell apart, for example "Z" and "2". Times New Roman has more rounded shapes, while Arial uses mostly straight lines.

(Image: font-type design differences)

In my experience, Tesseract has trouble distinguishing "Z" from "2" because how similar the two characters look varies from one font design to another.

I therefore think I could get better recognition results if Tesseract used only one font (for example, Arial) for character recognition.

Question:

Is it possible to specify the font in Tesseract?

Similar, but older topic (October 2012) Link

David Buck
Mr.Sheep
  • Possible duplicate of [Explicitly set the font to be used for recognition by Tesseract-OCR](https://stackoverflow.com/questions/13154150/explicitly-set-the-font-to-be-used-for-recognition-by-tesseract-ocr) – jtlz2 Sep 06 '19 at 09:17

2 Answers

So far, this option is not available. The current version at the time of writing is Tesseract 5.

Esraa Abdelmaksoud

No, but you can try training your own model with only the font(s) you want. You could also try fine-tuning the existing eng model.

See these resources for more:

https://github.com/tesseract-ocr/tesstrain

https://tesseract-ocr.github.io/tessdoc/tess5/TrainingTesseract-5.html

A fair warning: it's a fairly involved process and will probably take you a while.
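Once such a model exists, you select it the same way as any other language, via `-l`. Below is a minimal sketch of how that call might look from Python, assuming the `tesseract` binary is on your PATH. The model name `arial_only`, the `./tessdata` directory, and the image file name are hypothetical placeholders, not something tesstrain produces under those names by default. The sketch also uses page segmentation mode 10 (treat the image as a single character) and an optional `tessedit_char_whitelist`, since the question is about single characters.

```python
import subprocess


def build_tesseract_command(image_path, model_name, tessdata_dir,
                            psm=10, whitelist=None):
    """Build the tesseract CLI call for a custom .traineddata model.

    model_name is the file name without ".traineddata"; the file is
    expected inside tessdata_dir (both are placeholders here).
    """
    cmd = [
        "tesseract", str(image_path), "stdout",  # "stdout" prints the result
        "--tessdata-dir", str(tessdata_dir),     # where the custom model lives
        "-l", model_name,                        # pick the custom model
        "--psm", str(psm),                       # 10 = single character
    ]
    if whitelist:
        # Restrict the output to the characters that can occur in the serial.
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


def recognize(image_path, model_name, tessdata_dir, **options):
    """Run tesseract and return the recognized text (needs tesseract installed)."""
    cmd = build_tesseract_command(image_path, model_name, tessdata_dir, **options)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call recognize() once the model file exists.
    print(" ".join(build_tesseract_command(
        "serial_char.png", "arial_only", "./tessdata",
        whitelist="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")))
```

Whether a single-font model actually separates "Z" from "2" better than the stock eng model is something you would have to measure on your own images.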

  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/late-answers/32704705) – user16217248 Sep 18 '22 at 04:30
  • Given how much setup is required, it is unlikely that you could include the essential "useful" parts without cutting the process down so far that anyone would still need to go to the page. In this situation, it seems fair to me to answer: "No, you cannot, but you can train it. Should you want to train it, refer to the Tesseract docs." – ferreiradev Nov 21 '22 at 17:10