Is there any way to get Tesseract to match only user-specified words or patterns? The manual claims it is possible, yet I cannot find a single documented instance on the internet of somebody getting this working.

Below are many examples of people asking for help because it does not work; none has a confirmed resolution.

stackoverflow.com/questions/33429143/tesseract-user-pattern-is-not-applied

stackoverflow.com/questions/31874393/tesseract-ocr-force-pattern

stackoverflow.com/questions/26856349/provide-pattern-for-tesseract

stackoverflow.com/questions/22432194/tesseract-ocr-only-detect-user-words

stackoverflow.com/questions/17209919/tesseract-user-patterns

groups.google.com/forum/#!topic/tesseract-ocr/S9CIK3jOMWw

groups.google.com/forum/#!topic/tesseract-ocr/5vFqVcJmHnM

So can we conclude that this feature simply does not work? Is there an official statement to this effect?

Michael Connor
  • A lot of the linked Tesseract documents appear to have moved. [Here](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc) is a link to a manual on github. – Evan Jun 15 '16 at 16:38
  • Year later, still appears to be the case. – Slight Feb 28 '17 at 16:17
  • The link to the manual is dead – Adelin Dec 19 '17 at 12:47
  • Repo admins say that user-patterns broke somewhere around v3.02. LSTM v4.0 probably has broken user-patterns as well as char-whitelisting https://github.com/tesseract-ocr/tesseract/issues/960 – NightFury13 Apr 19 '18 at 09:47

1 Answer

There is now an example on the Tesseract doc site at https://tesseract-ocr.github.io/tessdoc/APIExample-user_patterns.html [Thanks @Ravi for the new link]

That test example does work for me in the oem=1 / LSTM mode of Tesseract 4.x.
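For anyone who wants to repeat that test, here is a minimal Python sketch of how a patterns file could be passed to the command-line tool. It assumes the `--user-patterns FILE` and `--oem` options described in the Tesseract 4.x manual page. The helper names (`write_patterns`, `build_command`, `run_ocr`), the file names, and the sample pattern are hypothetical and not taken from the linked example. The `\d` digit class is my reading of the pattern syntax and should be checked against the docs. Running this does not guarantee that the pattern is actually honored.

```python
import subprocess
from pathlib import Path


def write_patterns(path, patterns):
    """Write one user pattern per line and return the file path."""
    path = Path(path)
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return path


def build_command(image, patterns_file, oem=1):
    """Build the tesseract CLI invocation; recognized text goes to stdout."""
    return [
        "tesseract", str(image), "stdout",
        "--oem", str(oem),                        # 1 = LSTM engine in Tesseract 4.x
        "--user-patterns", str(patterns_file),    # option assumed from the 4.x man page
    ]


def run_ocr(image, patterns_file, oem=1):
    """Run tesseract (must be on PATH) and return the recognized text."""
    result = subprocess.run(
        build_command(image, patterns_file, oem),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A call would look like `run_ocr("page.png", write_patterns("my.patterns", [r"\d\d\d-\d\d\d\d"]))`, where `page.png` is a placeholder for your own image.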

I can't, however, get it to work for any other examples, or in any other modes.

I have seen no official statement, and at the time of writing the feature does indeed seem to be non-functional outside that one test example.

jtlz2