
As described in this post: "pytesseract using tesseract 4.0 numbers only not working", it's possible to detect numbers with the eng.traineddata file, but restricting detection to numbers only isn't possible with this file. Even if you define tessedit_char_whitelist=0123456789, it doesn't recognize anything.

  1. I searched GitHub and elsewhere for a digit.traineddata for Tesseract 4.0 but didn't find one. Does anyone know which one I could use?
  2. Is it possible to use one from Tesseract 3.x? (I found nothing there either.)
  3. Is it complicated to train my own model on numbers only, and how would I go about it?
Dorian Gaensslen
  • As mentioned [here](https://stackoverflow.com/a/48210809/4766168), you can download it from https://github.com/Shreeshrii/tessdata_shreetest – Dmitrii Z. Nov 30 '18 at 19:43
  • Awesome, thanks, it works like a charm as far as I can tell. Does character whitelisting work with this one? – Dorian Gaensslen Dec 06 '18 at 10:07
  • No, neither blacklisting nor whitelisting works with the LSTM engine. Your only option is to train an LSTM model on just the characters you need (digits, in the case mentioned in the link). If you want the whitelist/blacklist feature, you need to disable LSTM. – Dmitrii Z. Dec 06 '18 at 10:55
  • Whitelisting now works with version 4.1. – Donald Rich May 11 '20 at 08:20
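
To make the comments above concrete, here is a minimal sketch of a digits-only call through pytesseract. It assumes Tesseract 4.1 or later is installed (so the whitelist is honored by the LSTM engine) and that `pytesseract` and Pillow are available. The helper names `build_digit_config` and `read_digits` and the `--psm 7` (single text line) choice are my own assumptions, not something from the thread. Passing `oem=0` selects the legacy engine instead, which only works if the traineddata file actually contains a legacy model.

```python
def build_digit_config(oem=3, psm=7, whitelist="0123456789"):
    """Build a Tesseract config string that restricts output to `whitelist`.

    oem=3 lets Tesseract pick the default engine; oem=0 forces the legacy
    (non-LSTM) engine. psm=7 treats the image as a single line of text.
    """
    return f"--oem {oem} --psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image_path, lang="eng", oem=3):
    """Run OCR on `image_path` and return the recognized text, stripped.

    Hypothetical helper: the imports are done lazily so that the config
    builder above can be used without the OCR stack installed.
    """
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(
            img, lang=lang, config=build_digit_config(oem=oem)
        )
    return text.strip()
```

With a digits-only model such as the one linked in the first comment, you would pass its language name via `lang` instead of `"eng"`; the exact name depends on the file you downloaded.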
