Tesseract can't detect "1"

Question

I have images like the one attached (img).

However, Tesseract called from Python has 0% accuracy (literally). I use only 5 numbers and the quality is really low. Only detecting "2" works well.

Is it about the font? Is there something else that could be improved?

Command: `d = pytesseract.image_to_string(biggest, config="--psm 10 -c tessedit_char_whitelist=12345")`
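
As a rough sketch of how the call above might be wrapped: `image_to_string` returns a string that can carry trailing whitespace, so the result is reduced here to a single whitelisted digit before use. The helper names `clean_digit` and `read_digit` are hypothetical and not part of pytesseract, and the image argument is assumed to be something pytesseract accepts (a PIL image or a NumPy array).

```python
ALLOWED = "12345"  # same characters as the whitelist in the command above


def clean_digit(raw, allowed=ALLOWED):
    """Return the first whitelisted character in raw OCR output, or None."""
    for ch in raw:
        if ch in allowed:
            return ch
    return None


def read_digit(image, allowed=ALLOWED):
    """Run Tesseract in single-character mode (--psm 10) and clean the result.

    Assumes pytesseract and the tesseract binary are installed.
    """
    import pytesseract  # imported lazily so clean_digit works without it

    config = f"--psm 10 -c tessedit_char_whitelist={allowed}"
    raw = pytesseract.image_to_string(image, config=config)
    return clean_digit(raw, allowed)
```

Filtering the output in Python also acts as a fallback in case the whitelist setting is not honored by the installed Tesseract version.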

Some things you could try: 1) threshold the image before passing it to tesseract - the range of brightnesses on the same character might be confusing it. 2) The image is rotated - do you have a way to undo the rotation before passing it to tesseract? — Nick ODell, Nov 09 '22 at 19:49
The rotation is random (a similar angle each time, but random). I used a threshold too and it didn't help. Maybe it's not worse, but it's still no better. — Patryk Organiściak, Nov 09 '22 at 21:28
Can you make an imgur album with ~100 examples? Maybe this can be solved from a machine learning perspective even if classical computer vision doesn't help. — Nick ODell, Nov 09 '22 at 22:02
The ones with a black background are "before" thresholding. I run detection on the ones with a white background. 10% are detected correctly (so I have partly labelled data). There are many duplicates, because I save a few images at a time and use a voting method for better quality. https://imgur.com/a/U3ZR0Im — Patryk Organiściak, Nov 11 '22 at 15:55
Can you clarify what characters [this dash](https://i.imgur.com/DFNTDOA.png) and [this](https://i.imgur.com/ILuOTLZ.png) should be read as? They don't appear in the whitelist you're using for tesseract. — Nick ODell, Nov 11 '22 at 16:36
What version of tesseract are you using? You can check with `pytesseract.get_tesseract_version()`. Some stuff I'm reading online says that tesseract version 4.0.0 doesn't respect the tessedit_char_whitelist setting if LSTM is in use (the default.) https://stackoverflow.com/a/49030935/530160 — Nick ODell, Nov 11 '22 at 16:52
That's "noise". It happens sometimes, which is why I am using voting. Sorry, I forgot to mention it. Version: 5.2.0.20220712. Would standard ML give me better results? — Patryk Organiściak, Nov 11 '22 at 18:16
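
The voting mentioned in the comments could look roughly like this. It is a minimal sketch, not the asker's actual code: `vote` and `min_share` are hypothetical names, and noise reads are assumed to arrive as `None` or an empty string.

```python
from collections import Counter


def vote(readings, min_share=0.5):
    """Majority vote over several OCR readings of the same digit.

    readings: iterable of strings or None (None or "" means nothing usable).
    Returns the most common digit if it accounts for more than min_share
    of the usable readings, otherwise None.
    """
    usable = [r for r in readings if r]
    if not usable:
        return None
    digit, count = Counter(usable).most_common(1)[0]
    return digit if count / len(usable) > min_share else None
```

For example, `vote(["1", "1", None, "2", "1"])` returns `"1"`, while a tie such as `vote(["1", "2"])` returns `None` rather than picking a side.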

0 Answers