Digital Numbers on Tesseract OCR


SOLUTION:

I had to train my own data to use with the OCR. It seems to work well, although I don't know why the trained data from arturaugusto doesn't work for me =(

https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital.git

With my trained data, I applied these phases (using OpenCV) to get good OCR results:

First, convert the image to black and white (grayscale).
Second, apply a Gaussian blur to the image.
Third, apply a threshold filter to the image.

With this, the seven-segment digits are recognized.

QUESTION:

I'm trying to do OCR with Tesseract on Android, and I'm testing the app with this image (taken from the question "Text detection on Seven Segment Display via Tesseract OCR"):

[Image: OCR test image]

I'm using the data trained by arturaugusto (https://github.com/arturaugusto/display_ocr), but the OCR returns this wrong result:

884288

The zero is recognized as an eight, and I don't know why.

I'm applying a Gaussian blur and a threshold filter to the image with OpenCV, and this is the processed image:

[Image: OCR image processed]

Is there any other trained data available, or do you know another way to solve the problem?

Hi Felipe! I've trained my own data... Try it https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital and let me know if it works for you. Remember to apply all the phases I describe in the "solution" section of the post — adlagar, Nov 25 '15 at 09:19
I managed to process your test image using Python Pillow and got a black-and-white image similar to yours, but when I run Tesseract with your trained data it returns an empty page (!). I'm not sure I installed the trained data correctly... I copied everything to the folder /opt/local/share/tessdata (I'm on Mac OS X). When I run tesseract --list-langs, the "lets" language is shown. Do you have any tips? By the way, did your trained data stop mistaking "0" for "8" (as you described in your question)? — Felipe Ferri, Nov 25 '15 at 19:14
Hello @adri1992, were you able to do it? I have been stuck at the final stage for the last 2 days — Zeeshan Shabbir, Feb 11 '19 at 17:52
Hi Zeeshan! I trained my own data. It should work with that specific font https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital — adlagar, Feb 11 '19 at 19:50
@adri1992 Using your trained data and your blurred image in the Tesseract API, the result is perfect. When I blurred the image myself following your steps, the result was not perfect; the parameters might be different. Could you share the code for the three steps? It would be very helpful. Thank you. — Manikandan, Aug 20 '19 at 14:52
@Felipe Ferri: I have the same issue, but I am on Windows. Were you able to make it work? — LCMa, Apr 27 '21 at 12:42
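One way to rule out an installation problem is to point Tesseract at the trained data explicitly when running it. The sketch below builds and runs such a command from Python with `subprocess`; it is a suggestion to try, not a fix confirmed in this thread. It assumes a Tesseract CLI recent enough to accept `--psm` and `--tessdata-dir` (4.x or later is assumed here), that the data is installed under the language name `lets` (as `tesseract --list-langs` reports above), and that page segmentation mode 7 ("treat the image as a single text line") suits a one-line display. `build_tesseract_cmd` and `run_ocr` are hypothetical helper names.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="lets", tessdata_dir=None, psm=7):
    """Build a Tesseract CLI call that prints the recognized text to stdout.

    psm=7 and the optional tessdata_dir are assumptions to experiment with.
    """
    # "stdout" as the output base makes Tesseract print instead of writing a file.
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if tessdata_dir is not None:
        # Tell Tesseract explicitly where to look for <lang>.traineddata.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path, **kwargs):
    """Run Tesseract on an already preprocessed image and return its text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(build_tesseract_cmd(image_path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `run_ocr("processed.png", tessdata_dir="/opt/local/share/tessdata")`, where `processed.png` is a placeholder. Depending on the Tesseract version, `--tessdata-dir` may need to point at the `tessdata` folder itself or at its parent, so trying both is worthwhile if the language is not found.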

score 0 · Answer 1 · answered Jun 02 '15 at 19:29

Try using erosion to fill the gaps between the segments. I think the problem is that Tesseract doesn't handle segmented fonts well.

With OpenCV-Python, I use `cv2.erode(display, kernel, iterations=erosion_iters)` to solve this problem.

answered by art

Yes, I've tried to fill the gaps between the segments, but it doesn't work for me either :( I have trained my own data with the same font, and now, for reasons I don't fully understand, the OCR works well with the new trained data. In a few minutes I'll update the question with the solution and the repository URL. Thanks so much! – adlagar Jun 05 '15 at 08:20
Can you clarify what you are declaring `display` and `kernel` as earlier in the code? (e.g. is it an import of some kind?) – takanuva15 Jul 11 '21 at 15:59

@takanuva15, take this example: `import cv2; import numpy as np; display = cv2.imread('display.png', 0); kernel = np.ones((6, 6), np.uint8); eroded_img = cv2.erode(display, kernel, iterations=1)` – art Jul 15 '21 at 13:20
