
SOLUTION:

I had to train my own data to use with the OCR. It seems to work well, but I don't know why the trained data from arturaugusto doesn't work for me =(

https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital.git

With my trained data, to get good OCR results, I apply these phases (all done with OpenCV):

  • First, convert the image to black and white (grayscale)
  • Second, apply a Gaussian blur to the image
  • Third, apply a threshold filter to the image

With this, the seven-segment digits are recognized.

QUESTION:

I'm trying to run OCR with Tesseract on Android, and I'm testing the app with this image (taken from the question "Text detection on Seven Segment Display via Tesseract OCR"):

[Image: OCR test image]

I'm using the trained data from arturaugusto (https://github.com/arturaugusto/display_ocr), but the OCR returns the wrong result:

884288

The zero is recognized as an eight, and I don't know why.

I'm applying a Gaussian blur and a threshold filter to the image with OpenCV, and this is the processed image:

[Image: processed OCR image]

Is there any other trained data available, or do you know any way to solve this problem?

adlagar
  • Hey adri, any updates in your solution? :-) – Felipe Ferri Nov 25 '15 at 02:10
  • Hi Felipe! I've trained my own data... Try it https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital and let me know if it works for you. Remember to follow all the phases I describe in the "solution" section of the post – adlagar Nov 25 '15 at 09:19
  • I managed to process your test image using Python Pillow and got a B&W image similar to yours, but when I run tesseract with your trained data it returns an empty page (!). I'm not sure if I installed the trained data correctly... I copied everything to the folder /opt/local/share/tessdata (I'm on Mac OS X). When I run tesseract --list-langs, the "lets" language is shown. Do you have any tips? By the way, did your training data stop mistaking "0" for "8" (as you stated in your question)? – Felipe Ferri Nov 25 '15 at 19:14
  • Thanks adri1992 for your trained data. – Aung Myat Hein May 19 '16 at 06:29
  • Hello @adri1992, were you able to do it? I have been stuck at the final stage for the last 2 days – Zeeshan Shabbir Feb 11 '19 at 17:52
  • Hi Zeeshan! I trained my own data. It should work with that specific font https://github.com/adri1992/Tesseract_sevenSegmentsLetsGoDigital – adlagar Feb 11 '19 at 19:50
  • @adri1992 Using your trained data and your blurred image with the Tesseract API, the result is perfect. When I tried to blur the image myself following your steps, the result was not perfect; the parameters might be different. Could you share the code for the three steps? That would be very helpful. Thank you. – Manikandan Aug 20 '19 at 14:52
  • @Felipe Ferri: I have the same issue, but I'm on Windows. Were you able to make it work? – LCMa Apr 27 '21 at 12:42

1 Answer

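Try using erosion to fill the gaps between the segments. I think the problem is that Tesseract doesn't handle segmented fonts well. With dark digits on a light background, eroding the image thickens the dark strokes until neighbouring segments touch.

With OpenCV-Python, I use `cv2.erode(display, kernel, iterations=erosion_iters)` to solve this problem.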

art
  • Yes, I've tried to fill the gaps between the segments, but it doesn't work for me either :( I have trained my own data with the same font, and now, though I don't know exactly why, the OCR works well with this new trained data. In a few minutes I'll update the question with the solution and the repository link. Thanks so much! – adlagar Jun 05 '15 at 08:20
  • Can you clarify what you are declaring `display` and `kernel` as earlier in the code? (e.g., is it an import of some kind?) – takanuva15 Jul 11 '21 at 15:59
  • @takanuva15, take this example: `import cv2; import numpy as np; display = cv2.imread('display.png', 0); kernel = np.ones((6, 6), np.uint8); eroded_img = cv2.erode(display, kernel, iterations=1)` – art Jul 15 '21 at 13:20