I am using a console app and a very basic Tesseract setup to perform digit recognition. I copied an image from Google and tried to recognize only the digits in it.

using System;
using System.Drawing;
using Tesseract;

// Restrict recognition to digits and treat the image as a single block of text.
using (var image = new Bitmap("1.png"))
using (var t = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
{
    t.SetVariable("tessedit_char_whitelist", "0123456789");
    using (var r = t.Process(image, PageSegMode.SingleBlock))
        Console.WriteLine("Result: " + r.GetText());
}
Console.ReadLine();

The image is: [Image from google]

The results differ depending on the PageSegMode, but none of them come close to the digits in the image. What is the best way to use Tesseract to identify digits in pictures like this?

ARH

1 Answer

Tesseract won't work well with an image like that unless you train it specifically for that case, but I don't think you need to if you transform the image appropriately first.

Your goal should be to feed it a black-and-white picture with black digits on a white background. Do this processing before OCRing the image. There are many libraries for this; most people use OpenCV.
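As a rough illustration of that preprocessing step, here is a minimal sketch that binarizes a bitmap using only System.Drawing instead of OpenCV. The helper name Binarize, the default threshold of 128, and the invert flag are all assumptions made for this example; the right values depend on the image, and a fixed global threshold is the simplest possible approach, not necessarily the best one.

```csharp
using System;
using System.Drawing;

static class Preprocess
{
    // Hypothetical helper: converts an image to pure black and white.
    // threshold (0-255) and invert are assumptions to tune per image.
    public static Bitmap Binarize(Bitmap source, int threshold = 128, bool invert = false)
    {
        var result = new Bitmap(source.Width, source.Height);
        for (int y = 0; y < source.Height; y++)
        {
            for (int x = 0; x < source.Width; x++)
            {
                Color c = source.GetPixel(x, y);
                // Weighted luminance (ITU-R BT.601 coefficients).
                double luma = 0.299 * c.R + 0.587 * c.G + 0.114 * c.B;
                bool dark = luma < threshold;
                // Use invert when the digits are lighter than the background.
                if (invert) dark = !dark;
                result.SetPixel(x, y, dark ? Color.Black : Color.White);
            }
        }
        return result;
    }
}
```

The returned bitmap could then be passed to t.Process in place of the original image. GetPixel and SetPixel are slow on large images, so this is only meant to show the idea; OpenCV or LockBits would be the usual choice for real workloads.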

Tesseract already does some image processing of its own, but it's not great and probably doesn't help much with an image like that. You can set the tessedit_write_images variable to view the automatically processed result and see what's actually being OCRed.
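A sketch of how that variable might be set from the same .NET wrapper used in the question. It assumes the SetVariable(string, string) and Process(Bitmap, PageSegMode) calls shown there; the class name DebugInput is made up for this example. Where the debug image ends up depends on the Tesseract version (it is often a file named tessinput.tif in the working directory), so treat that as something to verify locally.

```csharp
using System;
using System.Drawing;
using Tesseract;

class DebugInput
{
    static void Main()
    {
        using (var image = new Bitmap("1.png"))
        using (var t = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
        {
            // Ask Tesseract to write out the image it actually recognizes.
            t.SetVariable("tessedit_write_images", "1");
            using (var r = t.Process(image, PageSegMode.SingleBlock))
            {
                Console.WriteLine("Result: " + r.GetText());
            }
        }
    }
}
```

Comparing the written image with the original shows whether the digits survived Tesseract's internal thresholding.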

Here are some useful links:

https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#image-processing

Using tesseract to recognize license plates

victormeriqui