Recognize characters with OpenCV and Tesseract (Java)

Question

I'm trying to detect the text in the picture using this code:

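import org.opencv.core.MatOfByte;
import org.opencv.imgcodecs.Imgcodecs;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import javax.imageio.ImageIO;

MatOfByte mob = new MatOfByte();
Imgcodecs.imencode(".png", src, mob);   // encode the OpenCV Mat as PNG bytes
byte[] bb = mob.toArray();
BufferedImage bi = ImageIO.read(new ByteArrayInputStream(bb));
String text = tesseract.doOCR(bi);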

But Tesseract returns 6,52. The image looks clean, so I don't understand how the OCR can fail.

I'm using the English language data: tesseract.setLanguage("eng");

OpenCV version: 4.5.1

Tess4J version: 3.4.8

What's wrong with the image?

Answer · answered Jan 24 '21 at 22:26

I have a two-step solution:

1. Apply adaptive thresholding
2. Set the page segmentation mode (psm) to 6
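Step 1 can be illustrated without OpenCV. Per OpenCV's documentation, ADAPTIVE_THRESH_MEAN_C with THRESH_BINARY sets a pixel to maxValue when it is brighter than T(x, y), the mean of its blockSize × blockSize neighbourhood minus C, and to 0 otherwise. Below is a minimal plain-Java sketch of that rule, assuming an 8-bit grayscale image stored as an int[][]. The class name AdaptiveMeanThreshold is hypothetical, borders are handled by replicating edge pixels, and rounding details may differ from OpenCV's own implementation.

```java
import java.util.Arrays;

public class AdaptiveMeanThreshold {

    /**
     * Sketch of adaptive mean thresholding (the rule behind OpenCV's
     * ADAPTIVE_THRESH_MEAN_C + THRESH_BINARY). Pixels outside the image
     * are replaced by the nearest edge pixel (replicate border).
     */
    public static int[][] threshold(int[][] gray, int maxValue, int blockSize, double c) {
        if (blockSize < 3 || blockSize % 2 == 0) {
            throw new IllegalArgumentException("blockSize must be odd and >= 3");
        }
        int h = gray.length, w = gray[0].length, r = blockSize / 2;
        int[][] out = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                for (int dy = -r; dy <= r; dy++) {
                    int yy = Math.min(Math.max(y + dy, 0), h - 1); // clamp row = replicate border
                    for (int dx = -r; dx <= r; dx++) {
                        int xx = Math.min(Math.max(x + dx, 0), w - 1); // clamp column
                        sum += gray[yy][xx];
                    }
                }
                double mean = (double) sum / (blockSize * blockSize);
                // Local threshold T(x, y) = mean - C; brighter pixels become maxValue
                out[y][x] = gray[y][x] > mean - c ? maxValue : 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 5x5 image: light background (200) with a dark vertical stroke (40)
        int[][] img = new int[5][5];
        for (int[] row : img) Arrays.fill(row, 200);
        for (int y = 0; y < 5; y++) img[y][2] = 40;
        int[][] bin = threshold(img, 255, 3, 21);
        // each row prints [255, 255, 0, 255, 255]
        for (int[] row : bin) System.out.println(Arrays.toString(row));
    }
}
```

Because C is positive, a pixel in a flat region (equal to its local mean) always comes out white, so only pixels noticeably darker than their surroundings, such as character strokes, turn black. A larger blockSize averages over a wider area, which helps when the background brightness varies across the picture.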

Applying adaptive thresholding turns the picture into a binary (black-and-white) image. Reading that image with psm 6 gives:

€1,52

Code:

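import cv2
from pytesseract import image_to_string

img = cv2.imread("s6lVY.png")                  # the image from the question
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # adaptiveThreshold needs a single-channel 8-bit image
# Adaptive mean threshold: blockSize=21 (neighbourhood size), C=21 (subtracted from the local mean)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 21)
txt = image_to_string(thr, config="--psm 6")   # psm 6: assume a single uniform block of text
print(txt)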
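For the Java code in the question, the same steps map onto OpenCV's Java bindings as Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY) and Imgproc.adaptiveThreshold(gray, thr, 255, Imgproc.ADAPTIVE_THRESH_MEAN_C, Imgproc.THRESH_BINARY, 21, 21), where gray and thr are new Mat objects, followed by tesseract.setPageSegMode(6) before doOCR. If you'd rather work on the BufferedImage directly, here is a plain-JDK sketch (class and method names are hypothetical). It converts the image to gray values using the luma weights OpenCV documents for COLOR_BGR2GRAY (0.299 R + 0.587 G + 0.114 B) and packs a gray array back into a TYPE_BYTE_GRAY image that could be handed to doOCR. It is an assumption that the values match OpenCV's exactly, since OpenCV uses its own fixed-point rounding.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

public class GrayConversion {

    // Convert a BufferedImage to 8-bit gray values with the weights
    // 0.299 R + 0.587 G + 0.114 B (the COLOR_BGR2GRAY formula).
    public static int[][] toGray(BufferedImage img) {
        int h = img.getHeight(), w = img.getWidth();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y); // packed sRGB: 0xAARRGGBB
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y][x] = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
        return gray;
    }

    // Pack gray values (0..255) into a TYPE_BYTE_GRAY image by writing raster samples directly.
    public static BufferedImage toImage(int[][] gray) {
        int h = gray.length, w = gray[0].length;
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = out.getRaster();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                raster.setSample(x, y, 0, gray[y][x]);
            }
        }
        return out;
    }
}
```

Note that getRGB on a TYPE_BYTE_GRAY image goes through Java's color-space conversion and may not return the stored sample, so read a gray image back with getRaster().getSample(...) rather than with toGray.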

If Tesseract alone can't read the text correctly, you need to preprocess the image (here, with adaptive thresholding) before running OCR.
