I'm trying to detect text in an image using this code:

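// Encode the OpenCV Mat (src) as PNG bytes
MatOfByte mob = new MatOfByte();
Imgcodecs.imencode(".png", src, mob);
byte[] bb = mob.toArray();

// Decode the bytes into a BufferedImage and run OCR on it
BufferedImage bi = ImageIO.read(new ByteArrayInputStream(bb));
String text = tesseract.doOCR(bi);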

But Tesseract finds 6,52. The image looks clean, so I don't understand why the OCR fails.

I'm using eng language: tesseract.setLanguage("eng");

OpenCV version 4.5.1

tess4j-3.4.8

What's wrong with the image?

Bilal

1 Answer

I have a two-step solution: first apply an adaptive threshold to the image, then run Tesseract on the thresholded result.

With the thresholded image as input, Tesseract reads:

€1,52

Code:


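import cv2
from pytesseract import image_to_string

# Load the image and convert it to grayscale
img = cv2.imread("s6lVY.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binarize with a mean adaptive threshold (block size 21, constant C = 21)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 21, 21)

# Run OCR, treating the image as a single uniform block of text (--psm 6)
txt = image_to_string(thr, config="--psm 6")
print(txt)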

If Tesseract doesn't give the desired result on the raw image, you need to preprocess the image before running OCR.
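To make the thresholding step less of a black box, here is a rough, dependency-free sketch of what a mean adaptive threshold computes: each pixel is compared against the mean of its surrounding block minus a constant. The function name adaptive_mean_threshold, the replicated-border handling, and the tiny sample image are assumptions made for illustration; this is not OpenCV's actual implementation, and it is far too slow for real images.

```python
def adaptive_mean_threshold(gray, block_size=21, c=21, max_value=255):
    """Binarize a 2D list of grayscale values using a local mean threshold.

    A pixel becomes max_value if it is brighter than
    (mean of its block_size x block_size neighborhood) - c, else 0.
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be odd and at least 3")

    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for dy in range(-half, half + 1):
                # Clamp coordinates so edge pixels are repeated (assumed border mode)
                yy = min(max(y + dy, 0), height - 1)
                for dx in range(-half, half + 1):
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_mean = total / (block_size * block_size)
            row.append(max_value if gray[y][x] > local_mean - c else 0)
        out.append(row)
    return out


# Hypothetical 4x5 image: a dark two-pixel stroke on a bright background
sample = [
    [200, 200, 200, 200, 200],
    [200, 200,  50, 200, 200],
    [200, 200,  50, 200, 200],
    [200, 200, 200, 200, 200],
]

binary = adaptive_mean_threshold(sample, block_size=3, c=10)
for line in binary:
    print(line)
```

Because the threshold is computed per neighborhood rather than once for the whole image, dark strokes stay black even when the background brightness varies, which is often what helps Tesseract on images that look clean to the eye.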

Ahmet