Pytesseract does not detect numbers

Question

I am writing a simple program to detect numbers in an image with Python and pytesseract, but it always returns ♀. I am analyzing an image like this:

my image

and my code to read the numbers is the following:

import cv2
import pytesseract
from pytesseract import (
    Output,
    TesseractError,
    TesseractNotFoundError,
    TSVNotSupported,
    get_tesseract_version,
    image_to_boxes,
    image_to_data,
    image_to_osd,
    image_to_pdf_or_hocr,
    image_to_string,
    run_and_get_output
)

def analizar_resultado(path): 
    image = cv2.imread(path, 1)
    
    text = pytesseract.image_to_string(image, config = 'digits')
    print('texto detectado:', text)

However, I can't get it to work. I have tried more images of this type, including better-quality ones, but I never get any number back. How can I solve this? Thanks a lot.

Want to improve Tesseract text recognition? Google for _tesseract improve recognition_ — DisappointedByUnaccountableMod, Jan 24 '21 at 23:51
I want to detect only digits, but what do you mean by "Google for tesseract"? Thanks — neural_krobus, Jan 24 '21 at 23:57
Do you know anything else I can try? Another OCR engine or something like that? Thanks — neural_krobus, Jan 25 '21 at 07:26

score 1 · Accepted Answer · answered Jan 27 '21 at 03:50

I have a three-step solution

1. Get each digit separately
2. Apply threshold
3. Read the output

Part-1: Get each digit separately

You can get each digit by using index variables. For instance:

```
s_idx = 0  # start index
e_idx = int(w/5) - 10  # end index
```

First get the height and width of the image; then, for each digit, advance the indexes:

for _ in range(0, 6):
    gry_crp = gry[0:h, s_idx:e_idx]
    s_idx = e_idx
    e_idx = s_idx + int(w/5) - 20
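
The offsets above (`- 10`, `- 20`) are tuned to this particular image. As a more general sketch, assuming the digits are evenly spaced across the width, the column ranges could be computed up front. `digit_bounds` and its `margin` parameter are hypothetical names for illustration, not part of OpenCV or pytesseract:

```python
def digit_bounds(width, n_digits, margin=0):
    """Split `width` pixel columns into `n_digits` equal slots.

    Returns a list of (start, end) column indexes. `margin` pixels are
    trimmed from both sides of every slot (a hypothetical tuning knob).
    """
    if n_digits <= 0:
        raise ValueError("n_digits must be positive")
    slot = width / n_digits
    bounds = []
    for i in range(n_digits):
        start = int(round(i * slot)) + margin
        end = int(round((i + 1) * slot)) - margin
        if end <= start:
            raise ValueError("margin too large for slot width")
        bounds.append((start, end))
    return bounds


# Usage with the grayscale image from this answer (assumed name `gry`):
# for s_idx, e_idx in digit_bounds(gry.shape[1], 6):
#     gry_crp = gry[:, s_idx:e_idx]
```

Whether equal slots work depends on the image; if the digits are not evenly spaced, hand-tuned offsets like the ones above may still be needed.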

Result

- 0 0 9 9 7 6

Part-2: Apply threshold

- 0 0 9 9 7 6

Part-3: Read

```
0.9976
```

Unfortunately, the second zero can't be recognized as a digit due to artifacts.

If you can't read the image, try different page segmentation mode (`psm`) configurations.
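
One way to compare page segmentation modes is to run the same crop through several of them and look at the outputs side by side. This is only a sketch: `try_psm_modes` is a hypothetical helper, and the default modes (6 = uniform block, 7 = single line, 8 = single word, 10 = single character, 13 = raw line) are just a starting selection. The OCR function is passed in as a parameter, so the helper itself does not depend on pytesseract:

```python
def try_psm_modes(image, ocr, modes=(6, 7, 8, 10, 13)):
    """Run `ocr` once per page segmentation mode and collect the text.

    `ocr` is any callable with the signature ocr(image, config=...),
    for example pytesseract.image_to_string.
    """
    results = {}
    for psm in modes:
        config = f"--psm {psm} digits"
        # strip() drops the trailing newline / form feed Tesseract appends
        results[psm] = ocr(image, config=config).strip()
    return results


# Usage (assumes pytesseract is installed and `thr` is a thresholded crop):
# from pytesseract import image_to_string
# for psm, text in try_psm_modes(thr, image_to_string).items():
#     print(psm, repr(text))
```

For crops that contain a single digit, mode 10 is worth checking first, but which mode wins depends on the image.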

Code:

import cv2
from pytesseract import image_to_string

img = cv2.imread("A3QRw.png")
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
(h, w) = gry.shape[:2]
s_idx = 0  # start index
e_idx = int(w/5) - 10  # end index

result = []

for _ in range(6):
    # Crop the current digit and upscale it 3x to help recognition
    gry_crp = gry[0:h, s_idx:e_idx]
    (h_crp, w_crp) = gry_crp.shape[:2]
    gry_crp = cv2.resize(gry_crp, (w_crp*3, h_crp*3))
    # Binarize with Otsu's automatically chosen threshold
    thr = cv2.threshold(gry_crp, 0, 255,
                        cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    txt = image_to_string(thr, config="--psm 6 digits")
    result.append(txt[0])
    # Move the window to the next digit
    s_idx = e_idx
    e_idx = s_idx + int(w/5) - 20
    cv2.imshow("thr", thr)
    cv2.waitKey(0)

print("".join(result))
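
One fragile spot in the code above is `txt[0]`. When Tesseract recognizes nothing, the returned string typically contains only whitespace such as a newline or a form feed (`\x0c`). That is probably the `♀` from the question, since some Windows consoles render a form feed that way. A minimal sketch of a guard, where `first_digit` and the `?` placeholder are hypothetical choices:

```python
def first_digit(txt, placeholder="?"):
    """Return the first ASCII digit in `txt`, or `placeholder` if none."""
    for ch in txt:
        if ch in "0123456789":
            return ch
    return placeholder
```

In the loop, `result.append(first_digit(txt))` would then take the place of `result.append(txt[0])`, so an unreadable crop shows up as `?` instead of a stray character.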

There are 6 images, so I thought dividing the `width` into 5 separate labels would give me 6 images. — Ahmet, Jan 27 '21 at 09:31
Resizing the image is beneficial for getting an accurate result. `3` is just a number for getting the desired result. You can use any other positive number. — Ahmet, Jan 27 '21 at 09:49
And if I have a low-resolution image, how can I use the threshold to see the image better? — Héctor, Jan 30 '21 at 22:05
