I am trying to OCR an ID image with Tesseract. I am new to this field and don't know much about image preprocessing. This is what I have done so far, but I am not getting good output.
This is the original image:
This is the code I have tried so far:
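import cv2 as cv
img = cv.imread('id/ID (6).jpg', 0)  # flag 0 loads the image as grayscale
# Estimate the background with a heavy blur; ksize (0, 0) lets OpenCV derive the kernel size from sigma
smooth = cv.GaussianBlur(img, (0, 0), sigmaX=30, sigmaY=30)
# Divide by the blurred background to even out the illumination
division = cv.divide(img, smooth, scale=255)
# Clip values above 220 down to 220 (THRESH_TRUNC ignores the maxval argument, 150)
ret, thresh3 = cv.threshold(division, 220, 150, cv.THRESH_TRUNC)
# Binarize with a local Gaussian-weighted threshold (block size 11, constant 2)
adaptive = cv.adaptiveThreshold(thresh3, 255, cv.ADAPTIVE_THRESH_GAUSSIAN_C, cv.THRESH_BINARY, 11, 2)
# Erode with a 3x3 kernel; on dark text over a white background this thickens the strokes
kernel = cv.getStructuringElement(cv.MORPH_RECT, (3, 3))
morpho_e = cv.morphologyEx(adaptive, cv.MORPH_ERODE, kernel, iterations=1)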
This is the output image I'm getting:
For Tesseract OCR:
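import pytesseract as py  # assumed alias, inferred from the calls below

# Path to the Tesseract executable (Windows install)
py.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
# --oem 1: LSTM engine; -l ell: Greek; --psm takes one mode number, so 6 is used here instead of '6+11'
config_param = r'--oem 1 -l ell --psm 6'
string = py.image_to_string(morpho_e, config=config_param)
print(string)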
OUTPUT (the text I'm getting):
ΣΤΟΙΧΕΙΑ ΤΑΥΤΟΤΗΤΑΣ Ξ
ΑΙΘΟΞΟΠΟΥΛΩΣ ο.
[θηδίθροικος αδΛ
ἀδίόοςν ο φδ
ΤΗΡΟΡΟΒΟΣ
ΩΟΙνΕΝ ΝΗΣ ]
ΙΟΑΝΗΜΙς
-ςτ-τ--.
ἈθδΟΗ Λο α΄
ὀπο ον
ΜΑΡΙΑ
ΜΑΤΘ ΜΗΤΕΗΝ γ
ΘΕΣΣΑΛΟΝΙΚΗ ΘΕ ΠΑΤΕ ΟΝϊΚΗΣ δἳ
τοΠοςΕΓΕΕΗΗΗΣΗς..ὸ.ΎΥνς-
τὸ ΝΡῑ
ΘΕΣΣΑΛΟΝΙΚΗΣ 177992/9 |
ΔΑ. ΤΟΥΜΠΑΣ -ΤΡΙΑΝΔΡΙΑΣ ''
- " ων Ξη.. , : ν. :
ΔΑΝΑΗ ή αυρούλα
ἠ Οτο ΥπΟΞΣ ΑΟἳ ϱ Δεῖγὲ ! )
δ” “'.... ν΄
ψἂ (ΥΠΟΓΡΑΦΗ - ΣΦΡΑΓΙΔΑ)
αν... ο, ) Ρς ο πς, εν
Could someone kindly help me, or give me some guidance on how to tackle this problem?