Since your image is only black and white, simple thresholding and morphological transformations are enough to filter it. If your input image were not black and white, you could apply a blur such as cv2.medianBlur() or cv2.GaussianBlur() to smooth it as a preprocessing step. You could then perform morphological operations with various kernel sizes, or construct custom kernels with cv2.getStructuringElement(). Generally, a larger kernel (7x7 or 9x9) removes more noise but also removes more of the desired detail than a smaller kernel (3x3 or 5x5). There is a trade-off between how much noise you remove and how much detail you preserve. Take a look at this answer for colored captchas.
Threshold

Morph open

Invert image for Tesseract

Result
-63 164
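import cv2
import pytesseract

# Path to the Tesseract executable (Windows install location)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale, then apply an inverted binary threshold (text becomes white)
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(image, 150, 255, cv2.THRESH_BINARY_INV)[1]

# Morphological opening with a 5x5 kernel removes small white noise
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

# Invert back to dark text on a white background for Tesseract
result = 255 - opening

cv2.imshow('thresh', thresh)
cv2.imshow('opening', opening)
cv2.imshow('result', result)
print(pytesseract.image_to_string(result))
cv2.waitKey()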