The image you posted is very challenging.
The solution I am posting is quite specific to that image.
I tried to keep it as general as I could, but I don't expect it to work well on other images.
You may use it as a source of ideas for other ways of removing noise.
The solution is mainly based on finding connected components and removing the smaller ones, which are considered to be noise.
I used pytesseract OCR to check whether the result is clean enough for OCR.
Here is the code (please read the comments):
import numpy as np
import scipy.signal
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe" # For Windows OS
# Read input image (named img to avoid shadowing the built-in input())
img = cv2.imread("n4.jpg")
# Convert to Grayscale.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Convert to binary and invert polarity
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Find connected components (clusters)
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=8)
# Remove small clusters: With both width<=10 and height<=10 (clean small size noise).
for i in range(nlabel):
    if (stats[i, cv2.CC_STAT_WIDTH] <= 10) and (stats[i, cv2.CC_STAT_HEIGHT] <= 10):
        thresh[labels == i] = 0
# Use closing with a very large horizontal kernel
mask = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((1, 150)))
# Find connected components (clusters) on mask
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
# Find label with maximum area
# https://stackoverflow.com/questions/47520487/how-to-use-python-opencv-to-find-largest-connected-component-in-a-single-channel
largest_label = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
# Set to zero all clusters that are not the largest cluster.
thresh[labels != largest_label] = 0
# Use closing with horizontal kernel of 15 (connecting components of digits)
mask = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((1, 15)))
# Find connected components (clusters) on mask again
nlabel, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
# Remove small clusters: With both width<=30 and height<=30
for i in range(nlabel):
    if (stats[i, cv2.CC_STAT_WIDTH] <= 30) and (stats[i, cv2.CC_STAT_HEIGHT] <= 30):
        thresh[labels == i] = 0
# Use closing with horizontal kernel of 15, this time on thresh
thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, np.ones((1, 15)))
# Use median filter with 3x5 mask (OpenCV medianBlur with k=5 removes important details).
thresh = scipy.signal.medfilt(thresh, (3,5))
# Invert polarity
thresh = 255 - thresh
# Apply OCR
data = pytesseract.image_to_string(
    thresh,
    config="-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890-/ --psm 6",
)
print(data)
# Show image for testing
cv2.imshow('thresh', thresh)
cv2.waitKey(0)
cv2.destroyAllWindows()
thresh (clean image):

OCR result: EXPO22016/01-2019