I want to automate entering faxed documents into our system using OCR. I tried Tesseract: the parts of the page without the watermark are recognized well, but the watermarked parts are almost impossible to recognize.

I would like to remove the watermark using image processing. Unusually, the watermark in this image consists of 1x1 black pixels.

Is there a way to get rid of this watermark?

Example document (masked sensitive personal info):

Certificate of Health Insurance Eligibility Acquisition and Loss (masked)

Watermark (composed of 1x1 pixels):

Watermark example


Edit: The question suggested as a duplicate deals with a greyed-out watermark, but the image I want to process is binary, so both the content and the watermark are black. The same approach cannot be applied here, so a different method seems to be needed.

youngminz
  • Possible duplicate of [Removing watermark out of an image using opencv](https://stackoverflow.com/questions/32125281/removing-watermark-out-of-an-image-using-opencv) – zindarod Jun 01 '19 at 06:51
  • You could try a dilation operation with a cross-shaped kernel. – ZdaR Jun 01 '19 at 07:29

2 Answers

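You can use the morphological closing operation (dilation followed by erosion). On a white background, closing removes dark specks smaller than the kernel, such as the 1x1 watermark pixels.

Apply the closing only to the ROI that contains the watermark.

Here is a MATLAB code sample: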

I = rgb2gray(imread('kmyxE.png')); %Read the image and convert it to Grayscale.
J = I;

%Morphological closing with kernel size 3x3 (applied only to the area with the watermark).
J(720:1450, 480:1260) = imclose(I(720:1450, 480:1260), ones(3));

I leave the OpenCV implementation to you.

Result:
enter image description here


The following solution may be better:

  1. Close in X direction using kernel 1x3
  2. Close in Y direction using kernel 3x1
  3. Take the pixel-wise minimum of the two images.

I = rgb2gray(imread('kmyxE.png'));
J1 = I;
J2 = I;

%Morphological closing with kernel size 1x3 (applied only to the area with the watermark).
J1(720:1450, 480:1260) = imclose(I(720:1450, 480:1260), ones(1, 3));

%Morphological closing with kernel size 3x1
J2(720:1450, 480:1260) = imclose(I(720:1450, 480:1260), ones(3, 1));

%Keep the minimum of J1 and J2
J = min(J1, J2);

Result:
enter image description here

Rotem

OCR did not work properly when parts of the text were erased along with the watermark, so I chose to keep the text intact at the cost of leaving a few watermark pixels behind. I ended up using two nested for loops that turn a black pixel white only when all eight of its neighbors are white.

result:

enter image description here

code:

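import cv2
import numpy as np

img = cv2.imread('masked.png')
# Binarize: pixels with gray value > 5 become white (255), the rest black (0).
img_bw = 255*(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) > 5).astype('uint8')
img_copy = np.copy(img_bw)

# Skip the one-pixel border so every neighbor index stays in bounds.
height, width = img_bw.shape
for x in range(1, height - 1):      # x is the row index
    for y in range(1, width - 1):   # y is the column index
        # Whiten a black pixel only if all eight neighbors are white.
        if img_bw[x][y] == 0 and \
                img_bw[x-1][y] == img_bw[x+1][y] == img_bw[x][y-1] == img_bw[x][y+1] == \
                img_bw[x-1][y-1] == img_bw[x-1][y+1] == img_bw[x+1][y-1] == img_bw[x+1][y+1] == 255:
            img_copy[x][y] = 255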
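
The same rule (whiten a black pixel only when all eight neighbors are white) can probably be written without per-pixel Python loops, which tend to be slow on full-page scans. The following is a minimal NumPy sketch, assuming a 2-D binary image whose values are exactly 0 and 255; `remove_isolated_black_pixels` is a hypothetical name, not part of OpenCV or NumPy. Border pixels are left unchanged, matching the loops that skip the image edge.

```python
import numpy as np


def remove_isolated_black_pixels(img_bw):
    """Whiten black pixels whose eight neighbors are all white.

    Hypothetical helper; expects a 2-D uint8 array with values 0 and 255.
    """
    h, w = img_bw.shape
    white = img_bw == 255

    # Pad with False so pixels on the image border never count as
    # having eight white neighbors and are therefore left unchanged.
    padded = np.pad(white, 1, constant_values=False)

    all_neighbors_white = np.ones((h, w), dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            # Shifted view: the neighbor at offset (dy, dx) for every pixel.
            all_neighbors_white &= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    out = img_bw.copy()
    out[(img_bw == 0) & all_neighbors_white] = 255
    return out
```

Only eight shifted array comparisons are needed regardless of image size, and the input array is not modified.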
youngminz