
I have an image that is blurred:

(blurred image) This is a part of the business card; it is one of the frames taken by the camera, without proper focus.

The clear image looks like this:

(clear image)

I'm looking for a method that could give me an image of better quality, so that it could be recognized by OCR, but it should also be quite fast. The image is not blurred too much (I think), but it isn't good enough for OCR. I tried:

  • different kinds of HPF,
  • Laplacian,
  • Canny detector,
  • combinations of morphological operations (opening, closing).

I also tried:

  • deconvolution with Wiener filter,
  • deconvolution and the Lucy-Richardson method.

But it was not easy to find the right PSF (point spread function). These methods are considered effective, but not fast enough. I also tried an FFT followed by an IFFT with a Gaussian mask, but the results were not satisfactory. I'm looking for a general method of deblurring images with text, not only this image. Could someone help me with this problem? I'll be grateful for any advice. I'm working with OpenCV 3 (C++ and sometimes Python).
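For reference, here is roughly what the Wiener attempt looked like: a minimal frequency-domain sketch, where the Gaussian PSF is only a stand-in (the true camera kernel is unknown, which is exactly the difficulty):

import numpy as np
import cv2

def wiener_deconvolve(img, psf, k=0.01):
    # classic Wiener filter: F = G * conj(H) / (|H|^2 + k)
    img = img.astype(np.float32) / 255.0
    psf_full = np.zeros_like(img)
    ph, pw = psf.shape
    psf_full[:ph, :pw] = psf / psf.sum()
    # shift the kernel so its center sits at the origin
    psf_full = np.roll(psf_full, (-(ph // 2), -(pw // 2)), axis=(0, 1))
    H = np.fft.fft2(psf_full)
    G = np.fft.fft2(img)
    restored = np.real(np.fft.ifft2(G * np.conj(H) / (np.abs(H) ** 2 + k)))
    return np.clip(restored * 255, 0, 255).astype(np.uint8)

# a guessed Gaussian PSF; 'card.png' stands in for the blurred frame
g = cv2.getGaussianKernel(15, 3.0)
blurred = cv2.imread('card.png', cv2.IMREAD_GRAYSCALE)
cv2.imwrite('card_wiener.png', wiener_deconvolve(blurred, g @ g.T))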

Artur
  • http://www.rroij.com/open-access/fast-moving-vehicle-number-plate-detection.php?aid=43702 also seminal paper by Gull and Skilling (MaxEnt method) – jtlz2 May 02 '18 at 23:14
  • I have some ideas but nothing concrete / yet – jtlz2 May 02 '18 at 23:15
  • Also http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.2634&rep=rep1&type=pdf – jtlz2 May 02 '18 at 23:18
  • https://dsp.stackexchange.com/questions/tagged/blind-deconvolution – jtlz2 May 02 '18 at 23:26
  • @LuisFelipe - no. We were looking for something quite fast, but none of the aforementioned methods matched our expectations. We decided to check the sharpness of the image: if it is not good enough, the user is informed and should take another picture. – Artur Aug 29 '19 at 10:17
  • You can also check [here](https://docs.opencv.org/master/de/d3c/tutorial_out_of_focus_deblur_filter.html) – Yunus Temurlenk Nov 17 '20 at 12:08
  • I think he has already tried deconvolution with a Wiener filter. – Innat Nov 17 '20 at 13:13

2 Answers


Are you aware of blind deconvolution?

Blind deconvolution is a well-known technique for restoring astronomical images. It is especially useful for your application, where finding a PSF is difficult.
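For intuition, here is a toy Python sketch of the alternating (blind) Richardson–Lucy scheme, assuming SciPy and a grayscale float image in [0, 1]; real implementations add priors, regularization, and multiscale kernel estimation:

import numpy as np
from scipy.signal import fftconvolve

def blind_richardson_lucy(blurred, psf_size=15, outer=10, inner=5):
    # alternate multiplicative updates of the image and the PSF
    # (psf_size should be odd so the center crop matches the PSF support)
    eps = 1e-7
    img = np.full_like(blurred, blurred.mean())
    psf = np.full((psf_size, psf_size), 1.0 / psf_size ** 2)
    h = psf_size // 2
    for _ in range(outer):
        for _ in range(inner):  # PSF step, image held fixed
            ratio = blurred / (fftconvolve(img, psf, mode='same') + eps)
            corr = fftconvolve(ratio, img[::-1, ::-1], mode='same')
            cy, cx = corr.shape[0] // 2, corr.shape[1] // 2
            psf *= corr[cy - h:cy + h + 1, cx - h:cx + h + 1]
            psf /= psf.sum() + eps
        for _ in range(inner):  # image step, PSF held fixed
            ratio = blurred / (fftconvolve(img, psf, mode='same') + eps)
            img *= fftconvolve(ratio, psf[::-1, ::-1], mode='same')
    return img, psf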

Here is one C++ implementation of this technique. This paper is also very related to what you are looking for. Here is a sample output of their algorithm:

(image: sample output of the algorithm)

Ali
  • That's some pretty insane reconstruction. Powered by CNN, huh? That's super cool, but I wonder about the resources needed (both computational power and complexity) to carry this processing out as close as real-time as possible. – stateMachine Mar 07 '20 at 02:24
  • @eldesgraciado Yes, I agree about the reconstruction. Blind deconvolution is an expensive algorithm and I doubt it is possible to run it in real time. However, it is possible to reduce the computational cost by carrying out the task in the frequency domain. – Ali Mar 09 '20 at 13:43
  • Does anyone have a tutorial on how I can implement this in Python? – Jim O. Feb 24 '21 at 06:55
  • @JimO. - Did you ever get an answer? I'd be interested in that too. – Robert Oschler Dec 02 '21 at 04:55
  • @Robert Oschler I'm still praying – Jim O. Feb 02 '22 at 03:47

I've also encountered this issue recently and raised a similar question with more details and a recent approach. It seems to be an unsolved problem so far. There are some recent research works that try to address such problems with deep learning. Unfortunately, none of them met our expectations. However, I'm sharing the info in case it is helpful to anyone.

1. Scene Text Image Super-Resolution in the Wild

In our case, it may be our final choice; comparatively, it performs well enough. It's a recent research work (TSRN) that mainly focuses on such cases. Its main idea is to introduce super-resolution (SR) techniques as a pre-processing step. This implementation looks by far the most promising. Their published examples show blurred text improved to clean, readable text; a quick stand-in sketch of the idea follows below.
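As a quick way to try the SR-as-pre-processing idea without TSRN's training pipeline, OpenCV's dnn_superres module (in opencv-contrib) can upscale a text crop before OCR. A sketch, assuming a separately downloaded ESPCN model file:

import cv2

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel('ESPCN_x4.pb')  # pretrained weights, downloaded separately
sr.setModel('espcn', 4)      # algorithm name and scale must match the file
img = cv2.imread('blurred_text.png')
cv2.imwrite('text_x4.png', sr.upsample(img))  # then feed the result to OCR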

2. Neural Enhance

From their repo demonstration, it appears that it may have some potential to improve blurred text as well. However, the author apparently hasn't maintained the repo for about 4 years.

3. Blind Motion Deblurring with GAN

The attractive part of this work is its Blind Motion Deblurring mechanism, named DeblurGAN. It looks very promising.

(image: DeblurGAN sample results)

4. Real-World Super-Resolution via Kernel Estimation and Noise Injection

An interesting aspect of their work is that, unlike other works in the literature, they first design a novel degradation framework for real-world images by estimating various blur kernels as well as real noise distributions. Based on that, they acquire LR images sharing a common domain with real-world images. Then they propose a real-world super-resolution model aiming at better perception. From their article:

(images: sample results from the paper)

However, in my observation, I couldn't get the expected results. I've raised an issue on GitHub and haven't received any response so far. A toy sketch of their degradation idea is below.
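For what it's worth, the degradation idea itself is simple to sketch. A toy version, where kernels and noise_patches are stand-ins for the kernels and noise the paper estimates from real photos:

import numpy as np
import cv2

def degrade(hr, kernels, noise_patches, scale=4):
    # blur with an estimated kernel, downsample, then inject real noise
    k = kernels[np.random.randint(len(kernels))]
    lr = cv2.filter2D(hr.astype(np.float32), -1, k)
    lr = lr[::scale, ::scale]  # direct downsampling keeps the blur domain
    n = noise_patches[np.random.randint(len(noise_patches))]
    h, w = lr.shape[:2]  # assumes patches at least as large as the LR image
    lr += n[:h, :w] - n[:h, :w].mean()  # zero-mean noise injection
    return np.clip(lr, 0, 255).astype(np.uint8)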


5. Convolutional Neural Networks for Direct Text Deblurring

The paper shared by @Ali looks very interesting and the outcomes are extremely good. It's nice that they have shared the pre-trained weights of their model along with Python scripts for easier use. However, they experimented with the Caffe library. I would prefer to convert it to PyTorch for better control. Below are the provided Python scripts with Caffe imports. Please note, I couldn't port it completely because of my lack of Caffe knowledge; please correct me if you know it better.

from __future__ import print_function
import numpy as np
import os, sys, argparse, glob, time, cv2, caffe  # the Python 2 'Queue' import was dropped; it was never used

# Some helper functions
def getCutout(image, x1, y1, x2, y2, border):
    assert(x1 >= 0 and y1 >= 0)
    assert(x2 > x1 and y2 > y1)
    assert(border >= 0)
    # the script uses x as the row index and y as the column index, so the
    # (y, x) ordering below matches OpenCV's (width, height) / (x, y) convention
    return cv2.getRectSubPix(image, (y2-y1 + 2*border, x2-x1 + 2*border), (((y2-1)+y1) / 2.0, ((x2-1)+x1) / 2.0))

def fillRndData(data, net):
    inputLayer = 'data'
    randomChannels = net.blobs[inputLayer].data.shape[1]
    rndData = np.random.randn(data.shape[0], randomChannels, data.shape[2], data.shape[3]).astype(np.float32) * 0.2
    rndData[:, 0:1, :, :] = data
    # note: slicing [:, 0:1] broadcasts the image over every channel, overwriting
    # the noise; the intent was likely `net.blobs[inputLayer].data[...] = rndData`
    net.blobs[inputLayer].data[...] = rndData[:, 0:1, :, :]

def mkdirp(directory):
    if not os.path.isdir(directory):
        os.makedirs(directory)

The main function starts here:

def main(argv):
    pycaffe_dir = os.path.dirname(__file__)

    parser = argparse.ArgumentParser()
    # Optional arguments.
    parser.add_argument(
        "--model_def",
        help="Model definition file.",
        required=True
    )
    parser.add_argument(
        "--pretrained_model",
        help="Trained model weights file.",
        required=True
    )
    parser.add_argument(
        "--out_scale",
        help="Scale of the output image.",
        default=1.0,
        type=float
    )
    parser.add_argument(
        "--output_path",
        help="Output path.",
        default=''
    )
    parser.add_argument(
        "--tile_resolution",
        help="Resolution of processing tile.",
        required=True,
        type=int
    )
    parser.add_argument(
        "--suffix",
        help="Suffix of the output file.",
        default="-deblur",
    )
    parser.add_argument(
        "--gpu",
        action='store_true',
        help="Switch for gpu computation."
    )
    parser.add_argument(
        "--grey_mean",
        action='store_true',
        help="Use grey mean RGB=127. Default is the VGG mean."
    )
    parser.add_argument(
        "--use_mean",
        action='store_true',
        help="Use mean."
    )
    parser.add_argument(
        "--adversarial",
        action='store_true',
        help="Use the adversarial net variant (fills extra input channels with noise)."
    )
    args = parser.parse_args()

    mkdirp(args.output_path)

    if hasattr(caffe, 'set_mode_gpu'):
        if args.gpu:
            print('GPU mode', file=sys.stderr)
            caffe.set_mode_gpu()
        net = caffe.Net(args.model_def, args.pretrained_model, caffe.TEST)
    else:
        if args.gpu:
            print('GPU mode', file=sys.stderr)
        net = caffe.Net(args.model_def, args.pretrained_model, gpu=args.gpu)


    inputs = [line.strip() for line in sys.stdin]

    print("Classifying %d inputs." % len(inputs), file=sys.stderr)


    inputBlob = list(net.blobs.keys())[0]    # [innat]: first blob is the input ('data'); list() is needed on Python 3
    outputBlob = list(net.blobs.keys())[-1]  # last blob is the network output

    print( inputBlob, outputBlob)
    channelCount = net.blobs[inputBlob].data.shape[1]
    net.blobs[inputBlob].reshape(1, channelCount, args.tile_resolution, args.tile_resolution)
    net.reshape()

    if channelCount == 1 or channelCount > 3:
        color = 0
    else:
        color = 1

    outResolution = net.blobs[outputBlob].data.shape[2]
    inResolution = int(outResolution / args.out_scale)
    boundary = (net.blobs[inputBlob].data.shape[2] - inResolution) // 2  # integer division (Python 3)

    for fileName in inputs:
        img = cv2.imread(fileName, flags=color).astype(np.float32)
        original = np.copy(img)
        img = img.reshape(img.shape[0], img.shape[1], -1)
        if args.use_mean:
            if args.grey_mean or channelCount == 1:
                img -= 127
            else:
                img[:,:,0] -= 103.939
                img[:,:,1] -= 116.779
                img[:,:,2] -= 123.68
        img *= 0.004

        outShape = [int(img.shape[0] * args.out_scale) ,
                    int(img.shape[1] * args.out_scale) ,
                    net.blobs[outputBlob].channels]
        imgOut = np.zeros(outShape)

        imageStartTime = time.time()
        for x, xOut in zip(range(0, img.shape[0], inResolution), range(0, imgOut.shape[0], outResolution)):
            for y, yOut in zip(range(0, img.shape[1], inResolution), range(0, imgOut.shape[1], outResolution)):

                start = time.time()

                region = getCutout(img, x, y, x+inResolution, y+inResolution, boundary)
                region = region.reshape(region.shape[0], region.shape[1], -1)
                data = region.transpose([2, 0, 1]).reshape(1, -1, region.shape[0], region.shape[1])

                if args.adversarial:
                    fillRndData(data, net)
                    out = net.forward()
                else:
                    out = net.forward_all(data=data)

                out = out[outputBlob].reshape(out[outputBlob].shape[1], out[outputBlob].shape[2], out[outputBlob].shape[3]).transpose(1, 2, 0)

                if imgOut.shape[2] == 3 or imgOut.shape[2] == 1:
                    out /= 0.004
                    if args.use_mean:
                        if args.grey_mean:
                            out += 127
                        else:
                            out[:,:,0] += 103.939
                            out[:,:,1] += 116.779
                            out[:,:,2] += 123.68

                if out.shape[0] != outResolution:
                    print("Warning: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                if out.shape[0] < outResolution:
                    print("Error: size of net output is %d px and it is expected to be %d px" % (out.shape[0], outResolution))
                    exit()

                xRange = min((outResolution, imgOut.shape[0] - xOut))
                yRange = min((outResolution, imgOut.shape[1] - yOut))

                imgOut[xOut:xOut+xRange, yOut:yOut+yRange, :] = out[0:xRange, 0:yRange, :]

                print(".", end="", file=sys.stderr)
                sys.stdout.flush()


        print(imgOut.min(), imgOut.max())
        print("IMAGE DONE %s" % (time.time() - imageStartTime))
        basename = os.path.basename(fileName)
        name = os.path.join(args.output_path, basename + args.suffix)
        print(name, imgOut.shape)
        cv2.imwrite( name, imgOut)

if __name__ == '__main__':
    main(sys.argv)

To run the program:

cat fileListToProcess.txt | python processWholeImage.py --model_def ./BMVC_nets/S14_19_200.deploy --pretrained_model ./BMVC_nets/S14_19_FQ_178000.model --output_path ./out/ --tile_resolution 300 --suffix _out.png --gpu --use_mean

The weight files and the above scripts can be downloaded from here (BMVC_net). However, you may want to convert the model from Caffe to PyTorch. Here is a basic starting point:
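A minimal setup sketch, assuming you use the single-file caffemodel2pytorch module from github.com/vadimkantorov/caffemodel2pytorch:

# one possible setup: clone the single-file converter and put it on the path
#   git clone https://github.com/vadimkantorov/caffemodel2pytorch.git
import sys
sys.path.append('./caffemodel2pytorch')  # hypothetical local clone path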

Next, load the converted model:

import torch
import caffemodel2pytorch  # the single-file module set up above

# BMVC_net: you need to download it from the authors' website, link above
model = caffemodel2pytorch.Net(
    prototxt = './BMVC_net/S14_19_200.deploy',
    weights = './BMVC_net/S14_19_FQ_178000.model',
    caffe_proto = 'https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto'
)

model.cuda()
model.eval()
torch.set_grad_enabled(False)

Run it on a demo tensor:

# make sure to apply the right image normalization and channel reordering first
image = torch.Tensor(8, 3, 98, 98).cuda()

# a single input variable is interpreted as an input blob named "data";
# the call returns a dict of output tensors keyed by blob name
output_dict = model(image)
print(output_dict.keys())

Please note, there are some basic things to consider: the networks expect text at 120-150 DPI, reasonable orientation, and reasonable black-and-white levels. The networks expect the mean [103.9, 116.8, 123.7] to be subtracted from the inputs, and the inputs should then be multiplied by 0.004.
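In code, that preprocessing (mirroring the mean subtraction and `img *= 0.004` in the script above) would look like:

import numpy as np

def preprocess(img_bgr):
    # subtract the stated mean, then scale by 0.004
    x = img_bgr.astype(np.float32)
    x -= np.array([103.9, 116.8, 123.7], dtype=np.float32)
    return x * 0.004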

Innat
  • Hello, have you actually tested the method @Ali shared on a real image? First, on the original page the Python script is for Python 2.7 and there's a lot to change; second, I haven't been able to run the code you've provided due to this error: "module 'caffemodel2pytorch' has no attribute 'Net'" – Kerem Nayman Nov 27 '20 at 19:36
  • I'm so lost with Caffe implementation. Any help would be appreciated. – Jim O. Feb 20 '21 at 08:38
  • I didn't go further with that Caffe implementation; it has many limitations. Check [my question](https://stackoverflow.com/questions/64808986/scene-text-image-super-resolution-for-ocr) in the **update 2** section. In my case, surprisingly, [this mechanism](https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life) improves the visual quality of the text image as well. – Innat Feb 20 '21 at 08:42
  • At the time of my experiments, they hadn't published their training code; if they have now, we can re-train their model on this specific text-deblurring case. A [discussion](https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life/issues/57). – Innat Feb 20 '21 at 08:45
  • Thanks. I'll probably forgo RealSR because I couldn't install CUDA on my 2013 MacBook Air. The MSFT one seems plausible. I've had some luck with ``https://github.com/ys-koshelev/nla_deblur`` and DPSR. – Jim O. Feb 20 '21 at 09:06