Randomly Generate Synthetic Noise in an Image Text Document

Question

I'm working on denoising dirty document images. I want to create a dataset in which synthetic noise is added to simulate real-world, messy artifacts. Simulated dirt may include coffee stains, faded sun spots, dog-eared pages, lots of wrinkles, and more. How should I do that?

Sample Clean Image :

After Adding Synthetic Noise:

How can I randomly generate images like the one shown above?

Which is your original image? Also, the text doesn't seem to deform with the paper deformation, which is highly unlikely, right? — venkata krishnan, Nov 20 '19 at 08:33
Maybe you're looking for [this](https://github.com/jrosebr1/bat-country)? — nathancy, Nov 20 '19 at 21:22
Along with @nathancy's suggestion, you can create a simple filter with OpenCV to mask out the black pixels (the text), add some augmentations as mentioned by nathancy, and put the text back. You can write it as a custom augmentation function in Keras to generate the images randomly on the fly, or you can generate them yourself beforehand and use the whole set. — venkata krishnan, Nov 21 '19 at 00:57

score 4 · Answer 1 · answered Oct 05 '21 at 09:15

You can accomplish this with the Augraphy library. Disclosure: I'm a maintainer on the project.

Clean images can be overlaid onto different paper textures, stained, marked with pencil or highlighter, folded, and so on. We support a ton of different augmentations, and each offers a lot of control over the degree of effect. I recently wrote a brief intro to the library that has a couple of sample images, and there's a post here about how to set up an Augraphy pipeline to generate a wide range of these effects.

Here's an example pipeline, using the high-quality version of your clean image from the NoisyOffice dataset. This pipeline will produce images that:

have been printed with a printing press onto a different kind of paper,
have a few pencil lines drawn through some words, and
have holes punched through.

from augraphy import *
import cv2


img = cv2.imread("Fontfre_Clean_TR.png")

ink = [Letterpress(layer="ink", p=1),
       Strikethrough(layer="ink",
                     num_lines_range=(2, 7),
                     strikethrough_length_range=(0.2, 0.4),
                     strikethrough_thickness_range=(1, 2),
                     p=1)]

paper = [PaperFactory(p=1)]

post = [BindingsAndFasteners(layer="post",
                             ntimes=5,
                             effect_type="punch_holes",
                             edge="left",
                             p=1)]

pipeline = AugraphyPipeline(ink, paper, post)

complete = pipeline.augment(img)

cv2.imshow("augmented", complete['output'])
cv2.waitKey(1000)

And here's the result.

Feel free to post a GitHub Issue if you need any help or would like to make a suggestion.

score 3 · Accepted Answer · answered Nov 28 '19 at 11:51

S C R A P E - B A C K G R O U N D - I M A G E S

In my opinion, the obvious way to introduce real-world noise is to use real-world noise. You could scrape the web for paper backgrounds (example link), searching for:

paper background
dirty paper background
stained paper background

These should do the trick.

Depending on how many different patterns you need, you might want to automate the scraping (Selenium, a Python package, has your back).

O V E R L A Y - B A C K G R O U N D - W I T H - Y O U R - T E X T

Next, depending on your programming language of choice, you should be able to overlay your background image with an image of the text you want to augment. For Python and OpenCV, this is covered in depth here on SO.
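
One way to sketch the overlay step is a multiply blend. This assumes the text image is dark ink on a white page, that both images are 8-bit arrays of the same height and width, and that the text image is grayscale. The function name overlay_text_on_background is hypothetical, and plain NumPy is used here as one possible approach, not necessarily the one in the linked SO post.

```python
import numpy as np


def overlay_text_on_background(text_img, bg_img):
    """Multiply-blend a dark-on-white grayscale text image onto a background.

    text_img: 2-D uint8 array (assumed: black ink, white page).
    bg_img:   uint8 array, grayscale (2-D) or colour (3-D), same height/width.
    White page pixels leave the background unchanged; black ink turns it black.
    """
    if text_img.shape[:2] != bg_img.shape[:2]:
        raise ValueError("text and background must have the same height and width")

    text = text_img.astype(np.float32) / 255.0  # 1.0 = blank page, 0.0 = ink
    bg = bg_img.astype(np.float32)

    # Let a grayscale text image broadcast over the colour channels.
    if text.ndim == 2 and bg.ndim == 3:
        text = text[..., np.newaxis]

    blended = bg * text
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Tiny demo: a 4x4 "page" with one black pixel on a mid-grey colour background.
page = np.full((4, 4), 255, dtype=np.uint8)
page[1, 2] = 0
background = np.full((4, 4, 3), 180, dtype=np.uint8)
result = overlay_text_on_background(page, background)
```

With real files, the two arrays would come from cv2.imread, with the background resized or cropped to the size of the text image first.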

As a final touch you could additionally use the Augmentor package to further enhance and augment your data.

score 1 · Answer 3 · answered Nov 26 '19 at 14:04

I suggest merging clean images with noisy backgrounds, although this method doesn't support paper deformations.

The following code may help:

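import cv2

# Load both the clean and the noisy background images in grayscale
img_clean = cv2.imread('img_clean1.jpg', 0)
img_bg = cv2.imread('img_noisy_bg1.jpg', 0)

# Assumption: the two files may differ in size, and the mask must match
# the background, so resize the background to the clean image (width, height)
img_bg = cv2.resize(img_bg, (img_clean.shape[1], img_clean.shape[0]))

# Make the clean image binary; cv2.threshold returns (threshold value, image)
_, mask = cv2.threshold(img_clean, 10, 255, cv2.THRESH_BINARY)

# Blend clean with the noisy BG: keep the background where the page is
# white, leave the text pixels black
res = cv2.bitwise_and(img_bg, img_bg, mask=mask)

# Make it more natural!
res = cv2.blur(res, (3, 3))

cv2.imshow('image', res)
cv2.waitKey(0)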
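
Since the question asks for random results, the same masking idea can be randomised. The following is a minimal sketch, assuming grayscale uint8 arrays and a background at least as large as the clean image. The name random_noisy_variant, its text_threshold parameter, and the ink darkness range are hypothetical choices, and the masking is done with NumPy rather than cv2.bitwise_and.

```python
import numpy as np


def random_noisy_variant(clean, background, rng, text_threshold=128):
    """Return one randomly degraded copy of a grayscale document image.

    clean:      2-D uint8 array, dark text on a light page.
    background: 2-D uint8 array, at least as large as `clean`.
    rng:        a numpy.random.Generator, so runs can be reproduced.
    """
    h, w = clean.shape
    bh, bw = background.shape
    if bh < h or bw < w:
        raise ValueError("background must be at least as large as the clean image")

    # Pick a random window of the background the same size as the page.
    top = int(rng.integers(0, bh - h + 1))
    left = int(rng.integers(0, bw - w + 1))
    patch = background[top:top + h, left:left + w]

    # Randomly mirror the patch so one texture yields several looks.
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    if rng.random() < 0.5:
        patch = patch[::-1, :]

    # Keep the background where the page is blank; where there is text,
    # darken it to a random ink level (0 = pure black, at most 63).
    ink = int(rng.integers(0, 64))
    is_text = clean < text_threshold
    return np.where(is_text, np.minimum(patch, ink), patch).astype(np.uint8)


# Demo with synthetic data: a small black "word" on a white page.
rng = np.random.default_rng(seed=0)
clean = np.full((8, 8), 255, dtype=np.uint8)
clean[2:4, 2:6] = 0
background = rng.integers(150, 256, size=(32, 32), dtype=np.uint8)
variants = [random_noisy_variant(clean, background, rng) for _ in range(5)]
```

Each call draws a different crop, orientation, and ink level, so looping over a folder of clean images and a folder of backgrounds gives a varied dataset. A blur step like the one above can be applied to each variant afterwards.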

Do you have any idea on how to deform a text based on the paper deformation (e.g wavy) ? — alyssaeliyah, Feb 02 '20 at 08:39
