6

I'm working on denoising dirty document images. I want to create a dataset in which synthetic noise is added to simulate real-world, messy artifacts. Simulated dirt may include coffee stains, faded sun spots, dog-eared pages, lots of wrinkles, and more. How should I do that?

Sample Clean Image :

[image]

After Adding Synthetic Noise:

[four images]

How can I randomly generate images like the ones shown above?

mrk
alyssaeliyah
  • Which is your original image? Also, the text doesn't seem to deform with the paper deformation, which is highly unlikely, right? – venkata krishnan Nov 20 '19 at 08:33
  • @venkatakrishnan -> Please see my updated post :) – alyssaeliyah Nov 20 '19 at 08:53
  • Maybe you're looking for [this](https://github.com/jrosebr1/bat-country)? – nathancy Nov 20 '19 at 21:22
  • Along with @nathancy's answer, you can create a simple filter with OpenCV to mask out the black pixels (the text), add some augmentations as mentioned by nathan, and put the text back. You can write it as a custom augmentation function in Keras to generate the images randomly on the fly, or you can generate them randomly yourself and use the whole set. – venkata krishnan Nov 21 '19 at 00:57

3 Answers

4

You can accomplish this with the Augraphy library. Disclosure: I'm a maintainer on the project.

Clean images can be overlaid onto different paper textures, stained, marked with pencil or highlighter, folded, and so on. We support a ton of different augmentations, and each offers a lot of control over the degree of effect. I recently wrote a brief intro to the library that has a couple of sample images, and there's a post here about how to set up an Augraphy pipeline to generate a wide range of these effects.

Here's an example pipeline, using the high-quality version of your clean image from the NoisyOffice dataset. This pipeline will produce images that:

  1. have been printed with a printing press onto a different kind of paper,
  2. have a few pencil lines drawn through some words, and
  3. have holes punched through.

from augraphy import *
import cv2


img = cv2.imread("Fontfre_Clean_TR.png")

ink = [Letterpress(layer="ink", p=1),
       Strikethrough(layer="ink",
                     num_lines_range=(2, 7),
                     strikethrough_length_range=(0.2, 0.4),
                     strikethrough_thickness_range=(1, 2),
                     p=1)]

paper = [PaperFactory(p=1)]

post = [BindingsAndFasteners(layer="post",
                             ntimes=5,
                             effect_type="punch_holes",
                             edge="left",
                             p=1)]

pipeline = AugraphyPipeline(ink,paper,post)

complete = pipeline.augment(img)

cv2.imshow("augmented", complete['output'])
cv2.waitKey(1000)

And here's the result.

Feel free to post a GitHub Issue if you need any help or would like to make a suggestion.

3

S C R A P E - B A C K G R O U N D - I M A G E S

In my opinion, the obvious way to introduce real-world noise is to use real-world noise. You could scrape the web for paper backgrounds (exemplary link). Searching for:

  1. paper background
  2. dirty paper background
  3. stained paper background

should do the trick.

Depending on how many different patterns you need, you might want to automate the scraping (Selenium, a Python package, has your back).

O V E R L A Y - B A C K G R O U N D - W I T H - Y O U R - T E X T

Next, depending on your programming language of choice, you should be able to overlay your background image with an image of the text you want to augment. For Python and OpenCV, this is covered in depth here on SO.
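As a rough illustration of the overlay step, here is a minimal sketch that multiply-blends a grayscale text image onto a paper background. It uses plain NumPy arrays rather than a specific OpenCV call, and the function name `overlay_text_on_background` and the tiny synthetic arrays are made up for the example. It assumes both images are single-channel `uint8` arrays of the same size, so a scraped background would need to be converted to grayscale and resized first.

```python
import numpy as np


def overlay_text_on_background(text_img: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Multiply-blend a grayscale text image onto a grayscale background.

    White pixels (255) in text_img leave the background unchanged,
    black pixels (0) stay black, and gray pixels darken the background.
    Hypothetical helper, not part of OpenCV or any other library.
    """
    if text_img.shape != background.shape:
        raise ValueError("text_img and background must have the same shape")
    text = text_img.astype(np.float32) / 255.0
    paper = background.astype(np.float32) / 255.0
    blended = text * paper
    return np.round(blended * 255.0).astype(np.uint8)


# Tiny synthetic stand-ins for real images (assumed data, for illustration only)
text_img = np.full((4, 4), 255, dtype=np.uint8)    # blank white page
text_img[1, :] = 0                                 # one black "stroke" of text
background = np.full((4, 4), 200, dtype=np.uint8)  # light gray paper
background[3, 3] = 120                             # one darker "stain" pixel

result = overlay_text_on_background(text_img, background)
```

With real files, the two arrays would come from something like `cv2.imread(path, 0)`. A multiply blend is only one choice here; it keeps stains visible under the text, whereas a hard mask would not.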

As a final touch, you could additionally use the Augmentor package to further enhance and augment your data.

mrk
1

I suggest merging clean images with noisy backgrounds, although this method doesn't support paper deformations.

The following code may help:

import numpy as np
import cv2

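# Load both the clean and the noisy background images in grayscale
img_clean = cv2.imread('img_clean1.jpg', 0)
img_bg = cv2.imread('img_noisy_bg1.jpg', 0)

# Resize the background to the clean image so the mask fits
img_bg = cv2.resize(img_bg, (img_clean.shape[1], img_clean.shape[0]))

# Binarize the clean image: paper becomes 255, text becomes 0.
# cv2.threshold returns (threshold_value, image), in that order.
_, mask = cv2.threshold(img_clean, 10, 255, cv2.THRESH_BINARY)

# Keep the noisy background where the page is blank; text stays black
res = cv2.bitwise_and(img_bg, img_bg, mask=mask)

# Make it more natural!
res = cv2.blur(res, (3, 3))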

cv2.imshow('image', res)
cv2.waitKey(0)
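To get many random variants out of a few background images, one option is to cut a random patch from a larger background and flip it at random before merging. The sketch below shows that idea with plain NumPy. The helper names `random_background_patch` and `make_noisy_variants`, the seed, and the synthetic arrays are all assumptions for the example, and the masking uses the same threshold of 10 as the code above.

```python
import numpy as np


def random_background_patch(background, height, width, rng):
    """Cut a random height x width patch from a larger background and flip it at random."""
    bg_h, bg_w = background.shape
    if bg_h < height or bg_w < width:
        raise ValueError("background must be at least as large as the requested patch")
    top = int(rng.integers(0, bg_h - height + 1))
    left = int(rng.integers(0, bg_w - width + 1))
    patch = background[top:top + height, left:left + width]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        patch = patch[::-1, :]  # vertical flip
    return patch


def make_noisy_variants(clean, background, count, seed=0, text_threshold=10):
    """Merge one clean grayscale page with `count` random background patches."""
    rng = np.random.default_rng(seed)
    paper_mask = clean > text_threshold  # True on blank paper, False on text
    height, width = clean.shape
    variants = []
    for _ in range(count):
        patch = random_background_patch(background, height, width, rng)
        # Background shows through on blank paper; text pixels stay black
        variants.append(np.where(paper_mask, patch, 0).astype(np.uint8))
    return variants


# Synthetic stand-ins for real images (assumed data, for illustration only)
clean = np.full((8, 8), 255, dtype=np.uint8)
clean[2, :] = 0  # one black "line of text"
demo_rng = np.random.default_rng(42)
background = demo_rng.integers(100, 256, size=(32, 32)).astype(np.uint8)

variants = make_noisy_variants(clean, background, count=5)
```

Each call with a different `seed` gives a different set of variants. Like the merge above, this does not deform the paper, so wrinkles and folds would still have to come from the backgrounds themselves.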
ma.mehralian