I have a scanned copy of an ancient, dirty document and a set of synthetically generated ancient, dirty documents. I want to use the Discrete Cosine Transform (DCT) to characterize the noise types present in a document and to identify whether its text was originally there or was synthetically overlaid.

The DCT coefficients are normally classified into three sub-bands based on their frequencies: low, middle and high. The low-frequency band can be associated with the essence of the text, while the high-frequency band is related to background noise. I can characterize noise types (spots, uneven background, etc.) by computing and classifying the standard deviation of the high-frequency DCT coefficients. My question is: is it possible to identify whether the text in the document was originally there or was synthetically overlaid by analyzing the low-frequency DCT coefficients? If not, what other transforms can I use to distinguish the two cases?
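To make the band split concrete, here is a minimal sketch of what I mean, assuming a grayscale patch held in a NumPy array and using `scipy.fft.dctn`. The function name `dct_band_stats` and the cut-offs `LOW_CUT` and `HIGH_CUT` are placeholders chosen for illustration, not established values, and the test patches are synthetic stand-ins rather than real scans.

```python
import numpy as np
from scipy.fft import dctn

# Hypothetical band boundaries on the normalised index u + v (range 0 to just under 2).
LOW_CUT = 0.25
HIGH_CUT = 1.0


def dct_band_stats(patch):
    """Standard deviation of the 2D DCT coefficients in low, mid and high bands."""
    coeffs = dctn(np.asarray(patch, dtype=float), norm="ortho")
    rows, cols = coeffs.shape
    # Normalised frequency index of every coefficient; (0, 0) is the DC term.
    u = np.arange(rows)[:, None] / rows
    v = np.arange(cols)[None, :] / cols
    radius = u + v
    low = radius < LOW_CUT
    high = radius >= HIGH_CUT
    mid = ~low & ~high
    low[0, 0] = False  # leave the DC term (mean brightness) out of the low band
    return {
        "low": float(np.std(coeffs[low])),
        "mid": float(np.std(coeffs[mid])),
        "high": float(np.std(coeffs[high])),
    }


# Synthetic stand-ins: a smooth background ramp, and the same ramp with added noise.
rng = np.random.default_rng(0)
clean_patch = np.tile(np.linspace(0.5, 1.0, 64), (64, 1))
noisy_patch = clean_patch + 0.1 * rng.standard_normal((64, 64))

clean_stats = dct_band_stats(clean_patch)
noisy_stats = dct_band_stats(noisy_patch)
print(clean_stats)
print(noisy_stats)
```

With these stand-ins the high-band standard deviation of the noisy patch comes out larger than that of the clean ramp, which is the quantity I am classifying for the noise types.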

Thanks. Any help would be very much appreciated.

alyssaeliyah
  • Could you post examples? I presume a small cutout (3 or 4 lines) would suffice. The FFT gives you all frequencies available in your scan. You should look at the (2D) spectra and try to identify at which spatial frequencies (coordinates in the spectrum) the differences appear. – roadrunner66 Feb 04 '20 at 22:42
  • See here how to transform an image: https://stackoverflow.com/questions/58936938/fourier-transform-in-python-giving-blank-images/58964525#58964525. Here is a synthetic (fake data) example of how you can see low and high frequencies in the spectra of images: https://stackoverflow.com/questions/58392838/how-to-get-information-about-sharpness-of-image-with-fourier-transformation/58403447#58403447 – roadrunner66 Feb 04 '20 at 22:44
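
The spectrum comparison suggested in the comments could be sketched roughly as follows, assuming two grayscale cutouts of identical shape. The helper names `log_spectrum` and `spectrum_difference` are made up for illustration, and the two patches are synthetic stand-ins (a stripe pattern with and without noise), not the documents from the question.

```python
import numpy as np


def log_spectrum(patch):
    """Centred log-magnitude 2D spectrum of a grayscale patch."""
    patch = np.asarray(patch, dtype=float)
    # Subtract the mean so the DC term does not dominate the plot.
    spectrum = np.fft.fftshift(np.fft.fft2(patch - patch.mean()))
    return np.log1p(np.abs(spectrum))


def spectrum_difference(patch_a, patch_b):
    """Absolute difference between the log spectra of two same-shaped patches."""
    return np.abs(log_spectrum(patch_a) - log_spectrum(patch_b))


# Synthetic stand-ins: vertical stripes with 8 cycles across the patch width.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64]
stripes = np.sin(2 * np.pi * 8 * x / 64)
scanned_like = stripes + 0.05 * rng.standard_normal((64, 64))
synthetic_like = stripes.copy()

diff = spectrum_difference(scanned_like, synthetic_like)
# Coordinates in the shifted spectrum where the two patches differ most.
peak = np.unravel_index(np.argmax(diff), diff.shape)
print(peak)
```

In the shifted spectrum the zero frequency sits at the centre, so large values of `diff` far from the centre point to high spatial frequencies that one patch has and the other lacks.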

0 Answers