I have a scanned copy of an ancient, dirty document and a set of synthetically generated ancient, dirty documents. I want to use the Discrete Cosine Transform (DCT) to characterize the types of noise present in each document and to identify whether the text was originally part of the document or was synthetically overlaid.
DCT coefficients are commonly grouped by frequency into three sub-bands: low, middle and high. The low-frequency band can be associated with the essence of the text, while the high-frequency band is related to background noise. I can characterize noise types (spots, uneven background, etc.) by computing the standard deviation of the high-frequency DCT coefficients and classifying it. My question is: is it possible to tell whether the text was originally in the document or was synthetically overlaid by analyzing the low-frequency DCT coefficients? If not, what other transforms could I use to distinguish the two?
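For concreteness, here is a minimal sketch of the kind of block-wise band split described above. It assumes a grayscale image as a NumPy array, 8x8 blocks (as in JPEG), and bands defined by the index sum u+v. The cut-offs (low: 1 <= u+v < 3, high: u+v >= 8) and the helper names `dct_matrix`, `band_masks` and `block_band_stats` are illustrative choices, not a standard. The DC coefficient is excluded from the low band because it only encodes the block mean.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n); row k is the k-th cosine basis."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization
    return c


def band_masks(n: int = 8, low_cut: int = 3, high_cut: int = 8):
    """Boolean masks for low / middle / high bands by u+v.

    The cut-off values are illustrative assumptions; the DC term (u+v == 0)
    is left out of every band.
    """
    u, v = np.indices((n, n))
    s = u + v
    low = (s >= 1) & (s < low_cut)
    mid = (s >= low_cut) & (s < high_cut)
    high = s >= high_cut
    return low, mid, high


def block_band_stats(image: np.ndarray, n: int = 8) -> np.ndarray:
    """Per-block standard deviation of the low, middle and high DCT bands.

    Returns an array of shape (block_rows, block_cols, 3).
    """
    img = np.asarray(image, dtype=float)
    h, w = (img.shape[0] // n) * n, (img.shape[1] // n) * n
    img = img[:h, :w]  # crop so the image tiles exactly into n x n blocks
    # Reshape into a (block_rows, block_cols, n, n) stack of blocks.
    blocks = img.reshape(h // n, n, w // n, n).swapaxes(1, 2)
    c = dct_matrix(n)
    coeffs = c @ blocks @ c.T  # 2-D DCT of every block; matmul broadcasts
    low, mid, high = band_masks(n)
    return np.stack(
        [coeffs[..., m].std(axis=-1) for m in (low, mid, high)], axis=-1
    )
```

The last channel (`stats[..., 2]`) is the per-block high-band standard deviation that could feed a noise classifier; the first channel is the low-band counterpart one would inspect for the original-versus-overlaid question.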
Thanks. Any help would be very much appreciated.