Virtual Generation of Synthetic Ancient and Dirty English Documents

Question

I have a collection of dirty background images; below is a sample:

I also have a collection of actual images of dirty documents with text on them, like the one below:

The problem with my actual dirty documents is that the text is handwritten and not written in the English alphabet. So my task is to create old documents with English text printed on them. All I have to do is overlay some English text onto a blank dirty document.

After overlaying, I need a measure showing that a synthetically generated dirty document with English text is nearly the same as my original dirty documents with non-English text. Maybe I can compare their histograms or similar statistics to find what they have in common. Help me find that measure. Its purpose is to ensure that the synthetic documents have the same quality as the original ones, within a specified threshold.
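Following the histogram idea, here is a minimal sketch of one such measure: build a normalised 256-bin gray-level histogram per page and compute the Bhattacharyya coefficient between a synthetic page and the most similar real page. Everything here is an assumption for illustration: pages are flattened 8-bit grayscale vectors, and `histogram`, `bhattacharyya`, `looksLikeReal` and any threshold value are hypothetical names and numbers. In OpenCV, `cv::calcHist` and `cv::compareHist` (which offers correlation, chi-square, intersection and Bhattacharyya methods) cover the same ground.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: a grayscale page as a flat list of 8-bit pixels.
using Gray = std::vector<std::uint8_t>;
using Hist = std::array<double, 256>;

// Normalised 256-bin gray-level histogram: bin v holds the fraction of pixels
// with value v, so the bins sum to 1 whatever the image size.
Hist histogram(const Gray& img) {
    assert(!img.empty());
    Hist h{};
    for (std::uint8_t v : img) h[v] += 1.0;
    for (double& bin : h) bin /= static_cast<double>(img.size());
    return h;
}

// Bhattacharyya coefficient: sum of sqrt(p[i] * q[i]). It is 1 for identical
// histograms and 0 for histograms that share no non-empty bin.
double bhattacharyya(const Hist& p, const Hist& q) {
    double bc = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) bc += std::sqrt(p[i] * q[i]);
    return bc;
}

// Accept a synthetic page if its histogram is at least `threshold` similar
// to that of the most similar real page.
bool looksLikeReal(const Gray& synthetic, const std::vector<Gray>& reals, double threshold) {
    const Hist hs = histogram(synthetic);
    double best = 0.0;
    for (const Gray& real : reals) best = std::max(best, bhattacharyya(hs, histogram(real)));
    return best >= threshold;
}
```

A gray-level histogram only captures tone (paper colour, stain darkness, amount of ink), not layout or stroke shape, so it is a coarse check. Any threshold would need calibrating, for example by seeing how similar the real pages are to each other.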

I will use the synthetically generated documents as data for my research, and I need to ensure that this data is as good as the original.

Mark Setchell · Answer 1 · answered Jan 23 '20 at 09:37

Not sure if your question is about generating the documents or analysing them, but I played around a little with generating documents and thought I might as well share what I have done - both for fun and for my own reference.

I used ImageMagick at the command-line. It is included in most Linux distros and is available for macOS and Windows.

I wrote another, somewhat related, answer about synthetic ageing of photographs here. Along these lines:

So, I grabbed the "Olde English" font from here and installed it for ImageMagick to use as shown here.

I grabbed some nonsense English text from the Nietzsche Ipsum, because "Lorem Ipsum" is in Latin. Feigning intelligence, I saved that in a file called wisdom.txt:

Derive oneself good inexpedient derive ideal society. Mountains burying prejudice prejudice endless transvaluation contradict evil endless right. Burying transvaluation selfish passion overcome suicide contradict insofar madness spirit strong enlightenment suicide. Ubermensch fearful right god sexuality madness truth against superiority salvation.

Pinnacle faithful ascetic evil society marvelous will ultimate play christianity noble spirit good. Burying faithful war prejudice justice contradict of. Morality moral enlightenment gains zarathustra superiority joy war. Christianity value reason strong ideal. Deceptions justice god suicide battle of. Christian decieve abstract society revaluation derive ultimate joy.

Right morality grandeur value decieve. Revaluation christianity endless derive endless morality. Hatred of deceptions suicide snare pinnacle overcome society suicide ideal. Transvaluation christian pinnacle ultimate faith war ubermensch noble strong insofar prejudice abstract morality. Prejudice ascetic gains horror strong good against intentions snare.

Deceptions moral madness free inexpedient holiest convictions morality. Pious abstract moral christian deceptions overcome sexuality hope horror inexpedient. Against spirit.

I then saved your "dirty document" as papyrus.jpg and ran the following ImageMagick command in Terminal:

magick papyrus.jpg -size 360x600 -background none -font OldeEnglish -pointsize 20 -fill '#555' caption:@wisdom.txt -gravity center -compose multiply -composite result.png

And here is the result:

Basically I am generating the text in dark grey (-fill '#555') into an area a little smaller than the paper (-size 360x600) on a transparent background (-background none) and then centering it (-gravity center) and compositing it (-compose multiply -composite) onto the background.
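To make the multiply step concrete, here is a per-pixel sketch for the grayscale case, assuming ImageMagick applies the usual Porter-Duff form of the multiply blend over an opaque background: result = bg * (1 - alpha + alpha * ink), where alpha is the text layer's opacity (0 on the transparent background, 1 inside glyphs) and ink is the fill value (`#555` is 0x55/0xFF, about 0.33). The function names are hypothetical; this illustrates the arithmetic and is not ImageMagick's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiply blend of one text pixel over an opaque paper pixel (all values 0..1):
//   bg    - paper value
//   ink   - text fill value (e.g. #555 -> about 0.33)
//   alpha - text opacity: 0 outside the glyphs, 1 inside, in between on edges
// alpha = 0 leaves the paper unchanged; alpha = 1 gives bg * ink, so the ink
// darkens the paper while its stains and texture still show through.
float multiplyOver(float bg, float ink, float alpha) {
    return bg * (1.0f - alpha + alpha * ink);
}

// Apply the blend to whole grayscale images stored as flat vectors of equal size.
std::vector<float> compositeMultiply(const std::vector<float>& paper,
                                     const std::vector<float>& textAlpha,
                                     float ink) {
    assert(paper.size() == textAlpha.size());
    std::vector<float> out(paper.size());
    for (std::size_t i = 0; i < paper.size(); ++i)
        out[i] = multiplyOver(paper[i], ink, textAlpha[i]);
    return out;
}
```

Because the result is a product, the default `over` compose would look different: it paints flat `#555` over the glyph areas and hides the stains underneath, whereas multiply keeps them visible through the ink.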

You could do other things, like:

distressing the text with noise before compositing it onto the background
setting it out in a two-column spread using Pango
distorting it into a slightly wavy form
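The first and third ideas can be sketched on the text layer's opacity mask before it is composited: fade inked pixels at random to imitate worn ink, and shift each column up or down along a sine curve so that the lines of text ripple. This is a hypothetical stand-alone illustration (the `Mask` struct, `distress` and `wave` are made-up names, and nearest-neighbour sampling is a simplification); on the command line, ImageMagick's own `+noise` and `-wave` operators are the natural route.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical text layer: opacity values in 0..1, row-major, width * height.
struct Mask {
    int width = 0;
    int height = 0;
    std::vector<float> alpha;

    // Opacity at (x, y); anything outside the layer counts as transparent.
    float at(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return 0.0f;
        return alpha[static_cast<std::size_t>(y) * width + x];
    }
};

// Distress: fade every pixel by a random factor between 1 - strength and 1,
// imitating worn or unevenly absorbed ink. The fixed seed makes it reproducible.
Mask distress(Mask m, float strength, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> fade(0.0f, 1.0f);
    for (float& a : m.alpha) a *= 1.0f - strength * fade(rng);
    return m;
}

// Wave: shift column x down by round(amplitude * sin(2*pi*x / wavelength))
// pixels (up when negative), with nearest-neighbour sampling.
Mask wave(const Mask& m, float amplitude, float wavelength) {
    assert(static_cast<std::size_t>(m.width) * m.height == m.alpha.size());
    const float kPi = 3.14159265f;
    Mask out{m.width, m.height, std::vector<float>(m.alpha.size(), 0.0f)};
    for (int x = 0; x < m.width; ++x) {
        const int shift = static_cast<int>(
            std::lround(amplitude * std::sin(2.0f * kPi * x / wavelength)));
        for (int y = 0; y < m.height; ++y)
            out.alpha[static_cast<std::size_t>(y) * m.width + x] = m.at(x, y - shift);
    }
    return out;
}
```

A fixed seed per page means the whole synthetic dataset can be regenerated identically, while varying the seed gives each page its own pattern of wear.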

but the basic idea is here and anyone who has the time can develop it further.

Keywords: Image processing, ancient text, manuscript, olde English, papyrus, Lorem Ipsum, distress, medieval, document, aged, synthetic ageing.

I want to generate synthetic images, then analyze them and measure whether they are as good as the actual original dirty ancient documents. — alyssaeliyah, Jan 30 '20 at 07:07
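Since the comment asks for a measurement, another candidate besides bin-by-bin histogram comparisons is the earth mover's distance between the gray-level distributions of two pages. In one dimension it reduces to summing the absolute differences of the two cumulative histograms, and unlike bin-by-bin measures it grows gradually when a page is simply a little lighter or darker instead of jumping to "no overlap". A minimal sketch, assuming 8-bit grayscale pixels in a flat vector; `grayEmd` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical representation: a grayscale page as a flat list of 8-bit pixels.
using Gray = std::vector<std::uint8_t>;

// Earth mover's (Wasserstein-1) distance between the gray-level distributions
// of two pages, divided by 255 so it lies in [0, 1]. In 1-D it equals the sum
// of absolute differences between the cumulative histograms.
double grayEmd(const Gray& a, const Gray& b) {
    assert(!a.empty() && !b.empty());
    std::array<double, 256> ha{}, hb{};
    for (std::uint8_t v : a) ha[v] += 1.0 / static_cast<double>(a.size());
    for (std::uint8_t v : b) hb[v] += 1.0 / static_cast<double>(b.size());
    double cdfA = 0.0, cdfB = 0.0, dist = 0.0;
    // The last bin is skipped: both cumulative sums reach 1 there.
    for (std::size_t i = 0; i < 255; ++i) {
        cdfA += ha[i];
        cdfB += hb[i];
        dist += std::fabs(cdfA - cdfB);
    }
    return dist / 255.0;
}
```

A value of 0 means identical tone distributions and 1 means all-black versus all-white; a uniform brightness shift of k gray levels gives k / 255. As with any such measure, the acceptance threshold has to be chosen for the data.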

score 2 · Answer 2 · answered Feb 01 '20 at 15:54

I think alpha blending is a good way to overlay the text on the background. With the help of the Ipsum generator that Mark Setchell mentioned, I made an image of a script and blended it with the background. First I resized the background image to the size of the script image. I read the script image, inverted it, and used the inverted image as a mask that keeps the background only under the text. I multiplied that text part by an alpha factor and added it to the background multiplied by the original script, which keeps the background everywhere except under the text. Below is the result of the synthesis:

And here is the C++ OpenCV code:

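#include <opencv2/opencv.hpp>
using namespace cv;

int main()
{
    // Read the rendered script (dark text on white) and the dirty background as 3-channel images.
    Mat sc = imread("script.jpg", IMREAD_COLOR);
    Mat bg = imread("BG.jpg", IMREAD_COLOR);
    if (sc.empty() || bg.empty()) return 1;

    // Make the background the same size as the script image.
    resize(bg, bg, sc.size());

    // Work in floating point with values in [0, 1].
    sc.convertTo(sc, CV_32F, 1.0 / 255.0);
    bg.convertTo(bg, CV_32F, 1.0 / 255.0);
    imshow("0", sc);

    // Inverted script: 1 under the ink, 0 on the blank paper.
    Mat sc_r;
    subtract(Scalar::all(1.0), sc, sc_r);

    // Background kept where the script is white (areas without text).
    Mat bgsc;
    multiply(bg, sc, bgsc);
    imshow("1", bgsc);

    // Background under the text only, darkened by the alpha factor.
    Mat bgsc_r;
    multiply(bg, sc_r, bgsc_r);
    float alpha = 0.3f;
    bgsc_r *= alpha;
    imshow("2", bgsc_r);

    // Sum of the two parts: the background with the text blended into it.
    Mat fin;
    add(bgsc, bgsc_r, fin);
    imshow("3", fin);
    waitKey(0);

    // Convert back to 8-bit and save the result.
    Mat out;
    fin.convertTo(out, CV_8U, 255.0);
    imwrite("result.png", out);
    return 0;
}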
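Written per pixel and per channel, with the script s and background b scaled to [0, 1] as in the code, the blend reduces to a single multiplying factor:

```latex
\begin{aligned}
\mathrm{fin} &= b\,s + \alpha\,b\,(1 - s) \\
             &= b\,\bigl(\alpha + (1 - \alpha)\,s\bigr)
\end{aligned}
```

Blank paper in the script (s = 1) leaves the background unchanged, and solid ink (s = 0) darkens it to α·b. Writing the ink coverage as 1 − s, this is the same multiply composite as the ImageMagick command in the first answer, with an ink value of α (0.3 here, close to that answer's `#555` fill).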
