6

Is there any effective way exists to detect and extract only the handwritten part from a noisy image containing both handwritten and machine printed texts? The image is attached below. https://i.stack.imgur.com/yN2Do.jpg

  • Possible duplicate of [opencv - cropping handwritten lines (line segmentation)](https://stackoverflow.com/questions/46282691/opencv-cropping-handwritten-lines-line-segmentation) – S. M. Shahinul Islam Mar 25 '18 at 17:23
  • I am asking for differentiation between Handwritten and machine printed. That referred question is something different. @s-m-shahinul-islam –  Mar 25 '18 at 17:42
  • what are the possible fonts of machine printed texts? or it might be any thing? – Krishna Mar 25 '18 at 18:10
  • 2
    As with any question about image processing, you must provide sample images either embedded or as links. As your question is currently posed, you haven't limited the universe of all images having both handwritten and machine-printed text to the subset likely of interest to you. Treat the formulation of this problem as an engineering task in and of itself--the more precise you can be, the better chance you have of solving the problem. Image processing systems, like other engineering products, succeed or fail in large part based on the quality of the initial engineering assessment. – Rethunk Mar 26 '18 at 00:57
  • I have attached the image. I have tried the bounding box concept, but it detects every text in the image rather than only the handwritten parts. @Rethunk https://imgur.com/a/vMJEC –  Mar 26 '18 at 06:47
  • Martin Thomas has given you a good start: given an initial text find/segmentation, use a classifier to distinguish between machine font and handwritten text. BUT if you can make a priori assumptions about the machine printed font, even if for just a few fields, you could have an advantage. Proper feeding and care of classifiers is an important task. – Rethunk Mar 26 '18 at 23:45
  • 1
    Did you arrive at a solution? I have a very similar requirement. – Arun Gowda Aug 20 '18 at 18:10

2 Answers2

1

You can see this as a detection problem: Detect (draw axis-aligned bounding boxes around) all characters which are machine printed.

The simplest way to do this is a sliding-window + a classifier:

  1. Crop a patch out of the image for which you want to know "is this a machine printed text"
  2. Apply a classifier which gets the patch as input and outputs a probability for "yes, it is printed text".

The classifier will likely be a CNN.

Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • As you can see my uploaded picture, this contains several mixture of handwritten and machine-printed texts in a single horizontal line, hence it is difficult to crop. @Martin Thoma –  Mar 26 '18 at 14:48
  • 1
    I'm not saying you should make a single crop. I'm saying make almost all crops and let the classifier do the work. – Martin Thoma Mar 26 '18 at 16:09
  • @cpanda: please accept @MartinThoma’s answer once you’ve looked into classifiers a bit. The answer is more than sufficient to point you in a direction worth exploring. – Rethunk Mar 26 '18 at 23:47
0

I guess you have images with same format structure as given image having contents in fixed format with known coordinates of Machine printed texts, You can use coordinate information to retrieve your texts categories.

As mentioned by @Rethunk you can also utilize the font information of machine printed texts to get more precise result.

flamelite
  • 2,654
  • 3
  • 22
  • 42