2

I am using Microsoft OCR Library for reading text.

The Microsoft OCR library works perfectly. However, I want to read the characters shown at http://www.ict4u.net/databases/database-images/micr.jpg . Is there a way to train the OCR library to read these characters, or is there a supported language that can read them?

Cloy
  • 2,141
  • 23
  • 32

2 Answers

2

[Microsoft OCR crew here] We don't yet support training OCR to customize it for your use cases. However, we do actively keep an eye on Stack Overflow to see what developers need, so we can keep improving the OCR engine.

  • Out of the 25 supported languages, does any of them recognize the font shown at http://www.ict4u.net/databases/database-images/micr.jpg ? – Cloy Aug 11 '16 at 09:21
  • @Cornelia: OK. If you want to improve the OCR engine, there are several things you can do. 1.) Output the text in the correct order. This means from the top left to the bottom right, not all words in random order, and with the words grouped into correct lines. 2.) Why does OCR not recognize asterisks correctly? E.g. a text like "***123" is not recognized at all. After removing the asterisks, the "123" is suddenly recognized. 3.) Why is a single character sometimes recognized correctly, while the same character is omitted at another place? If you like, I can send you sample images with the wrong results. – Elmue Aug 17 '16 at 02:48
  • 1
    No response. I see that Microsoft is not interested in improving its products. – Elmue Sep 07 '16 at 13:43
1

I have been working with Microsoft OCR for a while now. Compared with Tesseract it has very basic functionality.

For example, Microsoft OCR returns words and lines, but the lines are nonsense. Two or three words are randomly grouped together as a "line" that is not a real line, and the "lines" are completely unordered. In this respect it is worse than Tesseract. You have to take the coordinates of each word and order the words on your own.
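Ordering the words yourself can be sketched roughly as follows. This is only a minimal illustration in Python, not the API of Microsoft OCR: the `Word` class and its fields are hypothetical stand-ins for whatever word text and bounding rectangle your OCR binding returns, and the rule "a word belongs to a line if its vertical center lies within the first word of that line" is an assumption that works for straight, non-skewed text only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for one recognized word and its bounding box."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def group_into_lines(words: List[Word]) -> List[List[Word]]:
    """Group words into lines (top to bottom), each line sorted left to right."""
    lines: List[List[Word]] = []
    # Visit the words from top to bottom, using their vertical centers.
    for word in sorted(words, key=lambda w: w.y + w.height / 2):
        center = word.y + word.height / 2
        for line in lines:
            ref = line[0]
            # Same line if the center falls inside the vertical extent
            # of the first word that opened this line.
            if ref.y <= center <= ref.y + ref.height:
                line.append(word)
                break
        else:
            lines.append([word])
    for line in lines:
        line.sort(key=lambda w: w.x)
    lines.sort(key=lambda line: min(w.y for w in line))
    return lines


def to_text(words: List[Word]) -> str:
    """Join the ordered lines into plain text."""
    return "\n".join(
        " ".join(w.text for w in line) for line in group_into_lines(words)
    )
```

For skewed or multi-column scans this simple rule will not be enough; you would have to deskew the image first or compare against a fitted baseline instead of the first word of each line.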

Microsoft OCR does not return the bounding rectangles of individual characters, and there is absolutely no way to configure or train it. You can add languages with Windows Update for "Basic Typing" = OCR (see http://www.thewindowsclub.com/install-uninstall-languages-windows-10), but you cannot train your own language data.

MSDN says that the following 25 languages are supported with different accuracy:

  • Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
  • Very good: Chinese Simplified, Greek, Japanese, Russian and Turkish.
  • Good: Chinese Traditional and Korean.

The recognition quality is very similar to Tesseract. It even has exactly the same problems as Tesseract. Some isolated characters are not recognized (separate symbols like a single '$'), and it has the same huge problem with asterisks as Tesseract. It also inserts spaces in the wrong places, as Tesseract does. So I ask myself whether Microsoft is using Tesseract under the hood.

However, Microsoft OCR has one advantage over Tesseract: the image preprocessing is much better. It does not matter if you have red text on a yellow background or white text on black. This is a weak point of Tesseract, which needs a good-quality black-and-white image as input.

The following applies to both OCR libraries: if you have recognition problems, try to amplify the image. Even blurring the image may be very helpful because this removes noise from the image.
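To make these two steps concrete, here is a minimal pure-Python sketch. It reads "amplify" as enlarging the image, which is my assumption, and it works on a grayscale image stored as a list of rows of values from 0 to 255. The function names `upscale` and `box_blur` are made up for this illustration; in a real application you would use the resize and blur functions of your image library instead.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0..255


def upscale(image: Image, factor: int) -> Image:
    """Enlarge the image by an integer factor (nearest-neighbor)."""
    out: Image = []
    for row in image:
        # Repeat every pixel horizontally, then repeat the row vertically.
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(image: Image) -> Image:
    """3x3 mean filter; border pixels average only the neighbors that exist."""
    height, width = len(image), len(image[0])
    out: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbors) // len(neighbors))
        out.append(row)
    return out
```

A typical order would be `box_blur(upscale(image, 2))`: the mean filter spreads an isolated noise pixel over its neighbors, so it no longer looks like part of a character to the OCR engine.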

Elmue
  • 7,602
  • 3
  • 47
  • 57
  • My main aim is to read only the bottom part of a cheque ( http://blog.elearnmarkets.com/wp-content/uploads/2016/01/Self-cheque-1024x460.jpg ), which contains the MICR band printed in this font ( http://www.ict4u.net/databases/database-images/micr.jpg ), on Windows Phone. Should I use Tesseract, or is there a way to do it with Microsoft OCR? – Cloy Aug 11 '16 at 09:26
  • Why do you ask so many questions? Simply try it! But I'm sure that a simple OCR engine will NOT recognize MICR fonts. What percentage of Windows users, apart from you, do you think want to read MICR fonts? And who is still working with obsolete cheques in 2016? We are living in the age of electronic payment. Cheques have been dead for decades. I have not used a cheque for about 20 years. And why do you want to read the MICR code at all? Why don't you read the plain text on the cheque? – Elmue Aug 17 '16 at 02:28