I'm developing for UWP, and Windows has a built-in OCR engine: Windows.Media.Ocr.

My question: does anyone know whether the Windows OCR engine can be trained to recognize new characters or to use a custom font? If so, how can I do this?

What I want to achieve is to recognize non-alphabetic symbols, for example the character ⌰ (Unicode: U+2330) or ⌖ (U+2316).

The characters I want to recognize are symbols that do not belong to any language.

Jay Zuo
Xaren

2 Answers

I used the Windows.Media.Ocr library in my UWP application. Here are some test results with different fonts:


Arial

Font - Arial
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello @ World
Accuracy - 100%



Agency FB

Font - Agency FB
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello World
Accuracy - 84.6% (Missed - @ symbol and one space)



Modern

Font - Modern
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - Hello @ world
Accuracy - 92.3% (W recognised as w)



Lucida Handwriting

Font - Lucida Handwriting
Test Words - Hello @ World
Expected Result - Hello @ World
Original Result - HeUe@ worw
Accuracy - 46.1%
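
The accuracy figures above are consistent with dividing the number of correctly reproduced characters by the 13 characters of the test string. The answer does not say how the percentages were computed, so the following Python sketch is only one plausible reconstruction: `char_accuracy` is a hypothetical helper, and using `difflib` matching blocks to count matched characters is an assumption.

```python
from difflib import SequenceMatcher


def char_accuracy(expected: str, recognized: str) -> float:
    """Fraction of the expected characters that appear, in order, in the OCR output."""
    if not expected:
        raise ValueError("expected text must not be empty")
    matcher = SequenceMatcher(None, expected, recognized, autojunk=False)
    # Sum the sizes of all aligned runs of identical characters (case-sensitive).
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(expected)


expected = "Hello @ World"
# OCR outputs copied from the test results in this answer.
results = {
    "Arial": "Hello @ World",
    "Agency FB": "Hello World",
    "Modern": "Hello @ world",
    "Lucida Handwriting": "HeUe@ worw",
}

for font, recognized in results.items():
    print(f"{font}: {char_accuracy(expected, recognized):.1%}")
```

Under this reading the four fonts match 13, 11, 12, and 6 of 13 characters. The last ratio is about 46.15%, which the answer lists as 46.1%.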


Update 1

Arial Unicode MS

Font - Arial Unicode MS
Test Symbols - ⌰ ⌖
Expected Result - ⌰ ⌖
Original Result - (Unable to Recognize)
Accuracy - 0%



Update 2

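[Image not preserved. According to the author's later comment, this update indicates that there is currently no way to train the OCR library with a custom data set.]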

Hope this is helpful to you.

Vineet Choudhary
  • I don't think you exactly answered my question, but you gave me ideas and led me to other possible ways to solve my problem. Thanks. What I want to achieve is to recognize non-alphabetic symbols, for example the character ⌰ (Unicode: U+2330) or ⌖ (U+2316). Can you please tell me whether, in your example with Windows OCR, you are able to recognize these characters from the Miscellaneous Technical Unicode block? I will try it tomorrow. Thanks in advance – Xaren Mar 08 '16 at 08:05
  • @Xaren please check the Update 1 section of my answer. Hope this is helpful to you – Vineet Choudhary Mar 08 '16 at 09:13
  • Thanks for your help. Can you suggest a way to recognize special characters? – Xaren Mar 08 '16 at 17:06
  • Special characters like ⌰ & ⌖, or like @, #, %, ^, etc.? – Vineet Choudhary Mar 08 '16 at 17:09
  • Characters like ⌰ & ⌖ – Xaren Mar 08 '16 at 17:38
  • In addition to this, Windows.Media.Ocr is useless once you want to detect notes scribbled by the user. I'd really like to see an example of how to recognize user-entered text (via touch) with an acceptable success rate. See http://stackoverflow.com/questions/35954842/windows-ocr-engine-fails-to-recognize-the-text-in-canvas-converted-to-bitmap/35964300#35964300 – belzebu Mar 12 '16 at 23:43
  • Hey @Vineet Choudhary, what about the following font: ict4u.net/databases/database-images/… Could I recognize this font? Or can you take a look at http://stackoverflow.com/questions/38824278/add-new-language-for-ocr-engine – Cloy Aug 09 '16 at 09:18
  • @CloyMonis please check update 2, there is currently no way to train the OCR library with custom data set – Vineet Choudhary Aug 24 '16 at 12:11

I think the short answer to your question is no. As stated in the Supported languages section of the Windows.Media.Ocr namespace documentation:

There are 25 supported languages. Based on recognition accuracy and performance, supported languages are divided into three groups:

  • Excellent: Czech, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Serbian Cyrillic, Serbian Latin, Slovak, Spanish and Swedish.
  • Very good: Chinese Simplified, Greek, Japanese, Russian and Turkish.
  • Good: Chinese Traditional and Korean.

The language is required information for correct text recognition. Every language uses some language-specific resources, so it must be specified in advance.

Note: Only languages installed on the device can be used. A user can install new languages through the Settings app.

So if your symbols do not belong to any language, the OCR engine won't recognize them.
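
To illustrate what "not belonging to any language" means in Unicode terms, here is a small Python sketch. It does not call the OCR engine; it only uses the standard `unicodedata` module to show that the two characters from the question are classified as symbols rather than letters, unlike characters from scripts in the supported-language list. The helper name `describe` and the choice of sample letters are assumptions made for illustration.

```python
import unicodedata
from typing import Tuple


def describe(ch: str) -> Tuple[str, str, bool]:
    """Return (code point, general category, is_letter) for a single character."""
    category = unicodedata.category(ch)
    # Letter categories start with "L" (Lu, Ll, Lo, ...); symbols start with "S".
    return (f"U+{ord(ch):04X}", category, category.startswith("L"))


# The two symbols from the question, followed by sample letters from
# Latin, Cyrillic, and Chinese text (illustrative choices only).
samples = ["\u2330", "\u2316", "A", "\u0416", "\u4e2d"]

for ch in samples:
    code_point, category, is_letter = describe(ch)
    print(ch, code_point, category, "letter" if is_letter else "not a letter")
```

Both U+2330 and U+2316 fall in the Miscellaneous Technical block (U+2300 to U+23FF) and have the general category `So` (Symbol, other), so they are not part of any alphabet that a language-specific recognizer would be expected to cover.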

As for custom fonts, Vineet Choudhary's answer shows that the OCR engine can recognize some of them, but the accuracy of text recognition depends on the font. For handwritten or cursive text, the accuracy may be very low.

Jay Zuo
  • Hey @Jay Zuo - MSFT, what about the following font: http://www.ict4u.net/databases/database-images/micr.jpg How could I recognize this font? – Cloy Aug 09 '16 at 09:11