
I have been reviewing replacements for the Office 2007 MODI OCR (the OCR in OneNote 2010 gives lower-quality results than 2007 :-( ). I notice that Windows 7 contains an OCR library once you install the optional TIFF IFilter feature.

The OCR component gets installed to

%programfiles%\Common Files\microsoft shared\OCR\7.0\xocr3.psp.dll 

but I don't see any API for it.

Does anyone see how this can be interfaced, preferably in C#?

ANSWER: Found the solution. Once the optional TIFF IFilter Windows 7 feature is installed, I can get the text output of a screenshot using the code/exe at http://www.codeproject.com/KB/cs/IFilter.aspx. Also, if you add the same [HKEY_CLASSES_ROOT\.tiff\PersistentHandler] entry for .png and .jpg, then OCR also works for JPG and PNG files.
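The registry step above can be sketched as follows. This is only a rough illustration, written in Python with the standard-library `winreg` module rather than C#; the helper names `handler_key_path` and `copy_persistent_handler` are made up for this example. It assumes the TIFF IFilter feature is already installed, so that the `.tiff` key has a `PersistentHandler` subkey, and it copies that key's default value instead of hard-coding a GUID.

```python
import sys

SOURCE_EXT = ".tiff"             # extension that already has the OCR filter registered
TARGET_EXTS = (".png", ".jpg")   # extensions that should reuse the same handler


def handler_key_path(ext):
    """Return the HKEY_CLASSES_ROOT subkey holding an extension's persistent handler."""
    if not ext.startswith("."):
        ext = "." + ext
    return ext.lower() + "\\PersistentHandler"


def copy_persistent_handler(source_ext=SOURCE_EXT, target_exts=TARGET_EXTS):
    """Copy the default value of source_ext's PersistentHandler key to each target."""
    import winreg  # Windows-only standard-library module, imported lazily

    # Read whatever handler the TIFF IFilter feature registered (no GUID is assumed).
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, handler_key_path(source_ext)) as src:
        handler, value_type = winreg.QueryValueEx(src, "")  # "" selects the default value

    # Write the same value for each target extension, overwriting any existing one.
    for ext in target_exts:
        with winreg.CreateKey(winreg.HKEY_CLASSES_ROOT, handler_key_path(ext)) as dst:
            winreg.SetValueEx(dst, "", 0, value_type, handler)
    return handler


if __name__ == "__main__":
    if sys.platform != "win32":
        sys.exit("This sketch only works on Windows.")
    print("Copied handler:", copy_persistent_handler())
```

Writing under HKEY_CLASSES_ROOT normally needs an elevated prompt, and the sketch overwrites any handler already registered for .png and .jpg, so it is worth exporting those keys first.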

Cœur
slyi
    The main API seems to be defined in thocrapi.dll. But good luck with programming against an undocumented, possibly changing target. You'd better spend some money and get a commercial library instead of wasting your time here. – Dirk Vollmar May 23 '11 at 17:35

3 Answers

There are a couple of very good free OCR solutions available for .NET:
  1. Tessnet OCR. A good solution, but pretty old (last release from 2009).
  2. Asprise C# OCR SDK. A very good and fast one.
  3. Microsoft Research Project Hawaii. Web-based (cloud) OCR solution with full docs and samples (discontinued 2013).
  4. Bing OCR. Web-based (cloud) OCR replacement for the above (discontinued March 2014).
Piotr Szmyd
  • Taught me something, I only knew about Tessnet (on my budget, at least ;-). Forewarning: it doesn't look like you can distribute software with the 'free' Asprise OCR, so 'free' probably belongs in quotes: http://asprise.com/product/ocr/faq.php?lang=vb – FastAl Jun 13 '11 at 14:07
  • Haven't read their license fully, as I used it in a non-commercial, personal project a while ago. I'll have to recheck it. There was also a third library, the one I used the most. It was the nicest one, but I forgot its name. I'll post the info on that if I manage to find it somewhere :) – Piotr Szmyd Jun 13 '11 at 15:01
  • Piotr, thank you for mentioning Asprise [C# OCR Component Library](http://asprise.com/royalty-free-library/c%23-sharp.net-ocr-api-overview.html). Our dev team has spent a lot of effort to improve the accuracy and speed of the OCR. Let us know if you have any suggestions to make it even better :) – Scanner.js Receipt Invoice OCR Mar 17 '15 at 01:47
  • At $5k for the cheapest version, Asprise appears to be anything but free: http://asprise.com/royalty-free-library/c%23-sharp.net-ocr-source-code-open-order.html – Jimmy Apr 26 '16 at 22:32
  • @Jimmy Well, there was one at the time of writing (there is a free trial now though). And OP didn't specifically ask for free libraries so I included it. – Piotr Szmyd Apr 27 '16 at 05:03

Try TessNet, using the suggestions I made to the poster in this post (enlarge the image, use a separate process):
c# OCR can't recognize digits (tesseract 2)

FastAl

I was exploring the Windows 7 DLLs and found three libraries that might be useful: thocr.psp.dll, xocr3.psp.dll, and ximage3b.dll. On this website and other similar websites I found out that ximage3b is a Windows system OCR engine. I have been looking for documentation online without success, but hey! At least I know that it's there. I will give you guys an update if I find out how to use it with C#/C/C++.

Jaime Ivan Cervantes