OCR .NET Recommended

Question

Hi, I have just tried an OCR component in .NET and the results were pretty inaccurate. Has anybody else been down this route before? Can you recommend a path that would save me the time of evaluating lots of components that produce unsatisfactory results?

Any recommendations are much appreciated. I don't mind buying a component or coding it myself, whichever works best and is cost-effective.

thanks

duplicate: http://stackoverflow.com/questions/591574/ocr-in-net — Mauricio Scheffer, Jun 11 '09 at 19:45

score 1 · Answer 1 · answered Apr 08 '09 at 08:45

We have used the ABBYY FineReader SDK in our project. It comes with a COM object that you can use from your .NET application. The engine's recognition quality is good enough.

idursun


Same for us. The API has some pitfalls, but the results are really amazing. – Dirk Vollmar Apr 08 '09 at 08:51

score 0 · Answer 2 · edited May 23 '17 at 12:29

See https://stackoverflow.com/a/18070183/852208 for info on an alternative engine.

Your accuracy problem may be caused by the library itself, but the source image you're working with is a more likely culprit. Consider the following tips:

Textual considerations

Standard OCR should not be attempted on certain materials. For example, currently OCR with default settings should not be attempted on most texts published prior to 1850. For some languages (e.g., German) the cutoff date may be even later. Before trying to create transcriptions for these materials via OCR, detailed analysis and often experimentation is required to judge trade-offs between custom OCR and keyboarding options.

Older and discolored documents must be scanned in RGB mode to capture all the image data, and to maximize OCR accuracy.

Low-contrast documents can result in poor OCR.

Typescript results in poorer OCR than printed type; inconsistent use of font faces and sizes can lower OCR accuracy.

Font sizes of below 6 points in the original can limit OCR, although increasing resolution in the scanned image to 600 dpi and using greyscale may improve OCR output.

Handwritten documents cannot be recognized with any degree of accuracy.
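
To see why a higher scanning resolution can help with small type, it may be useful to work out how many pixels a character actually gets. This is a rough sketch in Python, assuming the usual 72 points per inch; the function name is made up for illustration, and the em height is only an upper bound on the size of the actual glyphs:

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def em_height_pixels(font_size_pt, scan_dpi):
    """Approximate em height, in pixels, of text scanned at scan_dpi."""
    return font_size_pt * scan_dpi / POINTS_PER_INCH


# Compare 6 pt text at the two resolutions mentioned above.
for dpi in (300, 600):
    print(dpi, "dpi:", em_height_pixels(6, dpi), "px")
```

So 6 pt text is about 25 pixels tall at 300 dpi and about 50 pixels at 600 dpi, which gives the engine considerably more detail per character.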

Scanning considerations that affect the accuracy of OCR include:

The recommended best scanning resolution for OCR accuracy is 300 dpi. Higher resolutions do not necessarily result in better accuracy and can slow down OCR processing time. Resolutions below 300 dpi may affect the quality and accuracy of OCR results.

Brightness settings that are too high or too low may adversely affect OCR accuracy. A medium brightness value of 50% will be suitable in most cases.

Straightness of the initial scan can affect OCR quality; crooked lines of text produce poor results.

Image enhancements, such as contrast adjustment and unsharp mask, have NOT been shown to significantly enhance the accuracy of OCR.
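
Some of the scanning tips above can be checked automatically before an image is sent to an OCR engine. The following is only a sketch in Python: the function names, the percentile-based contrast measure, and the 0.3 contrast threshold are my own assumptions for illustration, and only the 300 dpi figure comes from the list above. It expects grayscale pixel values (0 to 255) that you have already loaded with whatever imaging library you use.

```python
def percentile(sorted_values, q):
    """Nearest-rank style percentile of an already sorted list (q in 0..1)."""
    index = int(q * (len(sorted_values) - 1))
    return sorted_values[index]


def preflight(gray_pixels, pixel_width, page_width_inches,
              min_dpi=300, min_contrast=0.3):
    """Return a list of warnings for a grayscale scan.

    gray_pixels: flat sequence of 0-255 intensity values.
    min_contrast is an illustrative guess, not a value from the answer.
    """
    warnings = []

    # Effective resolution: pixels across the page divided by its width.
    dpi = pixel_width / page_width_inches
    if dpi < min_dpi:
        warnings.append(f"resolution is {dpi:.0f} dpi, below {min_dpi} dpi")

    # Contrast: spread between the darkest and lightest 1% of pixels,
    # scaled to 0..1. Dark text on light paper should give a wide spread.
    ordered = sorted(gray_pixels)
    spread = (percentile(ordered, 0.99) - percentile(ordered, 0.01)) / 255
    if spread < min_contrast:
        warnings.append(f"low contrast (spread {spread:.2f})")

    return warnings


# Hypothetical data: a letter-size page (8.5 in wide) with dark text on light paper.
good_scan = [20] * 10 + [240] * 90
print(preflight(good_scan, pixel_width=2550, page_width_inches=8.5))

# Hypothetical data: a washed-out scan at half the recommended resolution.
faded_scan = [120, 140] * 50
print(preflight(faded_scan, pixel_width=1275, page_width_inches=8.5))
```

A check like this will not fix a bad scan, but it can tell you to rescan a page before you blame the OCR component.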

score 0 · Answer 3 · answered Apr 08 '09 at 08:52

ABBYY's component is pretty expensive. I've evaluated Pegasus ImagXpress and Atalasoft DotImage, and while I found DotImage more accurate on full-page OCR, certain small portions of text that were difficult to recognize were read better by ImagXpress. I suggest you try the demo versions of both and see which best fits your needs.
