0

Hi, I have just tried an OCR component for .NET and the results were pretty inaccurate. Has anybody else been down this route before? Can you recommend a path that would save me the time of evaluating lots of components that produce unsatisfactory results?

Any recommendations are much appreciated. I don't mind buying a component or coding it myself, whichever works best and is cost-effective.

Thanks

3 Answers

1

We have used the ABBYY FineReader SDK in our project. It comes with a COM object that you can use from your .NET application. The engine's accuracy has been good enough for our needs.

idursun
0

See https://stackoverflow.com/a/18070183/852208 for info on an alternative engine.

It's possible that your accuracy issue is related to the library itself. However, it's more likely caused by the source images you're working with. Consider the following tips:

Textual considerations

  • Standard OCR should not be attempted on certain materials. For example, currently OCR with default settings should not be attempted on most texts published prior to 1850. For some languages (e.g., German) the cutoff date may be even later. Before trying to create transcriptions for these materials via OCR, detailed analysis and often experimentation is required to judge trade-offs between custom OCR and keyboarding options.
  • Older and discolored documents must be scanned in RGB mode to capture all the image data, and to maximize OCR accuracy.
  • Low-contrast documents can result in poor OCR.
  • Typescript (typewritten text) results in poorer OCR than printed type; inconsistent use of typefaces and font sizes can also lower OCR accuracy.
  • Font sizes below 6 points in the original can limit OCR accuracy, although increasing the scan resolution to 600 dpi and using greyscale may improve the output.
  • Handwritten documents cannot be recognized with any degree of accuracy.
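
To see why small type and scan resolution go together, it helps to convert point sizes to pixels. Here is a rough sketch in Python (the function name is made up for illustration; the arithmetic is just 1 point = 1/72 inch, so it ports directly to .NET):

```python
def em_height_px(font_size_pt: float, dpi: float) -> float:
    """Approximate height in pixels of a font's em box at a given scan resolution."""
    # 1 typographic point = 1/72 inch, so pixels = points * dpi / 72.
    return font_size_pt * dpi / 72.0


# Compare how many pixels a 6-point font gets at common scan resolutions.
for dpi in (200, 300, 600):
    print(f"{dpi} dpi: {em_height_px(6, dpi):.1f} px")
```

A 6-point em box is only 25 px tall at 300 dpi, and the lowercase letters themselves are smaller than the em box. Doubling the resolution to 600 dpi doubles the pixel height, which gives the engine more detail to work with.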

Scanning considerations that affect the accuracy of OCR include:

  • The recommended best scanning resolution for OCR accuracy is 300 dpi. Higher resolutions do not necessarily result in better accuracy and can slow down OCR processing time. Resolutions below 300 dpi may affect the quality and accuracy of OCR results.
  • Brightness settings that are too high or too low may adversely affect OCR accuracy. A medium brightness value of 50% will be suitable in most cases.
  • Straightness of the initial scan can affect OCR quality; crooked lines of text produce poor results.
  • Image enhancements, such as contrast adjustment and unsharp mask, have NOT been shown to significantly enhance the accuracy of OCR.
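
Before blaming the engine, it can be worth running a quick sanity check on each scan. The sketch below is only an illustration in Python: `ScanInfo` and `preflight` are hypothetical names, not part of any OCR library, and the brightness cutoffs are placeholder assumptions that you would need to tune on your own documents. Only the 300 dpi figure comes from the tips above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ScanInfo:
    """Hypothetical summary of a scanned page; not tied to any OCR library."""
    width_px: int                # image width in pixels
    width_in: float              # physical page width in inches
    gray_pixels: Sequence[int]   # 8-bit grayscale samples (0 = black, 255 = white)


def preflight(scan: ScanInfo, min_dpi: float = 300.0,
              dark_limit: float = 0.25, bright_limit: float = 0.95) -> List[str]:
    """Return human-readable warnings; an empty list means no check failed."""
    if scan.width_in <= 0 or not scan.gray_pixels:
        raise ValueError("need a positive page width and at least one pixel sample")

    warnings = []

    # Effective resolution = pixels across / inches across.
    dpi = scan.width_px / scan.width_in
    if dpi < min_dpi:
        warnings.append(f"resolution is about {dpi:.0f} dpi, below {min_dpi:.0f} dpi")

    # Mean brightness as a fraction of full white (0.0 to 1.0).
    # dark_limit and bright_limit are assumed cutoffs, not established values.
    mean = sum(scan.gray_pixels) / (255.0 * len(scan.gray_pixels))
    if mean < dark_limit:
        warnings.append("scan looks very dark overall")
    elif mean > bright_limit:
        warnings.append("scan looks washed out overall")

    return warnings


# A page 8.5 inches wide scanned to 1700 px across is only 200 dpi.
print(preflight(ScanInfo(width_px=1700, width_in=8.5, gray_pixels=[200] * 100)))
```

In a real .NET application you would get the pixel dimensions and samples from whatever imaging library you already use; the point is just to reject or rescan poor inputs before they ever reach the OCR engine.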
b_levitt
0

ABBYY's component is pretty expensive. I've evaluated Pegasus ImagXpress and Atalasoft DotImage, and while I found DotImage more accurate on full-page OCR, certain small portions of text that were difficult to recognize were read better by ImagXpress. I suggest you try the demo versions of both and see which best fits your needs.

em70