1

We need a high-volume scanning and OCR solution.

We are talking about digitizing about 4,000 documents a day and saving them as PDF files with OCR (with hidden text).

The solution should let the operators scan a document and automatically save the file to a specific network resource, where an app picks it up and uploads it to a DB.

We are evaluating an enterprise solution from Kofax: http://www.kofax.com/

What other products are you aware of?

Any experience with similar requirements?

Any open source (or at least accessible) solution?

COM, ActiveX API support?

opensas

5 Answers

5

There are many vendors of scanning products that can do what you want - scan, index, generate PDF with OCR overlay (personally, I prefer OCR underlay in a PDF). Those requirements are pretty trivial for a vendor that specializes in scanning. To name just a few other vendors/products in addition to Kofax:

  • EMC/Captiva's InputAccel product
  • Datacap
  • eCopy ShareScan
  • Verity/Cardiff/Autonomy

Many document management solutions also have built-in scanning front ends but they're typically not as functional as the specialized capture products. Nearly all of these solutions have COM/ActiveX API support. I don't know of any open source solutions for scanning but I haven't ever really searched for any either.

Most of the scanning software vendors use a "volume" or "capacity" license. Typically the volume renews at the end of the term (e.g. 1M pages a year, auto-renewing each year without additional cost). Thus, you don't pay strictly "per page": if you purchase a capacity of 1M images per year and only end up scanning 500K pages, you don't get a refund. It is possible, although much less common, to have a one-time volume that doesn't automatically renew; when it runs out you would be required to purchase additional volume. Most vendors are moving away from dongles to control the volume and toward software licensing.
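To get a feel for which capacity tier the question's volume would need, here is a rough back-of-the-envelope sketch. The 4,000 documents a day comes from the question; the pages per document and the 250 working days a year are purely my assumptions, so plug in your own numbers.

```python
def annual_pages(docs_per_day: int, pages_per_doc: float, working_days: int) -> int:
    """Estimate how many pages are scanned per year."""
    return round(docs_per_day * pages_per_doc * working_days)


def unused_capacity(licensed_pages: int, scanned_pages: int) -> int:
    """Pages paid for but not used in a term (capacity licenses are not refunded)."""
    return max(licensed_pages - scanned_pages, 0)


if __name__ == "__main__":
    # 4,000 docs/day is from the question; 3 pages/doc and 250 days are assumptions.
    needed = annual_pages(docs_per_day=4000, pages_per_doc=3, working_days=250)
    print(f"Estimated pages per year: {needed:,}")
    # The 1M / 500K example from the paragraph above.
    print(f"Unused capacity: {unused_capacity(1_000_000, 500_000):,}")
```

Even at a single page per document, 4,000 documents a day over 250 days already comes to 1M pages a year, so the page count per document matters a lot when comparing quotes.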

A side note about Kofax:

Kofax has historically been sold through a system of Value Added Resellers so the quality of various implementations can vary widely. In addition it is highly customizable and comes in a variety of flavors with lots of add-on modules so one customer's Kofax system can be significantly different from other systems.

Kofax is used in enterprise-grade systems for scanning and automatic capture of millions and millions of documents a year. It has a significant chunk of the document scanning market share. No, I'm not a Kofax fanboy; if I were, I wouldn't have mentioned competitive products. However, I am very familiar with it. Like the other products on the market, it has strengths and weaknesses. I realize that Michael was just relaying what he had heard, but I couldn't let that sweeping generalization pass without comment. Saying a product that has a significant percentage of market share is "not useful or user friendly" for scanning is kind of like saying "Windows isn't a useful server operating system". It's just too broad a generalization.

Cheers,

Brian

Brian
0

You can try ChronoScan; it has free OCR through Tesseract and forms recognition options, and it's free for non-commercial use.

The software is in an advanced development stage, and there is a forum where you can talk directly with the developers.

http://www.chronoscan.org Short video reading forms

Jose
0

PSIGEN makes a great alternative to Kofax; it is packed with features and reasonably priced.

Kofax Alternative Scanning and Capture Application

0

Kofax is not very useful or user-friendly (per my counterparts working with the County). It's adequate, but not good.

We use an all Adobe solution. Details to follow (I'm not in charge of running that area, so I have to gather some information for you).

Update: We use

Adobe Acrobat Capture 3.0
Two RICOH Color Scanner IS760D with ADF
Acrobat Standard or Professional (depending upon the user)

We have an extensive library (almost 6,000 documents) with hundreds of thousands of scanned pages available. The computer doing the scanning has a dongle on it that we purchase (250,000 scans until we need to purchase an 'update'); I don't have the cost available since the gentleman that handles that has gone home for the day, but I remember it being in the micro-cents per page.

We often scan documents with several hundred pages that need to be done that day and we have no problem completing that task.

A link to some of our efforts (a web front-end, of sorts, to our library) is available at http://acequia.ccrfcd.org/FileLibrary2/FileLibrary.aspx if you'd like to get an idea of what we've done.

As for putting these PDFs into a database, it'd be pretty easy to create an application (perhaps a service) to monitor a directory and grab each PDF that pops up there after Capture runs, copy the information to the database, then either delete it or move it to its new home.
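A minimal sketch of that idea, assuming a simple polling approach and using SQLite only as a stand-in for whatever database you actually upload to. The function name `ingest_new_pdfs`, the `documents` table and the folder layout are all made up for illustration.

```python
import shutil
import sqlite3
from pathlib import Path


def ingest_new_pdfs(inbox: Path, archive: Path, conn: sqlite3.Connection) -> list[str]:
    """Store every PDF currently in `inbox` in the database, then move it to `archive`."""
    # Hypothetical schema: one row per file, content kept as a BLOB.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (name TEXT PRIMARY KEY, content BLOB)"
    )
    archive.mkdir(parents=True, exist_ok=True)
    ingested = []
    for pdf in sorted(inbox.glob("*.pdf")):
        with conn:  # commit one transaction per file
            conn.execute(
                "INSERT OR REPLACE INTO documents (name, content) VALUES (?, ?)",
                (pdf.name, pdf.read_bytes()),
            )
        # Move only after the row is committed, so a failure never loses the file.
        shutil.move(str(pdf), str(archive / pdf.name))
        ingested.append(pdf.name)
    return ingested
```

A service would call this in a loop with a short `time.sleep` between passes. One thing to watch for: a file that the scanning software is still writing could be picked up half-finished, so in practice you would want the scanner to write under a temporary name and rename when done, or skip files modified in the last few seconds.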

Michael Todd
  • Thanks a lot for the info, Michael. If you can tell me the cost per page, that would be wonderful. By the way, does every provider implement a per-page policy? I think Kofax offers a similar solution: a dongle, and you pay for each scanned page. – opensas May 23 '09 at 14:01
0

How good does your OCR need to be? Do you need all content to be human-readable, or do you just need some content to classify the document (customer number, type of document, barcodes, ...)?

http://www.irislink.com is a company that develops solutions for scanning and classifying documents.
Their software is bundled with several brands of multifunction devices and consumer scanners. The corporate product is aimed more at extracting information and using it (e.g. automatic entry of invoices into accounting software).
In my experience it handles the OCR'ed text better (correcting words, etc.) than Kofax (we use both), though Kofax can be extended to reach a better level (which means more setup work and more maintenance).

Both products are really useful for how they treat documents.
If your only wish is to scan the documents, convert them to PDF and save them on a network share, a good scanner and its included software may be enough.
You may also wish to check out the Tesseract project; it's an open source OCR engine with good results.
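For the "PDF with hidden text" requirement, a rough sketch of driving the Tesseract command-line tool from Python could look like this. It assumes a `tesseract` binary recent enough to support the `pdf` output config is on the PATH; the function names and the folder layout are made up for illustration.

```python
import subprocess
from pathlib import Path


def tesseract_pdf_command(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """Build the command line that turns one scanned image into a searchable PDF."""
    # Tesseract takes an output *base* name and appends ".pdf" itself.
    output_base = out_dir / image.stem
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def ocr_to_searchable_pdf(image: Path, out_dir: Path, lang: str = "eng") -> Path:
    """Run Tesseract on `image` and return the path of the PDF it should produce."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(tesseract_pdf_command(image, out_dir, lang), check=True)
    return out_dir / f"{image.stem}.pdf"
```

Whether the recognition quality and throughput hold up at 4,000 documents a day is something you would have to test on your own scans.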

Brtrnd