Is there a library (or executable) that can OCR a PDF (typically one created by scanning a paper document) and inject the recognized text back into the PDF, probably as invisible text behind the scanned images?
Preferably open source.
(Goal: I have a huge library of PDF files indexed by Lucene. It would be much easier for Lucene to find which PDFs are relevant if the PDFs contained text.)