I am using tess4j (net.sourceforge.tess4j:tess4j:4.4.0) to run OCR on PDF files. As I understand it, I first have to convert the PDF to TIFF or PNG (is either of those preferred?), which I did like this:
tesseract.doOCR(PdfUtilities.convertPdf2Tiff(inputPdfFile));
and I get the following warning:
Warning: Invalid resolution 0 dpi. Using 70 instead.
Questions
- Does this have any influence on my OCR results? (If not, fine, I can just suppress the warning.)
- Is there a way to set the DPI by hand, or should convertPdf handle this for me?
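For context, here is a minimal sketch of the whole call, including one thing that may be worth trying for the second question. It assumes that Tesseract's `user_defined_dpi` config variable is honored by the engine bundled with this tess4j version, and that 300 matches the resolution the pages were rendered at; both are assumptions and not verified here. The class name `PdfOcrWithDpi` and the helper `newTesseract` are made up for illustration.

```java
import java.io.File;
import java.io.IOException;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.PdfUtilities;

public class PdfOcrWithDpi {

    // Hypothetical helper: creates an engine and tells Tesseract which
    // resolution to assume when the image itself reports none (0 dpi).
    static ITesseract newTesseract(int dpi) {
        ITesseract tesseract = new Tesseract();
        // Assumption: "user_defined_dpi" is a Tesseract 4.x config variable.
        tesseract.setTessVariable("user_defined_dpi", Integer.toString(dpi));
        return tesseract;
    }

    public static void main(String[] args) throws IOException, TesseractException {
        File inputPdfFile = new File(args[0]);

        // 300 is an assumed value; use the resolution the TIFF was rendered at.
        ITesseract tesseract = newTesseract(300);

        // Same conversion as in the question; the result is a temporary TIFF.
        File tiff = PdfUtilities.convertPdf2Tiff(inputPdfFile);
        try {
            System.out.println(tesseract.doOCR(tiff));
        } finally {
            // Clean up the intermediate file once OCR is done.
            tiff.delete();
        }
    }
}
```

If the warning disappears with this setting, that would suggest the TIFF simply carries no resolution metadata and the engine falls back to a guess.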