Tess4j - Pdf to Tiff to tesseract - "Warning: Invalid resolution 0 dpi. Using 70 instead."

Question

I am using tess4j (net.sourceforge.tess4j:tess4j:4.4.0) and trying to run OCR on PDF files. As I understand it, I first have to convert the PDF to TIFF or PNG (is either of those recommended?), which I did like this:

tesseract.doOCR(PdfUtilities.convertPdf2Tiff(inputPdfFile));

and I get the following warning:

Warning: Invalid resolution 0 dpi. Using 70 instead.

Question

Does it have any influence on my scan results? (If not, fine: I can switch off the warning.)
Is there a way to set the DPI by hand or should convertPdf handle this for me?

score 7 · Accepted Answer · edited Nov 02 '22 at 08:08

If the image metadata contains no resolution information, Tesseract tries to estimate the resolution itself so that font size information can be calculated in the results.

You can try the following APIs to set input image resolution:

instance.setVariable("user_defined_dpi", "300");

or

TessBaseAPISetSourceResolution(TessBaseAPI handle, int ppi);

You can suppress the console output with:

instance.setVariable("debug_file", "/dev/null");
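
Since the warning is triggered by missing resolution metadata, another option (not part of the original answer) is to write the resolution into the image file before handing it to Tesseract. The following is a minimal sketch using only the JDK's `javax.imageio` PNG writer. The class name `PngDpi` and its two helper methods are hypothetical names chosen for illustration, and it is an assumption that the image loader behind your Tesseract build honours the PNG `pHYs` chunk.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: stores and reads back the resolution of a PNG file.
final class PngDpi {
    private static final String PNG_FORMAT = "javax_imageio_png_1.0";
    private static final double METERS_PER_INCH = 0.0254;

    // Writes the image as PNG with a pHYs chunk describing the given DPI.
    static void writePngWithDpi(BufferedImage image, File out, int dpi) throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        ImageTypeSpecifier type = ImageTypeSpecifier.createFromRenderedImage(image);
        IIOMetadata meta = writer.getDefaultImageMetadata(type, null);

        // pHYs stores pixels per meter, so convert from dots per inch.
        String pixelsPerMeter = Long.toString(Math.round(dpi / METERS_PER_INCH));
        IIOMetadataNode phys = new IIOMetadataNode("pHYs");
        phys.setAttribute("pixelsPerUnitXAxis", pixelsPerMeter);
        phys.setAttribute("pixelsPerUnitYAxis", pixelsPerMeter);
        phys.setAttribute("unitSpecifier", "meter");

        IIOMetadataNode root = new IIOMetadataNode(PNG_FORMAT);
        root.appendChild(phys);
        meta.mergeTree(PNG_FORMAT, root);

        try (ImageOutputStream os = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(os);
            writer.write(null, new IIOImage(image, null, meta), null);
        } finally {
            writer.dispose();
        }
    }

    // Returns the horizontal DPI stored in the PNG, or 0 if no pHYs chunk exists.
    static int readDpi(File png) throws IOException {
        try (ImageInputStream in = ImageIO.createImageInputStream(png)) {
            ImageReader reader = ImageIO.getImageReadersByFormatName("png").next();
            try {
                reader.setInput(in);
                Node root = reader.getImageMetadata(0).getAsTree(PNG_FORMAT);
                for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                    if ("pHYs".equals(n.getNodeName())) {
                        String ppm = ((Element) n).getAttribute("pixelsPerUnitXAxis");
                        return (int) Math.round(Integer.parseInt(ppm) * METERS_PER_INCH);
                    }
                }
                return 0;
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("page", ".png");
        out.deleteOnExit();
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        writePngWithDpi(page, out, 300);
        System.out.println("Stored resolution: " + readDpi(out) + " dpi");
    }
}
```

The stored value only describes the image; it does not resample it. It should therefore match the resolution at which the PDF page was actually rendered. If the file cannot be changed, setting `user_defined_dpi` as shown above is the more direct route.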

score 0 · Answer 2 · edited Nov 02 '22 at 08:08

The default resolution is not set.

To complement nguyenq's answer:

instance.setVariable("user_defined_dpi", "300");

edited Nov 02 '22 at 08:08 by larsw

answered Nov 18 '20 at 06:48 by Vlad-Florin Ciocan

score 0 · Answer 3 · answered Aug 31 '22 at 22:19

In version 5.4.0 of tess4j, use

instance.setVariable("user_defined_dpi", "300");

instead of the older

instance.setTessVariable("user_defined_dpi", "300");

answered Aug 31 '22 at 22:19 by David James
