6

An error was reported when running tesseract on an image (image attached):

tesseract rsa-out.jpg stdout

Warning. Invalid resolution 0 dpi. Using 70 instead.
Empty page!!
Empty page!!
idiot one

5 Answers

6

The image's metadata probably does not include the image resolution. If you know the resolution, pass it with the --dpi command-line option. Run tesseract --help-extra for the full list of options.
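As a rough sketch of the same idea from a script, the snippet below builds the command line with --dpi appended after the output base, as the usage text allows. The helper names build_tesseract_command and run_ocr are made up for illustration, and it assumes the tesseract executable is on your PATH.

```python
import subprocess


def build_tesseract_command(image_path, dpi, output_base="stdout"):
    """Build a tesseract command line that states the image DPI explicitly.

    Hypothetical helper: only the --dpi option itself comes from tesseract.
    """
    dpi = int(dpi)
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    # Options go after the image name and output base.
    return ["tesseract", str(image_path), output_base, "--dpi", str(dpi)]


def run_ocr(image_path, dpi):
    """Run tesseract and return the recognized text (needs tesseract on PATH)."""
    completed = subprocess.run(
        build_tesseract_command(image_path, dpi),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For the image in the question, run_ocr("rsa-out.jpg", 300) would be one way to call it, with 300 standing in for whatever the real scan resolution is.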

Updated with version info and output from cmd:

>tesseract -v
tesseract 4.1.1
 leptonica-1.79.0 (Jan  2 2020, 22:29:02) [MSC v.1924 DLL Release x64]
  libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1

>tesseract --help-extra
Usage:
  tesseract --help | --help-extra | --help-psm | --help-oem | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

Single options:
  -h, --help            Show minimal help message.
  --help-extra          Show extra help for advanced users.
  --help-psm            Show page segmentation modes.
  --help-oem            Show OCR Engine modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters.
nguyenq
2
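import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class Test {
    public static void main(String[] args) {
        File imageFile = new File("C:\\Users\\data.jpg");
        ITesseract instance = new Tesseract();

        // Set the DPI explicitly so Tesseract does not have to guess it.
        instance.setTessVariable("user_defined_dpi", "96");
        System.err.println(instance.getClass().getName());
        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}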

The key addition is the line instance.setTessVariable("user_defined_dpi", "96"); it sets the DPI explicitly, which gets rid of the warning.

VIGNESH S
  • Please provide additional details in your answer. As it's currently written, it's hard to understand your solution. – Community Sep 08 '21 at 08:56
  • Thank you, I was using SetVariable but it didn't recognize "dpi", changing to "user_defined_dpi" worked to get rid of the warning. I used 70 as the warning was saying that it assumed that. – Nand Oct 11 '22 at 18:55
1

The warning means that the input image does not contain resolution info in its metadata; Tesseract reports this and then tries to estimate the resolution itself. See this issue for more information.
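If you want to check for missing resolution metadata before calling Tesseract, something along these lines may help. It is only a sketch: it assumes the Pillow library is installed, the function names are invented for this example, and the fallback of 300 is an arbitrary choice, not a value Tesseract recommends.

```python
def usable_dpi(info, fallback=300):
    """Return a DPI from a Pillow-style info dict, or `fallback` if missing/invalid.

    `fallback=300` is an assumption for illustration only.
    """
    dpi = info.get("dpi")
    if not dpi:
        return fallback
    try:
        # Pillow stores dpi as an (x, y) pair; use the horizontal value.
        horizontal = round(float(dpi[0]))
    except (TypeError, ValueError, IndexError):
        return fallback
    # A resolution of 0 is what triggers the "Invalid resolution 0 dpi" warning.
    return horizontal if horizontal > 0 else fallback


def read_dpi(image_path, fallback=300):
    """Open an image with Pillow and return a DPI that is safe to pass on."""
    from PIL import Image  # third-party dependency (Pillow), imported lazily

    with Image.open(image_path) as img:
        return usable_dpi(img.info, fallback)
```

The value returned by read_dpi can then be handed to Tesseract through the --dpi option or the user_defined_dpi variable mentioned in the other answers.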

Yogita Bhatia
1

You should be able to load it normally using the following lines:

import cv2
import pytesseract

image = cv2.imread('FS313.jpg')
text = pytesseract.image_to_string(image, lang='eng', config='--psm 3')

However, you won't get accurate OCR results regardless of the psm, because Tesseract is not trained for such digits. If you want to train it yourself, the YouTube video "Tesseract OCR - Create Trained data for Seven segment (Sample)" may help.

Esraa Abdelmaksoud
  • While testing locally I got the next error: *** FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tess_22uin80z.osd'. Code: image_original = Image.open(image_file); rgb = image_original.convert("RGB"); pytesseract.image_to_osd(rgb, output_type=Output.DICT, config='--psm 3'); – eduardosufan Jul 04 '23 at 18:08
  • @eduardosufan use image_to_string instead. – Esraa Abdelmaksoud Jul 04 '23 at 20:59
0

To test whether an image has the correct header, you can use magick identify -verbose filename or an equivalent tool, and make sure these two values are set:

Resolution: 118.11x118.11
Units: PixelsPerCentimeter

The values above are for a 300 dpi PNG.
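The link between 118.11 pixels per centimeter and 300 dpi is just a unit conversion (1 inch = 2.54 cm). A small sketch, with a made-up helper name, shows the arithmetic:

```python
CM_PER_INCH = 2.54


def ppcm_to_dpi(pixels_per_cm):
    """Convert a resolution in pixels per centimeter to dots per inch."""
    return pixels_per_cm * CM_PER_INCH


# 118.11 px/cm is the value reported for the PNG above.
dpi = ppcm_to_dpi(118.11)
print(round(dpi))
```

So a Resolution of 118.11x118.11 with Units: PixelsPerCentimeter is what a 300 dpi image looks like in identify output.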

MD SHAYON