[image: ICKcj.png]

I am having a hard time working with Tesseract. Is there a way to improve its accuracy? How do I train it myself, if needed?

The only thing I am doing is reading the following characters: XYZ:-0123456789. That's it! The pictures always look like this one.

Thanks!

Michael C
  • Tesseract is already working as well as it can. Use higher-resolution images. https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract is a good starting point for training Tesseract. – sashoalm Mar 24 '17 at 09:41
  • You can use PIL or OpenCV to preprocess the image before sending it to Tesseract. Try to improve the resolution, then dilate the image to disconnect the '-' from any digit. – thewaywewere Mar 29 '17 at 21:25

1 Answer

The output of Tesseract 4.00alpha with your image is

$ tesseract ICKcj.png - -l eng
*: 4606 Y; 4809 Z; 698

Warning. Invalid resolution 0 dpi. Using 70 instead.

Resampling the picture to 50% and setting the DPI to 300 gives:

[image: ICKcj-50.png]

The output with this image is slightly better, and the warning disappears:

$ tesseract ICKcj-50.png - -l eng
X: 4606 Y: 4809 Z: 698

The only things missing are the minus signs, which are printed quite irregularly (a higher-resolution picture could help). It is also possible to restrict the output pattern in Tesseract. Alternatively, you can try to infer the minus signs afterwards from the spacing between the X, Y, Z labels and the numbers.
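Since the question says only the characters XYZ:-0123456789 can occur, one cheap safeguard is to validate Tesseract's text after the fact. The following is only a sketch of that idea, not something I ran against your pictures: the function name `parse_reading` and the assumed layout (X, Y, Z in that order, each followed by a colon and a number) are my own assumptions. It rejects misread lines instead of guessing, and it does not restore minus signs that Tesseract dropped.

```python
import re

# Characters the question says can appear in the pictures.
ALLOWED = set("XYZ:-0123456789")

# One axis letter, a colon, an optional minus sign, then digits.
FIELD = re.compile(r"([XYZ]):(-?\d+)")


def parse_reading(ocr_text):
    """Return {"X": int, "Y": int, "Z": int}, or None if the text does not fit."""
    # Drop whitespace and anything outside the expected alphabet.
    cleaned = "".join(ch for ch in ocr_text if ch in ALLOWED)
    fields = FIELD.findall(cleaned)
    if [axis for axis, _ in fields] != ["X", "Y", "Z"]:
        return None  # a label was misread or is missing; do not guess
    return {axis: int(value) for axis, value in fields}
```

With the two outputs shown above, `parse_reading("X: 4606 Y: 4809 Z: 698")` returns `{"X": 4606, "Y": 4809, "Z": 698}`, while the first attempt, `"*: 4606 Y; 4809 Z; 698"`, returns `None` and can be flagged for another try.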

zuphilip
  • How do I change the DPI? – Michael C Mar 24 '17 at 18:48
  • I did this for the one image with IrfanView. I guess that other graphics software can do the same. There is even an online service to do this: https://convert.town/image-dpi . To make such changes in a batch, I would suggest looking at the options of ImageMagick's `convert`. – zuphilip Mar 25 '17 at 11:05
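If you would rather script the DPI change yourself, here is a rough sketch that works for PNG files using only the Python standard library. In a PNG the resolution lives in the `pHYs` chunk, stored as pixels per metre, so the code rewrites that chunk. The function name `set_png_dpi` is made up for this example, and I am assuming that this metadata is what the "Invalid resolution 0 dpi" warning refers to. Note that this changes the metadata only; it does not resample the pixels, so the 50% resize still needs an image tool.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def set_png_dpi(png_bytes, dpi=300):
    """Return a copy of a PNG byte string with its pHYs chunk set to `dpi`."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # The pHYs chunk stores pixels per metre; 1 inch = 0.0254 m.
    ppm = round(dpi / 0.0254)
    data = struct.pack(">IIB", ppm, ppm, 1)  # unit 1 = metre
    crc = zlib.crc32(b"pHYs" + data) & 0xFFFFFFFF
    phys = struct.pack(">I", len(data)) + b"pHYs" + data + struct.pack(">I", crc)

    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        kind = png_bytes[pos + 4:pos + 8]
        end = pos + 12 + length  # length field + type + data + CRC
        if kind != b"pHYs":  # drop any existing resolution chunk
            out.append(png_bytes[pos:end])
        if kind == b"IHDR":  # pHYs must come after IHDR and before IDAT
            out.append(phys)
        pos = end
    return b"".join(out)
```

To use it, read the file in binary mode, pass the bytes to `set_png_dpi`, and write the result to a new file; looping over a directory gives you the batch version.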