How to use trained data with pytesseract?

Question

Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata.

Right now I'm using this simple script:

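from PIL import Image
import pytesseract as tes

# Older pytesseract releases returned box output from
# image_to_string(..., boxes=True); newer releases removed that argument
# and provide image_to_boxes() for the same per-character box output.
results = tes.image_to_boxes(Image.open('./test.jpg'))

# Append the results to a file; 'with' closes it afterwards
with open('parsing.text', 'a') as file:
    file.write(results)
print(results)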

How do I use my .traineddata file so that the Python script can read the new font?

Thanks!

Edit #1: I understand that a *.traineddata file can be used with Tesseract as a command-line program, so my question is still the same: how do I use traineddata with Python?

Edit #2: The answer to my question is here: How to access the command line for Tesseract from Python?
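The linked question is about calling the Tesseract command-line program directly. A minimal sketch of that approach with only the standard library's subprocess module, assuming the tesseract binary is on PATH; the helper names build_tesseract_cmd and run_tesseract, the language name xyz and the ./tessdata folder are hypothetical placeholders, not part of any library:

```python
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image_path: str, lang: str,
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a Tesseract command line that prints the OCR text to stdout.

    Hypothetical helper: "stdout" as the output base tells Tesseract to
    write the recognized text to standard output instead of a file.
    """
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Folder that contains <lang>.traineddata
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_tesseract(cmd: List[str]) -> str:
    """Run the command and return the recognized text (raises on failure)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Usage, with hypothetical paths: `print(run_tesseract(build_tesseract_cmd("./test.jpg", "xyz", "./tessdata")))`. Passing the command as a list avoids shell quoting problems with paths that contain spaces.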

score 7 · Accepted Answer · edited May 19 '19 at 18:10


Below is a sample of pytesseract.image_to_string() with options.

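import pytesseract
from PIL import Image

pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"),
                            lang="eng",
                            # --psm 4: single column of text; --oem 3: default engine mode;
                            # -c sets a Tesseract variable (here, a character whitelist)
                            config="--psm 4 --oem 3 -c tessedit_char_whitelist=-01234567890XYZ:")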

To use your own trained language data, replace "eng" in lang="eng" with your language name, which is the file name of your .traineddata file without the extension (for xyz.traineddata, use lang="xyz").
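The naming rule above can be wrapped in a small helper. This is a sketch, not part of pytesseract: the function names lang_and_config_for and ocr_with_custom_font are hypothetical, and it assumes you keep the .traineddata file in its own folder and point Tesseract at that folder with the --tessdata-dir option passed through config (the pytesseract README shows --tessdata-dir used this way).

```python
from pathlib import Path
from typing import Tuple


def lang_and_config_for(traineddata_path: str) -> Tuple[str, str]:
    """Derive pytesseract's lang and config arguments from a .traineddata file.

    Hypothetical helper: lang is the file name without the .traineddata
    extension, and config points Tesseract at the folder holding the file.
    """
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    lang = path.stem  # "xyz.traineddata" -> "xyz"
    config = f'--tessdata-dir "{path.parent}"'
    return lang, config


def ocr_with_custom_font(image_path: str, traineddata_path: str) -> str:
    """OCR an image using a custom .traineddata file (hypothetical helper)."""
    # Imported here so the helper above works even without pytesseract installed
    import pytesseract
    from PIL import Image

    lang, config = lang_and_config_for(traineddata_path)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=config)
```

Usage, with hypothetical paths: `ocr_with_custom_font("./test.jpg", "./fonts/xyz.traineddata")`. With --tessdata-dir set, Tesseract looks up every requested language in that folder only.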

edited May 19 '19 at 18:10 by evandrix

answered May 26 '17 at 14:10 by thewaywewere


A small addition to the above answer: keep the xyz.traineddata file in the folder where Tesseract keeps its data (for example, /usr/share/tesseract-ocr/tessdata/) and pass the following: `pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"), lang="xyz")` – Milind Deore Oct 09 '18 at 01:39
`.traineddata` is appended to the lang name automatically, and the whitelist is broken in Tesseract 4. – Cees Timmerman Jun 07 '19 at 08:43
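The first comment's approach, copying the file into the system tessdata folder, can be sketched with the standard library. install_traineddata is a hypothetical helper, not a pytesseract function; the folder shown in the comment (/usr/share/tesseract-ocr/tessdata/) varies by distribution and Tesseract version, and writing there usually needs administrator rights.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata_path: str, tessdata_dir: str) -> str:
    """Copy a .traineddata file into a tessdata folder (hypothetical helper).

    Returns the language name to pass as lang= to pytesseract, i.e. the
    file name without its .traineddata extension.
    """
    src = Path(traineddata_path)
    dest_dir = Path(tessdata_dir)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name!r}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata folder not found: {dest_dir}")
    shutil.copy2(src, dest_dir / src.name)
    return src.stem  # "xyz.traineddata" -> "xyz"
```

For example, run once with sufficient permissions: `lang = install_traineddata("xyz.traineddata", "/usr/share/tesseract-ocr/tessdata")`, then `pytesseract.image_to_string(Image.open("image.png"), lang=lang)`. Running `tesseract --list-langs` afterwards should include xyz among the available languages.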

