Lambda function returning Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract

Question

I am working on AWS - Lambda (Python).

I am working on an already existant code that uses tesseract package. There is a function in my main that calls this:

 def lambda_ocr(ze_path, step):

    if step == 1:
        ocr_options = "--oem 1 -l eng --psm 6"
    elif step == 2:
        ocr_options = "--oem 0 -l eng --psm 6"
    elif step == 3:
        ocr_options = "--oem 1 -l fra --psm 3"
    elif step == 4 :
        ocr_options = "--oem 0 -l fra --psm 11"
    else:
        print("WARNING invalid step given for ocr. default option --oem 1 -l fra --psm 3.")
        ocr_options = "--oem 1 -l fra --psm 3"
    res = ocr(ze_path, config=ocr_options)


def ocr(img_path, config="--oem 1 -l fra --psm 3"):
    """ This function is called by get_text_OCR_Parallel
        we can modify the tesseract config here
    """
    raw_text = pytesseract.image_to_string(img_path, config=config)

    return raw_text

def image_to_string(image,
                    lang=None,
                    config='',
                    nice=0,
                    output_type=Output.STRING):
    '''
    Returns the result of a Tesseract OCR run on the provided image to string
    '''
    args = [image, 'txt', lang, config, nice]

    return {
        Output.BYTES: lambda: run_and_get_output(*(args + [True])),
        Output.DICT: lambda: {'text': run_and_get_output(*args)},
        Output.STRING: lambda: run_and_get_output(*args),
    }[output_type]()

When I call lambda_ocr function with step=1, everything works fine. But when step=2, 3 or 4 it throws the error.

I Don't know much on tesseract package but according to this, I should install missing packages.

What I Don't understand is how is it working when step=1 if the package is not well installed? Shouldn't it throw an error too?

Any help is appreciated. Thank you

score 0 · Accepted Answer · answered Mar 25 '20 at 13:15

0

The solution was to use a Lambda layer to install the missing package. I downloaded required files from git and then uploaded the .zip on AWS and created a layer with it.

answered Mar 25 '20 at 13:15

Haha

973
16
43

Lambda function returning Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract

1 Answers1