I am working on AWS Lambda (Python), on existing code that uses the tesseract package.
A function in my main module calls this:

    def lambda_ocr(ze_path, step):
        if step == 1:
            ocr_options = "--oem 1 -l eng --psm 6"
        elif step == 2:
            ocr_options = "--oem 0 -l eng --psm 6"
        elif step == 3:
            ocr_options = "--oem 1 -l fra --psm 3"
        elif step == 4:
            ocr_options = "--oem 0 -l fra --psm 11"
        else:
            print("WARNING invalid step given for ocr. default option --oem 1 -l fra --psm 3.")
            ocr_options = "--oem 1 -l fra --psm 3"
        res = ocr(ze_path, config=ocr_options)

    def ocr(img_path, config="--oem 1 -l fra --psm 3"):
        """ This function is called by get_text_OCR_Parallel
        we can modify the tesseract config here
        """
        raw_text = pytesseract.image_to_string(img_path, config=config)
        return raw_text

    def image_to_string(image,
                        lang=None,
                        config='',
                        nice=0,
                        output_type=Output.STRING):
        '''
        Returns the result of a Tesseract OCR run on the provided image to string
        '''
        args = [image, 'txt', lang, config, nice]
        return {
            Output.BYTES: lambda: run_and_get_output(*(args + [True])),
            Output.DICT: lambda: {'text': run_and_get_output(*args)},
            Output.STRING: lambda: run_and_get_output(*args),
        }[output_type]()

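
As a rough sketch of how the config string presumably reaches the tesseract binary: the options appear to be split shell-style and appended after the input and output names. This is an approximation for illustration, not pytesseract's actual code; `build_tesseract_command` is a hypothetical helper, and the file names are placeholders.

```python
import shlex


def build_tesseract_command(image_path, output_base, lang=None, config=""):
    """Hypothetical helper: approximate the command line for one OCR run."""
    cmd = ["tesseract", image_path, output_base]
    if lang is not None:
        cmd += ["-l", lang]
    # Assumption: a config such as "--oem 0 -l fra --psm 11" is split
    # shell-style and passed through unchanged.
    cmd += shlex.split(config)
    return cmd


# The options used for step 4, with placeholder file names.
step4_cmd = build_tesseract_command("page.png", "out", config="--oem 0 -l fra --psm 11")
```

If that assumption holds, the `-l` and `--oem` values in each step's config are handed directly to tesseract, so each step depends on whatever language data the binary can find at run time.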
When I call lambda_ocr with step=1, everything works fine. But with step=2, 3 or 4, it throws an error.
I don't know much about the tesseract package, but according to this, I should install missing packages.
What I don't understand is how it can work with step=1 if the package is not installed correctly. Shouldn't that throw an error too?
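
One way to narrow this down, assuming the failures come from missing language data rather than from the Python code: each `-l <lang>` option needs a matching `<lang>.traineddata` file in the tessdata directory shipped with the Lambda package. The sketch below reports, per step, which languages have no such file. `languages_in` and `missing_languages` are hypothetical helpers, and reading the directory from `TESSDATA_PREFIX` is an assumption about how the deployment is configured. It only checks that the file exists; it does not check which engine (`--oem`) the file supports.

```python
import os
import shlex

# The four option strings from lambda_ocr above.
STEP_CONFIGS = {
    1: "--oem 1 -l eng --psm 6",
    2: "--oem 0 -l eng --psm 6",
    3: "--oem 1 -l fra --psm 3",
    4: "--oem 0 -l fra --psm 11",
}


def languages_in(config):
    """Return the language codes named by -l in a tesseract config string."""
    tokens = shlex.split(config)
    langs = []
    for i, token in enumerate(tokens[:-1]):
        if token == "-l":
            # Several languages can be combined as "eng+fra".
            langs.extend(tokens[i + 1].split("+"))
    return langs


def missing_languages(tessdata_dir, configs=STEP_CONFIGS):
    """Map each step to the languages with no .traineddata file in tessdata_dir."""
    missing = {}
    for step, config in configs.items():
        absent = [
            lang for lang in languages_in(config)
            if not os.path.isfile(os.path.join(tessdata_dir, lang + ".traineddata"))
        ]
        if absent:
            missing[step] = absent
    return missing


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX points at the directory holding the files.
    print(missing_languages(os.environ.get("TESSDATA_PREFIX", ".")))
```

An empty result for a failing step would suggest the file is present but something else differs between the steps, such as the `--oem` value.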
Any help is appreciated. Thank you