
I'm using Python and keras-ocr. I want keras-ocr to recognize only numbers, so I build the pipeline like this:

    import keras_ocr
    # Restrict the recognizer's output alphabet to digits only
    recognizer = keras_ocr.recognition.Recognizer(alphabet="0123456789")
    pipeline = keras_ocr.pipeline.Pipeline(recognizer=recognizer)

But instead of mapping letters to digits and improving recognition quality, the way a Tesseract character whitelist does, this happens (image: number recognized wrongly). The numbers are not recognized at all.

(image: number recognized with the default alphabet) With the default alphabet the result is better, but some digits are confused with letters. However, converting letters to digits afterwards with something like replace("O", "0") is quite a bad idea.

The recognition function is simple, and copied :)


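    import keras_ocr
    import matplotlib.pyplot as plt

    def recognize(_path):
        # Read the image from disk and show it as-is
        _image = keras_ocr.tools.read(_path)
        plt.figure(figsize=(10, 20))
        plt.imshow(_image)

        # Run detection + recognition on a batch containing one image
        prediction = pipeline.recognize([_image])[0]

        # Draw the predicted boxes and words over the image
        fig, axs = plt.subplots(1, figsize=(10, 20))
        keras_ocr.tools.drawAnnotations(image=_image, predictions=prediction, ax=axs)
        plt.show()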
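As far as I know, pipeline.recognize returns one list per image, where each entry is a (word, box) pair. If the goal is just to get the numbers out of the default-alphabet result without rewriting letters, a minimal sketch is to drop every word that is not made purely of the digits 0-9. This does not repair a digit that was misread as a letter (such a word is simply discarded), and keep_numeric is a hypothetical helper, not part of keras-ocr.

```python
import re
from typing import Any, List, Tuple

# Assumed shape of one image's keras-ocr result: a list of (word, box) pairs.
Prediction = List[Tuple[str, Any]]

# ASCII digits only; str.isdigit() would also accept characters like "²".
DIGITS_ONLY = re.compile(r"[0-9]+")


def keep_numeric(prediction: Prediction) -> Prediction:
    """Keep only the (word, box) pairs whose recognized text is all digits."""
    return [(word, box) for word, box in prediction if DIGITS_ONLY.fullmatch(word)]
```

For example, keep_numeric(prediction) on the prediction computed inside the function above would keep only the numeric words, and the boxes stay attached so they can still be drawn.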

Answer

I haven't found a simpler way than training the model with the keras-ocr tools. However, the text generator for synthetic data takes its text from books, journals, and other natural-language sources, so it contains few numbers, and when your alphabet is "0123456789" the generator sometimes returns an empty string. So I wrote my own generator that produces strings made only of digits. https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html https://colab.research.google.com/drive/1PxxXyH3XaBoTgxKIoC9dKIRo4wUo-QDg#scrollTo=I7SF5VeoLulc
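My understanding (not confirmed in this thread) is that keras-ocr loads the full pretrained recognizer weights only when the alphabet matches the pretrained one. With a different alphabet it keeps just the backbone weights, so the output layer starts untrained, which would explain why a digits-only recognizer returns nothing useful until it is trained. The answer's own generator isn't shown, so the following is only a sketch of a digits-only text generator. It assumes the training example's get_image_generator just calls next() on its text_generator argument and expects strings. The name get_digit_text_generator and its parameters are hypothetical.

```python
import random
from typing import Iterator, Optional

DIGITS = "0123456789"


def get_digit_text_generator(
    min_length: int = 1,
    max_length: int = 10,
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield random, never-empty strings made only of digits, forever.

    Hypothetical stand-in for the natural-language text generator used in
    the keras-ocr end-to-end training example.
    """
    # Checked on the first next() call, because this is a generator function
    if not 1 <= min_length <= max_length:
        raise ValueError("need 1 <= min_length <= max_length")
    rng = random.Random(seed)  # private RNG so runs are reproducible with a seed
    while True:
        length = rng.randint(min_length, max_length)  # inclusive bounds
        yield "".join(rng.choice(DIGITS) for _ in range(length))
```

Because min_length is at least 1, every sample contains at least one digit. That avoids the empty strings the default generator can produce with a digits-only alphabet.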
