How to make the keras-ocr default model recognize only numbers?

Question

I use Python and keras-ocr. I want keras-ocr to recognize only numbers, so I set up the pipeline like this:

    import keras_ocr

    recognizer = keras_ocr.recognition.Recognizer(alphabet="0123456789")
    pipeline = keras_ocr.pipeline.Pipeline(recognizer=recognizer)

But instead of mapping letters to digits and improving recognition quality, as a Tesseract whitelist does, the opposite happens: the numbers are not recognized at all.

With the default alphabet the result is better, but some digits are confused with letters. Replacing letters with digits afterwards, e.g. `replace("O", "0")`, seems like quite a bad idea.

The recognition function is simple and copied :)


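    import matplotlib.pyplot as plt

    # Load and display the input image (_path is the image file path)
    _image = keras_ocr.tools.read(_path)
    plt.figure(figsize=(10, 20))
    plt.imshow(_image)

    # Run detection and recognition, then draw the predicted boxes and text
    prediction = pipeline.recognize([_image])[0]
    fig, axs = plt.subplots(1, figsize=(10, 20))
    keras_ocr.tools.drawAnnotations(image=_image, predictions=prediction, ax=axs)
    plt.show()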

score 1 · Answer 1 · answered Jun 18 '22 at 12:37

I haven't found a simpler way than training the model with the keras-ocr tools. However, the text generator for synthetic data draws on meaningful prose, such as text from books or journals. Such text contains few numbers, and if your alphabet is "0123456789" the generator sometimes returns an empty string. So I wrote my own generator that produces strings made only of digits.

https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html
https://colab.research.google.com/drive/1PxxXyH3XaBoTgxKIoC9dKIRo4wUo-QDg#scrollTo=I7SF5VeoLulc
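A minimal sketch of what such a digit-only text generator could look like. The helper name `get_digit_text_generator` and its `min_length`, `max_length` and `seed` parameters are hypothetical; they are not part of keras-ocr and not necessarily what the linked notebook uses. The assumption is that it would be passed wherever the end-to-end training example uses the result of `keras_ocr.data_generation.get_text_generator`.

```python
import random

DIGITS = "0123456789"


def get_digit_text_generator(min_length=1, max_length=10, seed=None):
    """Yield random, non-empty, digit-only strings forever.

    Hypothetical helper: intended as a stand-in for the prose-based
    text generator when the recognizer alphabet is digits only.
    """
    rng = random.Random(seed)
    while True:
        # Length is always >= min_length, so no empty strings are produced
        length = rng.randint(min_length, max_length)
        yield "".join(rng.choice(DIGITS) for _ in range(length))


text_generator = get_digit_text_generator(seed=0)
samples = [next(text_generator) for _ in range(5)]
print(samples)
```

Because every string has at least `min_length` characters, the empty-string problem described above cannot occur, and every sample contains only characters from the digit alphabet.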

