I have several images from which I want to extract text. I tried pytesseract, but it did not work; I ran into the issues described here and here. The images use the following fonts:

  • MultiTypePixel NarrowBold
  • Cave-Story-Regular

Here are the sample images I want to extract the text from.

[Three sample images omitted.]

Keras-OCR does not directly support the fonts mentioned above: the list of fonts that Keras-OCR uses does not include either of them.

Here is the Keras-OCR code I got from their website.

import matplotlib.pyplot as plt

import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of three example images
images = [
    keras_ocr.tools.read(url) for url in [
        'https://upload.wikimedia.org/wikipedia/commons/b/bd/Army_Reserves_Recruitment_Banner_MOD_45156284.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/b/b4/EUBanana-500x112.jpg'
    ]
]

# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax) 

If I use the code given here in the Keras-OCR documentation, it downloads its fonts directly from the google-fonts repo. I cannot find a way to supply my own fonts and train the model on them. Is there any way to use my custom fonts to train the model and extract the text from these images? The default Keras-OCR model recognizes the text incorrectly.

If there is any other method to perform this task, I am open to that as well.

JAMSHAID

1 Answer

# `alphabet` and `data_dir` are defined earlier in the tutorial.
fonts = keras_ocr.data_generation.get_fonts(
    alphabet=alphabet,
    cache_dir=data_dir)

In the tutorial (here), this call returns a list of font file paths. Replace that list with the paths to your own custom fonts.
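As a rough sketch of what "your own custom font paths" could look like, the helper below collects every `.ttf`/`.otf` file under a local folder using only the standard library. The function name `collect_font_paths` and the folder name `my_fonts` are made up for illustration; they are not part of Keras-OCR.

```python
from pathlib import Path

# File extensions treated as fonts (an assumption; adjust as needed).
FONT_EXTENSIONS = {".ttf", ".otf"}


def collect_font_paths(font_dir):
    """Return the sorted absolute paths of all font files under font_dir."""
    root = Path(font_dir)
    return sorted(
        str(path.resolve())
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in FONT_EXTENSIONS
    )


if __name__ == "__main__":
    # "my_fonts" is a hypothetical folder holding the two fonts from the question.
    fonts = collect_font_paths("my_fonts")
    print(fonts)
```

The resulting `fonts` list would then take the place of the one returned by `get_fonts` in the rest of the tutorial. It is worth checking that each font actually contains glyphs for every character in your alphabet, since pixel fonts often cover only a limited character set.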