TensorFlow model for Arabic OCR

Question

I am a beginner in TensorFlow and I want to build an OCR model that recognizes Arabic words in cursive Arabic fonts (i.e. joined Arabic handwriting). Ideally, the model would be able to recognize both Arabic and English. Please see the attached image of a dictionary page that I am currently trying to OCR. The other pages in the book use the same font and layout, with both English and Arabic.

I have two questions:

(1) Would I train on individual characters of the joined/cursive Arabic text, or would I need bounding boxes for entire words or for individual characters?

(2) Are there any existing TensorFlow (or Keras) OCR models that deal with cursive writing, particularly Arabic?

score 3 · Answer 1 · answered Feb 18 '18 at 01:27

Tesseract, an open-source OCR engine from Google, has a trained Arabic model.

Learn more about it here: https://github.com/tesseract-ocr/tesseract

Languages it supports are here: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#languages

The trained Arabic model file (ara.traineddata) is here: https://github.com/tesseract-ocr/tessdata/blob/master/ara.traineddata
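
If you want to try it from Python, one option is the pytesseract wrapper. That choice is an assumption on my part; the Tesseract command line works just as well. The sketch below assumes Tesseract is installed with both the ara and eng traineddata files, that pytesseract and Pillow are available, and that page.png stands in for your scanned page. The helper names build_lang and ocr_page are made up for illustration.

```python
from typing import Iterable


def build_lang(codes: Iterable[str]) -> str:
    """Join Tesseract language codes with '+', e.g. 'ara+eng'."""
    return "+".join(codes)


def ocr_page(image_path: str, codes: Iterable[str] = ("ara", "eng")) -> str:
    """Run Tesseract on one page image and return the recognized text."""
    # Imported lazily so build_lang() can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as page:
        # lang="ara+eng" asks Tesseract to load both models for a mixed page.
        return pytesseract.image_to_string(page, lang=build_lang(codes))


# Example call (page.png is a placeholder file name):
# text = ocr_page("page.png")
```

Passing both codes matters for a dictionary page like yours, since each line mixes the two scripts. How well the stock Arabic model copes with your particular font is something you would have to check on a few pages.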

Hope this helps!

Josh Payne

How can I convert a .traineddata file to the .tflite format? Thanks in advance. – user10033434 Apr 24 '21 at 09:15

score 1 · Answer 2 · answered Jan 20 '18 at 16:28

I don't think you can use the whole page as the input image. Going word by word is probably a better choice for a first, simple solution. Take a look at these links:

https://hackernoon.com/latest-deep-learning-ocr-with-keras-and-supervisely-in-15-minutes-34aecd630ed8

http://ai.stanford.edu/~ang/papers/ICPR12-TextRecognitionConvNeuralNets.pdf

How to create dataset in the same format as the FSNS dataset?
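
To make the word-by-word idea concrete, here is a rough sketch of cutting a page into lines and words with projection profiles: count the dark pixels per row to find text lines, then per column within a line to find words. This is one simple technique I am assuming for illustration, not something taken from the links above. It expects a clean, deskewed grayscale scan given as nested lists (0 = black, 255 = white), and the function names and the min_gap default are hypothetical. In cursive Arabic a few letters never join to the letter after them, so small gaps also occur inside words; min_gap merges those and needs tuning for your font.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]  # half-open [start, end) index range


def ink_profile(rows: Sequence[Sequence[int]], dark_below: int = 128) -> List[int]:
    """Count dark pixels in each row of a grayscale image."""
    return [sum(1 for px in row if px < dark_below) for row in rows]


def find_runs(profile: Sequence[int], min_gap: int = 1) -> List[Run]:
    """Return ranges of non-empty entries, merging runs that are
    separated by fewer than min_gap empty entries."""
    runs: List[Run] = []
    start = None
    for i, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = i                      # a run of ink begins
        elif ink == 0 and start is not None:
            runs.append((start, i))        # the run ended at i
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged: List[Run] = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])   # gap too small: same unit
        else:
            merged.append(run)
    return merged


def split_words(line: Sequence[Sequence[int]], min_gap: int = 3,
                rtl: bool = True) -> List[Run]:
    """Split one text-line image into word column ranges in reading order."""
    columns = list(zip(*line))             # transpose: one entry per pixel column
    runs = find_runs(ink_profile(columns), min_gap)
    return runs[::-1] if rtl else runs     # Arabic reads right to left
```

Calling find_runs(ink_profile(page)) gives the row ranges of the text lines, and split_words on each cropped line gives the word crops you would feed to a recognizer. For the English parts of the page, pass rtl=False.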
