
I have trained my own model for the Urdu language, using jTessBoxEditor to create the TIFF/box files and then Serak Tesseract Trainer to create the traineddata file. The model recognizes Urdu, but there are two main issues besides accuracy (accuracy will be tested once these two issues are solved).

  1. The model does not recognize the spaces between words.
  2. The model outputs the text in LTR order (Urdu is an RTL language, similar to Arabic).

I know this domain has a very specific audience, but I just want a hint in the right direction, so any help will be greatly appreciated. Thanks in advance.
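To make the two symptoms concrete, here is a minimal Python sketch for inspecting the recognized text. It assumes, without confirmation, that the LTR symptom means the characters come out in visual (reversed) order rather than logical order. The helper names `rtl_ratio`, `count_spaces`, and `visual_to_logical` are hypothetical and are not part of Tesseract or any trainer tool.

```python
import unicodedata


def rtl_ratio(text: str) -> float:
    """Fraction of letters whose Unicode bidi class is right-to-left (R or AL)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    rtl = sum(1 for ch in letters if unicodedata.bidirectional(ch) in ("R", "AL"))
    return rtl / len(letters)


def count_spaces(text: str) -> int:
    """Number of plain space characters, to check whether word gaps survive OCR."""
    return text.count(" ")


def visual_to_logical(line: str) -> str:
    """Naive fix, assuming a purely RTL line stored in visual order: reverse it.

    This is only an illustration; mixed-direction lines (digits, Latin text)
    would need a proper bidi algorithm instead.
    """
    return line[::-1]


# The word "Urdu" in logical order: alef, reh, dal, waw.
logical = "\u0627\u0631\u062f\u0648"
# The same characters as they would appear if stored in visual (reversed) order.
visual = "\u0648\u062f\u0631\u0627"

print(rtl_ratio(logical))                    # share of RTL letters in the text
print(count_spaces(logical))                 # a single word contains no spaces
print(visual_to_logical(visual) == logical)  # reversing restores logical order
```

If reversing a line of output yields readable Urdu, the characters are probably being emitted in visual order; if the space count is zero for multi-word lines, the word gaps are being lost during recognition rather than in display.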
Abdul Hameed
  • It's a solved problem: use http://cle.org.pk/clestore/urduocr.htm. You are limited to the Nastalique font. – Muhammad Adeel Aug 13 '18 at 14:55
  • I suggest you go towards this approach to promote Urdu. Typing tutor: t.urdu.cafe. Book publishing: urdupub.com. I have already developed this software for the common good of . Using them, you will not need OCR, as you are already one step ahead: you are reading, writing, and creating content in Urdu. – Muhammad Adeel Aug 13 '18 at 14:57
  • Hello @MuhammadAdeel, thank you for commenting. As I mentioned before, I am developing my own model. I don't need any software; I want a solution to this problem. – Muhammad Moinuddin Aug 15 '18 at 09:24
  • @MuhammadAdeel, are you the developer of cle.org.pk/clestore/urduocr.htm? – Muhammad Moinuddin Sep 03 '18 at 09:59
  • No, this was developed at KICS. But I believe this kind of tech is now becoming useless. Most importantly, a startup working on books has told me that this OCR doesn't really work in daily, practical usage: it converts some text while leaving the rest out. – Muhammad Adeel Sep 04 '18 at 10:56
  • Yes, exactly. This software is not very reliable; it is also unable to perform OCR on tabular Urdu data. – Muhammad Moinuddin Sep 04 '18 at 12:54
  • Exactly, think of the bigger picture. Where is Urdu going to be in 10 years: trying to catch up with the last 200 years, or aiming for the future? In the future there is no need for tools like these. We just need someone to enter this data in one go if necessary; otherwise, just compose books using a proper online Urdu book publishing platform like urdupub.com. – Muhammad Adeel Sep 06 '18 at 19:34
  • I have not checked this, but the best free and paid OCR reader is Google Vision API OCR. Thanks. – MindRoasterMir Feb 04 '19 at 14:48

0 Answers