How can I recognize deformed text beneath a larger object using pytesseract and opencv-python in Python?

Question

I am using pytesseract to recognize text as follows:

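import pytesseract
from pytesseract import Output

# img is an image that has already been loaded, e.g. with cv2.imread(...)
td = pytesseract.image_to_data(img, output_type=Output.DICT)
tn_boxes = len(td['level'])
for o in range(tn_boxes):
    text = td['text'][o]
    print(text)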

I am building an index of examples with simple logic: detect the keyword 'Example no.', find its end-point keyword 'Sol.', and put the piece of the image from 'Example no.' to 'Sol.' into the index; then find the next example, and so on.
But when I try the following image, the output is: SET THEORY ae . . 5 (6) Let A = {x: x isa negative odd integer} = {-1,-3,-5,-7,...etc
Note that it does not recognize the first line, Sol. (a) Let A={x:x is a natural number..etc.
When I try the following image, which has no horizontal line, it works fine.
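
A minimal sketch of the indexing logic described above, assuming the dictionary has the shape that `pytesseract.image_to_data(img, output_type=Output.DICT)` returns (parallel lists under keys such as `'text'` and `'top'`). The function name `find_example_spans` and its parameters are hypothetical, chosen only for illustration. If OCR misses a keyword, as happens in the outputs shown in this question, the affected span is simply not found.

```python
def find_example_spans(td, start_word="Example", end_word="Sol."):
    """Return (start_top, end_top) pixel pairs, one per Example/Sol. pair.

    `td` is assumed to look like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    spans = []
    start_top = None  # y coordinate of the most recent unmatched start keyword
    for text, top in zip(td["text"], td["top"]):
        word = text.strip()
        if start_top is None and word.startswith(start_word):
            start_top = top                  # the example begins here
        elif start_top is not None and word.startswith(end_word):
            spans.append((start_top, top))   # the example runs down to 'Sol.'
            start_top = None
    return spans


# Hand-made data standing in for real OCR output.
sample = {
    "text": ["", "Example", "1.", "Prove", "Sol.", "Let", "Example", "2.", "Sol."],
    "top":  [0, 10, 10, 10, 50, 50, 90, 90, 130],
}
spans = find_example_spans(sample)
```

Each pair can then be used to crop the page, for instance `img[y0:y1, :]` on a NumPy image array.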

Is there any way to configure pytesseract to recognize text that has a line above it?

Edited:

Sometimes, when an image or some larger text is placed above the text, pytesseract fails to detect the text below that bigger object.

Is there any solution for this kind of problem? Maybe there is a way to configure a minimum detection size, or to detect text of all possible sizes, even under bigger objects?

For example, for the following image the output is: usually denoted by o(G). ors a a {= 7 Wave =e () oe that the set of ae | group usual ition of integers.
Note that it does not detect the keyword Example 1.

But when I try the following image, the output is: usually denoted by o(G). Example 1. (2) Prove that th . group under usual addition of integers,
Now it detects the keyword Example 1.

What about automatically removing the black line? You can easily detect it based on its size (almost the whole width) and position (just above the Sol. text). You can even use it to undistort the text, but that's another topic ;-) – antoine, Jun 09 '20 at 12:22
Thank you for the solution, I will try this. But sometimes, when an image or some larger text is placed above the text, pytesseract fails to detect the text below that bigger object. Can you suggest any solution for this kind of problem? Maybe there is a way to configure a minimum detection size, or to detect text of all possible sizes, even under bigger objects. – Navpreet Devpuri, Jun 09 '20 at 15:42
I submitted an issue: https://github.com/tesseract-ocr/tesseract/issues/3011 – Navpreet Devpuri, Jun 09 '20 at 17:07
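
The line-removal idea from the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the image is a 2-D grayscale NumPy array (such as `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` returns), the line is straight and horizontal, and the function name and thresholds are hypothetical. A line on a warped page will not fill a single pixel row, so this would likely need dewarping first or a different detection method.

```python
import numpy as np


def remove_horizontal_lines(gray, min_width_frac=0.8, dark_threshold=128):
    """Whiten rows that are dark across most of the image width.

    Returns the cleaned copy and the indices of the rows treated as a line.
    Thresholds are illustrative guesses, not tuned values.
    """
    dark = gray < dark_threshold                      # True where a pixel is dark
    line_rows = dark.mean(axis=1) >= min_width_frac   # rows that are mostly dark
    cleaned = gray.copy()
    cleaned[line_rows, :] = 255                       # paint those rows white
    return cleaned, np.flatnonzero(line_rows)


# Synthetic page: white background, one long line, one short text-like stroke.
page = np.full((10, 20), 255, dtype=np.uint8)
page[3, 1:19] = 0    # spans 90% of the width, treated as a line
page[6, 5:10] = 0    # spans 25% of the width, left alone
cleaned, rows = remove_horizontal_lines(page)
```

The returned row indices also give the position of the line, which the comment suggests could help locate the text just below it.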

score 1 · Accepted Answer · answered Jun 09 '20 at 17:14

Read, for example, the question "image processing to improve tesseract OCR accuracy", and read the docs.

user898678

2,994
2
18
17

I found a better dewarper, [ocrd_cis](https://github.com/cisocrgroup/ocrd_cis), but for now I don't know how to use it. Also, when we scale the given image up by a factor of 3, it detects the keyword `Example 1.` But now the question is how to find the scale factor that gives the best results; I asked that question [here](https://stackoverflow.com/questions/62480172/tesseract-ocr-act-weird-while-scalling-up-image-size-how-to-know-which-scale-fa) – Navpreet Devpuri Jun 20 '20 at 00:46
I want the best results; what should I try? Is there any way to configure a minimum and maximum font size? – Navpreet Devpuri Jun 20 '20 at 00:48
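
One possible heuristic for the scale-factor question, offered as an assumption and not as anything Tesseract recommends: run OCR at several scales and keep the scale with the highest mean word confidence. The helpers `mean_confidence` and `pick_best_scale` are hypothetical names. High confidence does not guarantee correct text, so the result is worth checking by eye.

```python
def mean_confidence(td):
    """Average confidence over recognized words in an image_to_data dict.

    Non-word entries carry conf = -1 and are skipped. Values are passed
    through float() because they may arrive as int, float, or str.
    """
    confs = [float(c) for c, t in zip(td["conf"], td["text"])
             if float(c) >= 0 and t.strip()]
    return sum(confs) / len(confs) if confs else 0.0


def pick_best_scale(results_by_scale):
    """Return the scale whose OCR result has the highest mean confidence."""
    return max(results_by_scale,
               key=lambda s: mean_confidence(results_by_scale[s]))


# Hypothetical usage (needs opencv-python, pytesseract, and Tesseract):
#   results = {}
#   for s in (1, 2, 3, 4):
#       scaled = cv2.resize(img, None, fx=s, fy=s,
#                           interpolation=cv2.INTER_CUBIC)
#       results[s] = pytesseract.image_to_data(scaled, output_type=Output.DICT)
#   best = pick_best_scale(results)

# Hand-made results standing in for real OCR output at two scales.
fake_results = {
    1: {"conf": ["-1", "40", "50"], "text": ["", "ors", "a"]},
    3: {"conf": [-1, 90.0, 95], "text": ["", "Example", "1."]},
}
best = pick_best_scale(fake_results)
```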

score 1 · Answer 2 · answered Jun 09 '20 at 19:00

You can try dewarping the image. I used this repo: dewarp-github.
The code is written for Python 2. If you are using Python 3+, you can convert it to Python 3 using 2to3. It needed some modifications for my case, which were not too complex to handle.

Beginner

There's a python 3 fork: https://github.com/bertsky/page_dewarp/tree/support-python3 – hellpanderr Aug 13 '21 at 09:54
