I want to improve text recognition accuracy on a somewhat complex image.
I'm currently using the following sample code:
https://github.com/opencv/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp
I'm trying to detect text in this image:
https://i.ytimg.com/vi/WFobUoRn6Ek/maxresdefault.jpg
(Note: this is representative of the kind of footage I'm trying to handle.)
Result:
There are many issues, but the biggest problem for me is that small letters are easily omitted or misrecognized.
e.g. 'i' became 'l'
e.g. 'in' became 'm'
e.g. 'l' is sometimes dropped entirely
I think the problem is related to the image produced by erFilter. As this picture shows, some small parts are already missing at this point.
Please let me know if there's a good way to keep such small parts from being omitted.
Possibly some sort of preprocessing of the image?
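To make my suspicion about erFilter concrete, here is a minimal sketch of the arithmetic only, not the actual OpenCV implementation. It assumes that the first-stage filter's minimum-area parameter is a fraction of the total image area (as I understand the `createERFilterNM1` documentation), that the sample passes 0.00015 for it, and that the frame is 1280x720. `minRegionPixels` and `regionSurvives` are hypothetical helper names of my own.

```cpp
#include <cmath>

// Hypothetical helper, not part of OpenCV: smallest whole number of pixels
// a region needs in order to reach a minimum-area threshold that is
// expressed as a fraction of the full image area (assumption).
long minRegionPixels(int width, int height, double minAreaFraction) {
    return static_cast<long>(std::ceil(minAreaFraction * width * height));
}

// Hypothetical helper: true if a region with regionPixels pixels would
// pass such a relative minimum-area test.
bool regionSurvives(long regionPixels, int width, int height,
                    double minAreaFraction) {
    return static_cast<double>(regionPixels) >=
           minAreaFraction * width * height;
}
```

Under those assumptions, `minRegionPixels(1280, 720, 0.00015)` puts the cutoff at about 138 pixels, so the dot of an 'i' or a thin 'l' in small text could fall below it. If the threshold really is relative to the image area, upscaling alone would not change which regions pass, so lowering that parameter might matter as much as any preprocessing.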
Note: I already checked the following post, but my target footage is similar to the example of a 'failure' case in his paper.