Tesseract - [Japanese] Vertical text with horizontal numbers

Asked Mar 02 '19 at 01:36

I'm having trouble with vertical text mixed with horizontal numbers.

For example:

[Image: vertical Japanese text containing a horizontal two-digit number]

If it were a single digit, recognition would succeed, but Tesseract tries to read this number as a single character because it expects characters to be stacked vertically. As far as I know, Tesseract reports a confidence value for the whole sentence rather than for each character. Is there a way to detect low confidence on just this character and then try a different approach on it so that the number is parsed correctly?
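To make the confidence idea concrete, here is a minimal sketch assuming the `pytesseract` wrapper is used. Its `image_to_data` call returns one row per detected element with a `conf` value, at word level rather than per character, so this only approximates what I'm after. The helper names, the threshold of 60, and the choice of `--psm 5` (a single block of vertically aligned text) are my own assumptions, not something I have confirmed works on this image.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def low_confidence_boxes(
    data: Dict[str, list], threshold: float = 60.0
) -> List[Tuple[str, float, Box]]:
    """Return (text, conf, box) for every recognized word below `threshold`.

    `data` is expected to look like the dict produced by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    flagged = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # conf may arrive as str, int, or float
        if conf < 0 or not str(text).strip():
            continue  # skip structural rows (conf -1) and empty text
        if conf < threshold:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            flagged.append((text, conf, box))
    return flagged


def ocr_and_flag(image_path: str, threshold: float = 60.0):
    """Hypothetical wrapper: OCR a vertical Japanese image and flag weak words."""
    # Imported lazily so low_confidence_boxes works without Tesseract installed.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    data = pytesseract.image_to_data(
        Image.open(image_path),
        lang="jpn_vert",
        config="--psm 5",  # assumption: one uniform block of vertical text
        output_type=Output.DICT,
    )
    return low_confidence_boxes(data, threshold)
```

Each flagged entry carries a bounding box, so the suspicious region could then be cropped and handled separately.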

edited Mar 02 '19 at 02:03

Flux

asked Mar 02 '19 at 01:36

K41F4r

Are you using jpn_vert.traineddata? – user3169 Mar 02 '19 at 07:08
Yes, I should have mentioned that. It reads the "24" part as one character. – K41F4r Mar 02 '19 at 10:22
I don't know about the confidence part, but you might ask about that as a separate question, framed generally (on "high confidence" do A, on "low confidence" do B) rather than around a specific example. – user3169 Mar 04 '19 at 03:30
You might also look into page segmentation, to separate the numbers from the kanji. Something like in [How do I segment a document using Tesseract then output the resulting bounding boxes and labels](https://stackoverflow.com/questions/28591117/how-do-i-segment-a-document-using-tesseract-then-output-the-resulting-bounding-b), though it's far beyond anything I've done. – user3169 Mar 04 '19 at 03:37
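As a rough illustration of the segmentation idea in the comment above, the sketch below crops one region and re-reads it as a single horizontal line restricted to digits. It assumes `pytesseract` and Pillow. The function names and padding value are hypothetical, and whether the `tessedit_char_whitelist` setting is honored depends on the Tesseract version and engine, so treat this as an untested starting point rather than a known fix.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def padded_crop(
    box: Box, image_size: Tuple[int, int], pad: int = 4
) -> Tuple[int, int, int, int]:
    """Convert (left, top, width, height) into a PIL crop box
    (left, upper, right, lower), padded and clamped to the image bounds."""
    left, top, width, height = box
    img_w, img_h = image_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(img_w, left + width + pad),
        min(img_h, top + height + pad),
    )


def digits_config(psm: int = 7) -> str:
    """Build a Tesseract config string: --psm 7 treats the image as a
    single text line; the whitelist limits output to digits."""
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789"


def reread_as_digits(image_path: str, box: Box) -> str:
    """Hypothetical second pass over one region, read horizontally."""
    # Imported lazily so the pure helpers above work without Tesseract.
    import pytesseract
    from PIL import Image

    image = Image.open(image_path)
    region = image.crop(padded_crop(box, image.size))
    return pytesseract.image_to_string(
        region, lang="eng", config=digits_config()
    ).strip()
```

The box passed to `reread_as_digits` would come from whatever segmentation step isolates the number, for example a region that the vertical pass recognized poorly.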
