
I am using a web service that reads image files and returns the text found in them, using Tesseract.

Tesseract/Tess4J expects language codes in ISO 639-3 format (e.g. eng, spa, deu, ara), but the language I get from the mobile device comes in a format like en-gb, pt-br, ...

My users can be using any language when they request a picture reading.

My question is: does anybody have any idea how to solve this?

Also, if I don't set any language, does it guess/detect the language in the image?

Francisco Souza

1 Answer

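My question is: does anybody have any idea how to solve this?

Convert the device's language tag to the matching three-letter ISO code. Java's Locale class exposes it; for example, this lists every available locale together with its three-letter language code: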

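    import java.util.Locale;

    public class ListLocales {
        public static void main(String[] args) {
            // Print each available locale with its three-letter (ISO3) language code.
            for (Locale locale : Locale.getAvailableLocales()) {
                System.out.println("" + locale
                        + "; display: " + locale.getDisplayLanguage()
                        + "; name: " + locale.getDisplayName()
                        + "; lang: " + locale.getLanguage()
                        + "; iso3: " + locale.getISO3Language());
            }
        }
    }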

Then you can pass the resulting three-letter code to Tess4J with setLanguage(...).

Ref for the above
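For a single tag such as en-gb or pt-br, there is no need to loop over every locale. Below is a minimal sketch, assuming the device sends BCP 47-style tags; the class name `LanguageTagToIso3` and the helper `toIso3` are hypothetical names chosen for illustration. It relies only on `Locale.forLanguageTag` and `Locale.getISO3Language` from the JDK.

```java
import java.util.Locale;
import java.util.MissingResourceException;

public class LanguageTagToIso3 {

    // Hypothetical helper: converts a tag such as "en-gb" or "pt-br" to a
    // three-letter language code, returning `fallback` when no code is found.
    static String toIso3(String languageTag, String fallback) {
        if (languageTag == null || languageTag.isEmpty()) {
            return fallback;
        }
        // Accept "pt_BR" as well as "pt-br"; tag parsing is case-insensitive.
        Locale locale = Locale.forLanguageTag(languageTag.replace('_', '-'));
        try {
            String iso3 = locale.getISO3Language();
            // An unparseable tag yields an empty language, hence an empty code.
            return iso3.isEmpty() ? fallback : iso3;
        } catch (MissingResourceException e) {
            // Thrown when no three-letter code is known for the language.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(toIso3("en-gb", "eng")); // prints "eng"
        System.out.println(toIso3("pt-br", "eng")); // prints "por"
    }
}
```

The returned code can then be handed to Tess4J's `setLanguage`. Note that some Tesseract traineddata files do not use a plain three-letter name (for example `chi_sim` for Simplified Chinese), so a small override map for those cases is probably needed on top of this.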

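Also, if I don't set any language, does it guess/detect the language in the image?

As far as I know, it does not detect the language for you. If none is set, Tess4J uses its default (eng) rather than guessing, so you should set the language explicitly. (I have not gone through the source code to confirm.)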

Tinus Jackson