
I am using Tesseract for OCR and have added a few additional words to "fin.user-words" (I would like to avoid creating a new word list and replacing tessdata/fin.word-dawg with it). I got this working from the command prompt:

>tesseract image.png result -l fin TestConfig

where TestConfig (a Tesseract configuration file located under .../tessdata/configs) suppresses the system dictionaries and forces Tesseract to load my words:

load_system_dawg F
load_freq_dawg F
user_words_suffix user-words

ref: http://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
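Since the config file is just a plain-text list of "name value" lines, it can also be generated from code rather than edited by hand. Below is a minimal sketch of that idea using only the Java standard library. The class and method names (`TessConfigWriter`, `render`, `write`, `userWordsOnly`) are hypothetical and not part of Tesseract or Tess4J, and the demo writes to a temporary directory as a stand-in for your real tessdata/configs folder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: writes a Tesseract config file made of "name value" lines.
final class TessConfigWriter {

    // Renders each variable as one "name value" line, in insertion order.
    static String render(Map<String, String> variables) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : variables.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Writes the rendered config to configsDir/configName and returns the file path.
    static Path write(Path configsDir, String configName, Map<String, String> variables) {
        try {
            Files.createDirectories(configsDir);
            Path target = configsDir.resolve(configName);
            Files.write(target, render(variables).getBytes(StandardCharsets.UTF_8));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The three settings from the question's TestConfig file.
    static Map<String, String> userWordsOnly() {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("load_system_dawg", "F");
        vars.put("load_freq_dawg", "F");
        vars.put("user_words_suffix", "user-words");
        return vars;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory stands in for the real tessdata/configs folder.
        Path configsDir = Files.createTempDirectory("tessdata-configs");
        Path file = write(configsDir, "TestConfig", userWordsOnly());
        System.out.println("Wrote " + file);
    }
}
```

To use it for real, pass the path of your own tessdata/configs directory to `write` instead of the temporary one.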

I am trying to replicate the command-line procedure above in Java, but Tesseract seems to ignore the configuration options. Here is the relevant part of my Java code:

public static void TestTesseract(BufferedImage image) {
        Tesseract instance = Tesseract.getInstance();
        instance.setLanguage("fin");
        // Same three settings as in the TestConfig file
        instance.setTessVariable("load_system_dawg", "F");
        instance.setTessVariable("load_freq_dawg", "F");
        instance.setTessVariable("user_words_suffix", "user-words");
        try {
            String result = instance.doOCR(image);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
}

The nearest question to mine I could find is "Forcing Tesseract to match pattern (four digits in a row)"; however, I could not find the setConfigs method it uses:

instance.setConfigs(Arrays.asList("bazaar"));

ABData

1 Answer


The setConfigs method is new in Tess4J v1.4 (see doc).

instance.setConfigs(Arrays.asList("TestConfig"));
nguyenq
  • Thanks @nguyenq. After updating to 1.4, I am not able to use Tesseract anymore: `Exception in thread "Run$_main" java.lang.NoClassDefFoundError: org/apache/commons/io/FileUtils at net.sourceforge.tess4j.util.LoadLibs.copyJarResourceToDirectory(Unknown Source) at net.sourceforge.tess4j.util.LoadLibs.extractTessResources(Unknown Source) at net.sourceforge.tess4j.util.LoadLibs.(Unknown Source) at net.sourceforge.tess4j.TessAPI.(Unknown Source) at net.sourceforge.tess4j.Tesseract.init(Unknown Source) at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source)` – ABData Jan 28 '15 at 14:07
  • `at net.sourceforge.tess4j.Tesseract.doOCR(Unknown Source) at TestTesseract.TestTesseract(TestTesseract.java:302) Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.FileUtils at java.net.URLClassLoader$1.run(Unknown Source) at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) ... 10 more` – ABData Jan 28 '15 at 14:07
  • Ah I think I figured it out. Basically I need _[commons IO](http://commons.apache.org/proper/commons-io/)_ in order to run v1.4 – ABData Jan 28 '15 at 14:17
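
For anyone hitting the same NoClassDefFoundError, a quick way to confirm that a dependency such as Commons IO is missing is to probe for the class before calling into Tess4J. The following is only a sketch using the standard `Class.forName` call; `ClasspathCheck` and `isClassAvailable` are hypothetical names chosen for illustration, and the class name checked in `main` is the one from the stack trace above.

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
final class ClasspathCheck {

    // Returns true if the named class can be found; the class is not initialized.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the NoClassDefFoundError from the comments above.
        String name = "org.apache.commons.io.FileUtils";
        if (isClassAvailable(name)) {
            System.out.println(name + " is on the classpath");
        } else {
            System.out.println(name + " is missing; add the Commons IO jar to the classpath");
        }
    }
}
```

If the check reports the class as missing, adding the Commons IO jar to the classpath alongside the Tess4J jar should resolve the error, as described in the last comment.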