I'm working on macOS and using Tesseract for OCR.
I installed Tesseract with Homebrew.
Tesseract works well from the command line, and my Java program works with the basic Tesseract.getInstance() example.
But since I want the confidence value of each character, I switched to TessAPI1 and got the error below:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Error looking up function 'TessResultRendererAddError': dlsym(0x7f9df9c38c20, TessResultRendererAddError): symbol not found
at com.sun.jna.Function.<init>(Function.java:208)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:536)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:513)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:499)
at com.sun.jna.Native.register(Native.java:1509)
at com.sun.jna.Native.register(Native.java:1396)
at com.sun.jna.Native.register(Native.java:1156)
at net.sourceforge.tess4j.TessAPI1.<clinit>(Unknown Source)
at TesseractWrapper.doOCR(TesseractWrapper.java:71)
at OCR.main(OCR.java:6)
The error occurred at
handle = TessAPI1.TessBaseAPICreate();
The code looks like this:
TessAPI1.TessBaseAPI handle;
handle = TessAPI1.TessBaseAPICreate();
File tiff = new File(this.path);
BufferedImage image = ImageIO.read(new FileInputStream(tiff)); // requires the jai-imageio lib to read TIFF
ByteBuffer buf = ImageIOHelper.convertImageData(image);
// Pixel layout that TessBaseAPISetImage needs: bytes per pixel and bytes per line
int bpp = image.getColorModel().getPixelSize();
int bytespp = bpp / 8;
int bytespl = (int) Math.ceil(image.getWidth() * bpp / 8.0);
TessAPI1.TessBaseAPIInit3(handle, "tessdata", lang);
TessAPI1.TessBaseAPISetPageSegMode(handle, TessAPI1.TessPageSegMode.PSM_AUTO);
TessAPI1.TessBaseAPISetImage(handle, buf, image.getWidth(), image.getHeight(), bytespp, bytespl);
TessAPI1.TessBaseAPIRecognize(handle, null);
// Get an iterator over the recognition results and rewind it to the start of the page
TessAPI1.TessResultIterator ri = TessAPI1.TessBaseAPIGetIterator(handle);
TessAPI1.TessPageIterator pi = TessAPI1.TessResultIteratorGetPageIterator(ri);
TessAPI1.TessPageIteratorBegin(pi);
I found this code in another question. I guess what I need is to get an 'iterator', and then I can read each character with its confidence value one by one.
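To rule out the image-handling part of the snippet, the bytes-per-pixel and bytes-per-line arithmetic can be checked on its own, without loading the native Tesseract library. This is only a minimal sketch using the JDK; the class name StrideCheck and its two helper methods are made up for illustration and are not part of Tess4J.

```java
import java.awt.image.BufferedImage;

public class StrideCheck {
    // Bytes per pixel, using the same integer division as the snippet above.
    // Note: this yields 0 for images with fewer than 8 bits per pixel.
    static int bytesPerPixel(BufferedImage image) {
        return image.getColorModel().getPixelSize() / 8;
    }

    // Bytes per scan line, rounded up to a whole byte.
    static int bytesPerLine(BufferedImage image) {
        int bpp = image.getColorModel().getPixelSize();
        return (int) Math.ceil(image.getWidth() * bpp / 8.0);
    }

    public static void main(String[] args) {
        // Hypothetical 10x4 test images; no file or native library is needed.
        BufferedImage gray = new BufferedImage(10, 4, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage bgr = new BufferedImage(10, 4, BufferedImage.TYPE_3BYTE_BGR);
        System.out.println("gray: " + bytesPerPixel(gray) + " bytes/pixel, "
                + bytesPerLine(gray) + " bytes/line");
        System.out.println("bgr:  " + bytesPerPixel(bgr) + " bytes/pixel, "
                + bytesPerLine(bgr) + " bytes/line");
    }
}
```

If these values look right for the TIFF in question, the image preparation is probably not involved, which would fit the stack trace: the exception is thrown while the TessAPI1 class is being initialized, before any image data is passed to Tesseract.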