We are parsing an image of a text snippet with a resolution of 2121x105 px. In Java, we use the following code to get a byte array (one of our constraints is that we have to work with a byte array here):
import java.io.InputStream;
import org.apache.commons.io.IOUtils;
...
// Read the whole asset file into memory
InputStream is = getAssets().open("images/text.png");
byte[] bytes = IOUtils.toByteArray(is);
This byte array is then passed to native C++ code. We are not using the tess-two Java wrapper, only its native libraries. In the native code we try to get the text of the image with GetUTF8Text(). We then noticed that tess-two already implements setting the input image from a byte array:
void Java_com_..._TessBaseAPI_nativeSetImageBytes(JNIEnv *env,
        jobject thiz,
        jlong mNativeData,
        jbyteArray data,
        jint width,
        jint height,
        jint bpp,
        jint bpl) {
...
We figured that bpp for a PNG should be 4 (RGBA). However, it is not clear what is expected for bpl. If we set it to the image width multiplied by bpp, we get a segmentation fault. If we set it to zero, an empty string is returned.
UPDATE: The segmentation fault occurs in GetUTF8Text(), not in SetImage().
SIGSEGV (signal SIGSEGV: invalid address (fault address: 0xc))