I am using Google Cloud Vision OCR to detect text. The displayed output always contains (1) the full detected text, followed by (2) each detected word on its own line. I only want to display the full detected text.

I am using the code from the Google Cloud Platform GitHub, where I set the feature type to text detection with labelDetection.setType("TEXT_DETECTION"); in the callCloudVision method.

I also modified the convertResponseToString method to:

private String convertResponseToString(BatchAnnotateImagesResponse response) {
        String message = "";

        List<EntityAnnotation> labels = response.getResponses().get(0).getTextAnnotations();
        // The null check must happen before the loop; iterating a null list throws.
        if (labels == null) {
            return "nothing";
        }
        // Appends every annotation: the full text first, then each detected word.
        for (EntityAnnotation label : labels) {
            System.out.println(label.getDescription());
            message += String.format(Locale.US, "%s", label.getDescription()) + "\n";
        }
        return message;
    }

This is my gradle:

apply plugin: 'com.android.application'

android {
    compileSdkVersion 25
    defaultConfig {
        applicationId "com.example.mhci"
        minSdkVersion 24
        targetSdkVersion 25
        versionCode 1
        versionName "1.0"
        testInstrumentationRunner "android.support.test.runner.AndroidJUnitRunner"
        multiDexEnabled true
        javaCompileOptions {
            annotationProcessorOptions {
                includeCompileClasspath false
            }
        }
    }
    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.pro'
        }
    }
    packagingOptions {
        exclude 'META-INF/LICENSE'
        exclude 'META-INF/io.netty.versions.properties'
        exclude 'META-INF/INDEX.LIST'
        exclude 'META-INF/DEPENDENCIES'
    }
}

dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
    implementation 'com.android.support.constraint:constraint-layout:1.0.2'
    testCompile 'junit:junit:4.12'

    androidTestCompile('com.android.support.test.espresso:espresso-core:3.0.1', {
        exclude group: 'com.android.support', module: 'support-annotations'
    })

    compile 'com.android.support:appcompat-v7:25.1.1'
    compile 'com.android.support:design:25.4.0'

    compile 'com.google.api-client:google-api-client-android:1.20.0' exclude module: 'httpclient'
    compile 'com.google.http-client:google-http-client-gson:1.20.0' exclude module: 'httpclient'

    compile 'com.google.apis:google-api-services-vision:v1-rev2-1.21.0'

    compile 'com.android.support:design:25.4.0'

    compile ('com.google.apis:google-api-services-translate:v2-rev47-1.22.0') {
        exclude group: 'com.google.guava'
    }

    compile ('com.google.cloud:google-cloud-translate:0.5.0') {
        exclude group: 'io.grpc', module: 'grpc-all'
        exclude group: 'com.google.protobuf', module: 'protobuf-java'
        exclude group: 'com.google.api-client', module: 'google-api-client-appengine'
    }
}

For this image, the detected text that was displayed is:

Hello world

Hello
world

But I want it to display only Hello world.

How can I do it?

k8892
  • did you try TextAnnotation – krishank Tripathi Mar 10 '18 at 10:46
  • @krishankTripathi u mean instead of using EntityAnnotation, replace it with TextAnnotation? – k8892 Mar 10 '18 at 10:53
  • yes use the TextAnnotation and go through this link https://developers.google.com/resources/api-libraries/documentation/vision/v1/java/latest/com/google/api/services/vision/v1/model/TextAnnotation.html – krishank Tripathi Mar 10 '18 at 11:02
  • when you get the page from that method you can detect the block of words, or use the block method in the Page class linked below to detect the block of words https://developers.google.com/resources/api-libraries/documentation/vision/v1/java/latest/com/google/api/services/vision/v1/model/Page.html – krishank Tripathi Mar 10 '18 at 11:09
  • The code I am using does not have TextAnnotation in it. It does not have the external library for it – k8892 Mar 10 '18 at 11:20
  • did you try to add that class in your code, and which gradle are you using to get the cloud vision api service? please post your whole code along with the gradle you are using – krishank Tripathi Mar 10 '18 at 11:31
  • If you only want to detect words and this api is not so important then try the below link https://stackoverflow.com/a/48781348/9287163 – krishank Tripathi Mar 10 '18 at 11:33
  • i have updated my question with the gradle im using. i tried to import the class using `import com.google.api.services.vision.v1.model.TextAnnotation;` but it has error stating `cannot resolve symbol TextAnnotation` – k8892 Mar 10 '18 at 12:18
  • this is the best documentation and example i came across for implementing cloud vision api pls go through this https://code.tutsplus.com/tutorials/how-to-use-the-google-cloud-vision-api-in-android-apps--cms-29009 – krishank Tripathi Mar 11 '18 at 03:34
  • Does my answer help? Please accept the answer if it helps because that's how the community benefits from Stackoverflow, thanks. – Ying Li Mar 22 '18 at 18:19

1 Answer

Text detection does work for this use case. The response simply contains everything: the first annotation holds the full detected text, and the remaining annotations hold each individual word. That is just how it's designed. Suppose a picture shows two street signs side by side ("Main Street" and "Park Avenue"): you would want the API to break what it sees into parts so it makes more sense. If it returned only one string, "Main Street Park Avenue", that information would not be useful. That's why it always returns the whole text and then all of its parts, so if you search through the returned strings, you will find the relevant pictures.

So if you trust it to read the text properly and in its entirety, you can simply use the first result in the returned array instead of all 3. Or you can implement some logic that displays only the longest and most trusted results.

In short, manipulate the returned list, List<EntityAnnotation> labels, and extract the kind of result you want. In your particular case, don't display the entire list; just take the first value in the list and you will have what you want.
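
A minimal sketch of "take the first value", written against plain strings so it runs without the Vision client library. The class name FirstAnnotation and the helper fullText are hypothetical names made up for this illustration; the list stands in for the getDescription() values of the annotations in the order the API returned them. The trim() call assumes the full-text entry may carry a trailing newline.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FirstAnnotation {

    // Hypothetical helper: returns only the first description, which is
    // assumed to be the full detected text; the per-word entries are ignored.
    static String fullText(List<String> descriptions) {
        if (descriptions == null || descriptions.isEmpty()) {
            return "nothing";
        }
        // trim() drops surrounding whitespace, such as a trailing newline.
        return descriptions.get(0).trim();
    }

    public static void main(String[] args) {
        // Stand-in for the three annotations from the question.
        List<String> descriptions = Arrays.asList("Hello world\n", "Hello", "world");
        System.out.println(fullText(descriptions)); // prints "Hello world"
        System.out.println(fullText(Collections.<String>emptyList())); // prints "nothing"
    }
}
```

In the question's convertResponseToString, the equivalent change would be to drop the loop and, after checking that labels is neither null nor empty, return labels.get(0).getDescription().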

Ying Li