AWS Textract: extract the metadata and confidence score

Question

Hi all, I have extracted the document metadata from an AWS Textract asynchronous call using the Java SDK, but the metadata is split across multiple blocks and it is huge.

How can I extract the field name, its value, and the confidence score separately using Java code? I want a result something like the one below:


[{
  "Field" : "FirstName",
  "Value" : "XXXXX",
  "confidence Score" : "98.88"
},
{
  "Field" : "LastName",
  "Value" : "XXXXX",
  "confidence Score" : "65.98"
}]

Could anyone please suggest how to extract the field, value, and confidence score from the AWS Textract document metadata?

Does anyone have any idea about this?

score 1 · Accepted Answer · answered Jan 12 '20 at 14:12

AWS provides an example for mapping key-value pairs in Python. You can use that code to understand the logic and then write your own version in Java.

Source: https://docs.aws.amazon.com/textract/latest/dg/examples-extract-kvp.html
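As a rough idea of what that logic could look like in Java, here is a minimal sketch. It does not use the SDK: `Blk`, `Rel`, and `Field` are hypothetical stand-ins I made up so the example runs on its own, and `entityType` is simplified to a single string (the real block carries a list). The idea is the one from the Python example: index all blocks by id, find the `KEY_VALUE_SET` blocks marked `KEY`, follow their `VALUE` relationship to the value block, and build the text of each side from its `CHILD` `WORD` blocks. I am assuming you want the value block's confidence; the key block has its own score too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of Textract-style key/value pairing on simplified stand-in types. */
final class KvpSketch {

    /** Hypothetical stand-in for one relationship entry: a type plus target ids. */
    static final class Rel {
        final String type;              // "VALUE" or "CHILD"
        final List<String> ids;
        Rel(String type, String... ids) {
            this.type = type;
            this.ids = Arrays.asList(ids);
        }
    }

    /** Hypothetical stand-in for a Textract block; NOT the SDK's Block class. */
    static final class Blk {
        final String id, blockType, entityType, text;
        final float confidence;
        final List<Rel> rels;
        Blk(String id, String blockType, String entityType, String text,
            float confidence, Rel... rels) {
            this.id = id;
            this.blockType = blockType;
            this.entityType = entityType;
            this.text = text;
            this.confidence = confidence;
            this.rels = Arrays.asList(rels);
        }
    }

    /** One output row: field name, value text, confidence of the value block. */
    static final class Field {
        final String field, value;
        final float confidence;
        Field(String field, String value, float confidence) {
            this.field = field;
            this.value = value;
            this.confidence = confidence;
        }
    }

    /** Collects the ids referenced by all relationships of the given type. */
    static List<String> related(Blk block, String type) {
        List<String> out = new ArrayList<>();
        for (Rel rel : block.rels) {
            if (rel.type.equals(type)) out.addAll(rel.ids);
        }
        return out;
    }

    /** Joins the text of the block's WORD children; other child types are skipped. */
    static String text(Blk block, Map<String, Blk> byId) {
        StringBuilder sb = new StringBuilder();
        for (String id : related(block, "CHILD")) {
            Blk child = byId.get(id);
            if (child != null && "WORD".equals(child.blockType)) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(child.text);
            }
        }
        return sb.toString();
    }

    /** Pairs every KEY block with the value block(s) it points to. */
    static List<Field> extract(List<Blk> blocks) {
        Map<String, Blk> byId = new HashMap<>();
        for (Blk b : blocks) byId.put(b.id, b);

        List<Field> out = new ArrayList<>();
        for (Blk b : blocks) {
            boolean isKey = "KEY_VALUE_SET".equals(b.blockType) && "KEY".equals(b.entityType);
            if (!isKey) continue;
            for (String valueId : related(b, "VALUE")) {
                Blk value = byId.get(valueId);
                if (value == null) continue;   // dangling reference: skip it
                out.add(new Field(text(b, byId), text(value, byId), value.confidence));
            }
        }
        return out;
    }
}
```

With the real SDK you would read the same information from the block objects returned by the asynchronous call instead of from `Blk`. Remember to collect the blocks from every page of the paginated result before building the id map, otherwise some relationships will point at ids you have not loaded yet.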

Ninad Gaikwad

Daniel Xav De Oliveira · Answer 2 · 2020-06-26T11:07:31.140

I have just started with AWS Textract in Java too, and wow, what a great tool! I have included code in my answer at this link if you would like to take a look :)

It extracts the keys and values. I suggest you create a model with key, value, and confidence score, and then create an object for each key-value pair:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Optional;

    import com.amazonaws.services.textract.model.Block;

    // findValueBlock, getText, KVPair and Property are defined elsewhere (not shown here).
    public static ArrayList<KVPair> getKVObjects(List<Block> keyMap, List<Block> valueMap, List<Block> blockMap) {
        ArrayList<KVPair> labelValues = new ArrayList<>();

        for (Block keyBlock : keyMap) {
            Block valueBlock = findValueBlock(keyBlock, valueMap);
            if (valueBlock == null) {
                // Guard in case no matching value block is found; avoids a NullPointerException below.
                continue;
            }

            String key = getText(keyBlock, blockMap);
            Float top = valueBlock.getGeometry().getBoundingBox().getTop();
            Float left = valueBlock.getGeometry().getBoundingBox().getLeft();
            Float confidenceScore = valueBlock.getConfidence();

            // One property per value: its text, its position on the page and its confidence.
            Property property = new Property();
            property.setValue(getText(valueBlock, blockMap));
            property.setLocationLeft(left);
            property.setLocationTop(top);
            property.setConfidenceScore(confidenceScore);

            // Reuse the existing pair if this label was already seen, otherwise create one.
            Optional<KVPair> label = labelValues.stream()
                    .filter(x -> x.getLabel().equals(key))
                    .findFirst();

            if (label.isPresent()) {
                label.get().setProperties(property);
            } else {
                KVPair pair = new KVPair();
                pair.setLabel(key);
                pair.setProperties(property);
                labelValues.add(pair);
            }
        }

        return labelValues;
    }
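The method relies on `KVPair` and `Property` model classes that are not shown in this answer. A minimal sketch of what they could look like is below, together with a small helper that prints the result in the shape the question asks for. Everything here is an assumption on my part: the field names are inferred from the setters used in the method, `setProperties` is assumed to append a property rather than replace one, and `KvModelSketch` and `toJson` are hypothetical names. The JSON is built by hand only to keep the example dependency-free; a JSON library would be the more usual choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical model classes inferred from the setters used by getKVObjects. */
final class KvModelSketch {

    /** One extracted value with its position and confidence. */
    static final class Property {
        private String value;
        private Float locationLeft;
        private Float locationTop;
        private Float confidenceScore;

        String getValue() { return value; }
        void setValue(String value) { this.value = value; }
        Float getLocationLeft() { return locationLeft; }
        void setLocationLeft(Float locationLeft) { this.locationLeft = locationLeft; }
        Float getLocationTop() { return locationTop; }
        void setLocationTop(Float locationTop) { this.locationTop = locationTop; }
        Float getConfidenceScore() { return confidenceScore; }
        void setConfidenceScore(Float confidenceScore) { this.confidenceScore = confidenceScore; }
    }

    /** A label plus every value found for it (a label can repeat on a form). */
    static final class KVPair {
        private String label;
        private final List<Property> properties = new ArrayList<>();

        String getLabel() { return label; }
        void setLabel(String label) { this.label = label; }
        // Assumption: despite its name, this appends instead of replacing.
        void setProperties(Property property) { properties.add(property); }
        List<Property> getProperties() { return properties; }
    }

    /** Renders one JSON object per (label, value) pair, using the question's keys. */
    static String toJson(List<KVPair> pairs) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (KVPair pair : pairs) {
            for (Property p : pair.getProperties()) {
                if (!first) sb.append(',');
                first = false;
                sb.append(String.format(Locale.ROOT,
                        "{\"Field\":\"%s\",\"Value\":\"%s\",\"confidence Score\":\"%.2f\"}",
                        escape(pair.getLabel()), escape(p.getValue()), p.getConfidenceScore()));
            }
        }
        return sb.append(']').toString();
    }

    /** Escapes backslashes and double quotes only; enough for this sketch. */
    private static String escape(String s) {
        return s == null ? "" : s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Calling `KvModelSketch.toJson(labelValues)` on a list of pairs gives one `Field` / `Value` / `confidence Score` object per value, with the score rounded to two decimals.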

AWS-Textract-Key-Value-Pair Java - thread "main" java.lang.NullPointerException