I am using AWS Textract to OCR images and create a searchable PDF as outlined in this AWS blog post. The basic request code looks like this:

// imageBytes is a java.nio.ByteBuffer holding the image data
AmazonTextract client = AmazonTextractClientBuilder.standard().build();
DetectDocumentTextRequest request = new DetectDocumentTextRequest()
                .withDocument(new Document()
                        .withBytes(imageBytes));
DetectDocumentTextResult result = client.detectDocumentText(request);
List<Block> blocks = result.getBlocks();
  

This works well, but I would also like to write out and keep the original response JSON, which contains all the information about what was detected and where.

Is there a way to get the response JSON using the Java SDK?

1 Answer

The AWS SDK for Java doesn't return the response JSON to you in raw form. The assumption may have been that it wouldn't be needed once it has been converted to a DetectDocumentTextResult object.

You can convert the DetectDocumentTextResult object back to JSON yourself (example), which should give you the same values. Note that the field names will not match exactly (e.g. DocumentMetadata vs. documentMetadata), but the values of those fields will be the same.
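To illustrate why the names differ, here is a minimal, standard-library-only sketch of getter-based serialization. It is not the SDK's own mechanism: FakeResult and FakeMetadata are hypothetical stand-ins for DetectDocumentTextResult and DocumentMetadata, and toJson is an illustrative helper. With the real SDK you would more likely pass the result object to a JSON library such as Jackson or Gson, which derive field names from getters or fields in a similar way.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ResultToJson {

    // Hypothetical stand-in for the SDK's DocumentMetadata (not the real class).
    public static final class FakeMetadata {
        public Integer getPages() { return 1; }
    }

    // Hypothetical stand-in for DetectDocumentTextResult (not the real class).
    public static final class FakeResult {
        public FakeMetadata getDocumentMetadata() { return new FakeMetadata(); }
        public List<String> getBlocks() { return List.of("LINE", "WORD"); }
    }

    // Serializes a value by calling its public no-arg getXxx() methods.
    // getDocumentMetadata() becomes "documentMetadata", which is why the
    // output uses a lower-case first letter.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        if (value instanceof String) return "\"" + escape((String) value) + "\"";
        if (value instanceof Iterable) {
            List<String> items = new ArrayList<>();
            for (Object item : (Iterable<?>) value) items.add(toJson(item));
            return "[" + String.join(",", items) + "]";
        }
        // TreeMap keeps the field order deterministic (alphabetical).
        Map<String, String> fields = new TreeMap<>();
        for (Method m : value.getClass().getMethods()) {
            String name = m.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                    && m.getParameterCount() == 0 && !name.equals("getClass");
            if (!isGetter) continue;
            String field = Character.toLowerCase(name.charAt(3)) + name.substring(4);
            try {
                fields.put(field, toJson(m.invoke(value)));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + name, e);
            }
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            parts.add("\"" + e.getKey() + "\":" + e.getValue());
        }
        return "{" + String.join(",", parts) + "}";
    }

    // Minimal escaping (backslash and double quote only); enough for this sketch.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // prints {"blocks":["LINE","WORD"],"documentMetadata":{"pages":1}}
        System.out.println(toJson(new FakeResult()));
    }
}
```

The serialized output carries the same values as the object it was built from, but under getter-derived names, so it should not be expected to match the service's wire format byte for byte.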

Cosmittus