I am trying to use Amazon Textract via the Python (boto3) interface. When I upload a file from the local drive, everything works fine:
    import boto3
    import numpy as np

    def filename_to_json(self, filename):
        client = boto3.client('textract')
        if filename is not None:
            # Send the raw file contents (PNG or JPEG) to Textract
            with open(filename, 'rb') as image:
                response = client.detect_document_text(Document={'Bytes': image.read()})
            return response
My question is how to modify the client.detect_document_text() call so that it works on an image already stored in a variable as a numpy ndarray. From the AWS documentation I know that:
Bytes
A blob of base64-encoded document bytes. The maximum size of a document that's provided in a blob of bytes is 5 MB. The document bytes must be in PNG or JPEG format.
If you're using an AWS SDK to call Amazon Textract, you might not need to base64-encode image bytes passed using the Bytes field.
Type: Base64-encoded binary data object
but I cannot figure out how to convert a numpy ndarray into something that works.
I have already tried a number of conversion methods, such as numpy.ndarray.tobytes() and base64.b64encode(), but none of them worked.
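One direction that may be worth checking: as far as I understand, numpy.ndarray.tobytes() returns the raw pixel values, not a PNG or JPEG file, so the array probably has to be encoded into one of those formats in memory before it is sent. Below is a minimal sketch of that idea. It assumes Pillow is installed and that the array is uint8 with shape (H, W) or (H, W, 3); the helper name ndarray_to_png_bytes is made up for illustration, and the Textract call itself is left commented out.

```python
import io

import numpy as np
from PIL import Image


def ndarray_to_png_bytes(array):
    """Encode an image array as PNG file bytes (hypothetical helper)."""
    # Assumption: array is uint8, shape (H, W) grayscale or (H, W, 3) RGB.
    image = Image.fromarray(array)
    buffer = io.BytesIO()
    # Write a complete PNG file into the in-memory buffer.
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# A small synthetic image stands in for the real array here.
pixels = np.zeros((32, 32, 3), dtype=np.uint8)
png_bytes = ndarray_to_png_bytes(pixels)

# The result could then be passed the same way as the file contents:
# response = client.detect_document_text(Document={'Bytes': png_bytes})
```

The encoded bytes would still need to stay under the 5 MB limit quoted above.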
P.S. I am new here, please be understanding.