I have a bucket on S3. I want to be able to connect to it and read the pictures/PDFs into my EC2 machine memory, perform OCR and get needed fields.
Here is what I have done so far but unfortunately it doesn't work.
import cv2
import boto3
import matplotlib
import pytesseract
from PIL import Image
boto3.setup_default_session(profile_name='default-mfasession')
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')
bucket_name = "my_bucket"
key = "my-files/._Screenshot 2020-04-20 at 14.21.20.png"
bucket = s3_resource.Bucket(bucket_name)
object = bucket.Object(key)
response = object.get()
file_stream = response['Body']
im = Image.open(file_stream)
np.array(im)
Returns me an error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fae33dce110>
I have tried all the answers related to this issue in SO nothing helped. Including: matplotlib: ValueError: invalid PNG header and PIL cannot identify image file for io.BytesIO object
Please advise how to solve it?