Can't read PNG files from S3 in Python 3?

Question

I have a bucket on S3. I want to connect to it, read the pictures/PDFs into memory on my EC2 machine, perform OCR, and extract the fields I need.

Here is what I have done so far, but unfortunately it doesn't work.

import cv2
import boto3
import matplotlib
import numpy as np  # needed for np.array() below
import pytesseract
from PIL import Image


boto3.setup_default_session(profile_name='default-mfasession')
s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')
bucket_name = "my_bucket"
key = "my-files/._Screenshot 2020-04-20 at 14.21.20.png"

bucket = s3_resource.Bucket(bucket_name)
obj = bucket.Object(key)  # `obj` avoids shadowing the built-in `object`
response = obj.get()
file_stream = response['Body']
im = Image.open(file_stream)
np.array(im)

This returns an error:

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7fae33dce110>

I have tried all the answers related to this issue on SO, but nothing helped, including "matplotlib: ValueError: invalid PNG header" and "PIL cannot identify image file for io.BytesIO object".

How can I solve this?

Are you positively, absolutely, definitely sure it **is** a PNG file? I.e., you are not blindly believing the file extension or what other tools say but you opened it with a hex viewer and saw the magic byte header (and other easily recognizable parts)? — Jongware, Apr 26 '20 at 09:00
@usr2564301 I know what I have in my bucket, but I am keeping this point in mind (I will probably get PDF, GIF, JPEG ... files containing images, and I need to parse them). – SteveS, Apr 26 '20 at 09:11

Marcin · Accepted Answer · 2020-04-26T08:51:54.903

3

This is what I usually use. Maybe it will work for you as well:

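import io

import boto3
from PIL import Image

s3_resource = boto3.resource('s3')  # same resource object as in the question

def image_from_s3(bucket, key):
    # Download the whole object into memory, then let Pillow parse the bytes
    bucket = s3_resource.Bucket(bucket)
    image = bucket.Object(key)
    img_data = image.get().get('Body').read()

    return Image.open(io.BytesIO(img_data))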

And in your handler you execute this:

    img = image_from_s3(image_bucket, image_key)

If the call succeeds, img is a Pillow Image object.

edited Apr 26 '20 at 08:51

answered Apr 26 '20 at 08:34


AttributeError: 'S3' object has no attribute 'Bucket' @marcin – SteveS Apr 26 '20 at 08:49
I think it should be `s3_resource`. I edited the code a bit. – Marcin Apr 26 '20 at 08:51
Still doesn't work... UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f21ecef0710> @marcin – SteveS Apr 26 '20 at 08:53
Can you share example png which does not work? I or others can try to replicate the problem. – Marcin Apr 26 '20 at 08:54
It was .filename (dot filename) probably a cache or something. Fixed it and it works now. Thanks! – SteveS Apr 26 '20 at 09:14
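
The resolution in the last comment fits the key shown in the question: its base name starts with `._`, a prefix macOS commonly uses for metadata sidecar files, which are not images even when the name ends in `.png`. Below is a minimal sketch of a pre-check along the lines of the magic-byte suggestion in the comments. The helper names `is_dot_underscore_key` and `looks_like_png` are hypothetical (they are not part of boto3 or Pillow), and only the PNG signature is checked.

```python
import posixpath

# The fixed 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_dot_underscore_key(key):
    """Return True if the base name of an S3 key starts with '._'.

    Assumption: such objects are sidecar/metadata files rather than images.
    """
    return posixpath.basename(key).startswith("._")


def looks_like_png(data):
    """Return True if the raw bytes begin with the PNG signature."""
    return data[:8] == PNG_SIGNATURE
```

With the answer's approach, `img_data` could be passed to `looks_like_png` before calling `Image.open`, and keys flagged by `is_dot_underscore_key` could be skipped when walking the bucket. Other formats (JPEG, GIF, PDF) would need their own signature checks.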
