How to read content of a file from a folder in S3 bucket using python?

Question

I am trying to read a file from a folder structure in an S3 bucket using Python with boto3.

I want to return a boolean value indicating whether the report is present in the S3 bucket or not.

Code

import boto3
import json

S3_BUCKET_NAME = ''
KEY = '@@@/%%%.json'


def notification():
    report = get_report()
    print(report)


def get_report():
    s3_client = boto3.client('s3')
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Prefix=PREFIX, Key=KEY)
    data = response['Body'].read()
    report = json.loads(data)
    return report

How can I check if the report is present and return a boolean value?

S3 is object storage. Unlike block storage or file storage, the folder and the filename together make up the object's location, so "Key" should be "/fee_summary/fee_summary_report.json", and you should drop the "Prefix" parameter — Orenico, Jan 30 '22 at 08:48
A `Key` should not start with a `/` slash. The `Key` would be: `fee_summary/fee_summary_report.json` — John Rotenstein, Jan 30 '22 at 11:40
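
To illustrate the two comments above, here is a minimal sketch of how a folder and a file name combine into a single key. The helper name `make_key` is hypothetical (it is not part of boto3); it only joins the parts with `/` and drops leading or trailing slashes.

```python
def make_key(*parts):
    """Join path-like parts into an S3 key without leading or trailing slashes."""
    # Strip slashes from each part and skip parts that end up empty.
    cleaned = [part.strip("/") for part in parts]
    return "/".join(part for part in cleaned if part)


key = make_key("/fee_summary/", "fee_summary_report.json")
print(key)
```

The resulting string is what would be passed as `Key`; S3 has no separate folder argument.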

score 4 · Answer 1 · answered Feb 08 '22 at 14:35

2 answers to your questions:

How to read content of a file from a folder in S3 bucket using python?
How to check if the report is present and return a boolean value?

Get S3-object

S3-object as bytes

    s3_client = boto3.client('s3')
    # get_object takes Bucket and Key only; it has no Prefix parameter
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=KEY)
    data = response['Body'].read()  # read() returns bytes

NOTE: In Python 3, read() returns bytes. If you want a string, you must call .decode(charset) on the result:

report = json.loads(response['Body'].read().decode('utf-8'))
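
Putting the steps above together, a minimal sketch could look like the following. The function name `read_json_object` and its `charset` parameter are assumptions made for illustration; the client is passed in as an argument so the function can be exercised without touching AWS.

```python
import json


def read_json_object(s3_client, bucket, key, charset="utf-8"):
    """Fetch one S3 object and parse its body as JSON."""
    # get_object accepts Bucket and Key; the "folder" is part of the key.
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw = response["Body"].read()   # bytes
    text = raw.decode(charset)      # str
    return json.loads(text)
```

With a real client this would be called as `read_json_object(boto3.client('s3'), S3_BUCKET_NAME, KEY)`. Since Python 3.6, `json.loads` also accepts bytes directly, so the explicit decode mainly matters when you need the text itself or a non-default charset.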

S3-object as string

See Open S3 object as a string with Boto3.

Check if S3-object is present

For example, to check whether the report exists, request the object's metadata through the resource API and treat a 404 error as "not found":

import boto3
import json
from botocore.exceptions import ClientError

S3_BUCKET_NAME = ''
KEY = 'fee_summary/fee_summary_report.json'


def send_fee_summary_notification():
    fee_summary_report = get_fee_summary_report()
    print(fee_summary_report)


def get_fee_summary_report():
    s3_client = boto3.client('s3')
    # get_object takes only Bucket and Key; the "folder" is part of the key
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=KEY)
    data = response['Body'].read()
    fee_summary_report = json.loads(data)
    return fee_summary_report


def has_fee_summary_report():
    s3 = boto3.resource('s3')  # Object() belongs to the resource API, not the client
    try:
        s3.Object(S3_BUCKET_NAME, KEY).load()  # HEAD request; raises ClientError if missing
    except ClientError as error:
        if error.response['Error']['Code'] == '404':
            return False  # report not found
        raise  # other errors (e.g. access denied) do not mean "missing"
    return True
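
The same check can also be sketched with the client API, using `head_object`, which requests only the metadata and not the body. The function name `object_exists` is hypothetical. The sketch assumes that a missing key surfaces as a `ClientError` with error code `404`, and it uses `s3_client.exceptions.ClientError`, which botocore clients expose as an alias of `botocore.exceptions.ClientError`.

```python
def object_exists(s3_client, bucket, key):
    """Return True if the object exists, False if S3 reports it as missing."""
    try:
        # HEAD request: no body is downloaded.
        s3_client.head_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.ClientError as error:
        # Assumption: a missing key is reported with code "404"
        # ("NoSuchKey" is what get_object would report instead).
        if error.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        raise  # e.g. access denied: not the same as "missing"
    return True
```

With a real client this would be called as `object_exists(boto3.client('s3'), S3_BUCKET_NAME, KEY)`. Re-raising other errors keeps a permissions problem from being silently reported as a missing report.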

Use paging to literally scan (for debugging)

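You can also iterate over all objects in your bucket page by page and test whether the desired report (with the specified KEY) exists:

def has_fee_summary_report_by_scanning():
    s3 = boto3.resource('s3')  # Bucket() belongs to the resource API
    for page in s3.Bucket(S3_BUCKET_NAME).objects.pages():
        for obj in page:
            print(obj.key)  # debug print
            if obj.key == KEY:
                return True
    return False  # only after every page has been scanned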
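
A similar scan can be sketched with the client API and a `list_objects_v2` paginator. This is also the place where a prefix belongs: it narrows the listing to one "folder". The function name `key_exists_by_listing` and its `prefix` parameter are assumptions for illustration.

```python
def key_exists_by_listing(s3_client, bucket, key, prefix=""):
    """Scan the listing page by page and report whether `key` is present."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for item in page.get("Contents", []):
            if item["Key"] == key:
                return True
    return False
```

With a real client this would be called as `key_exists_by_listing(boto3.client('s3'), S3_BUCKET_NAME, KEY, prefix='fee_summary/')`. Listing costs more requests than a single metadata lookup, so it is mostly useful for debugging.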

score 1 · Answer 2 · answered Jan 31 '22 at 04:14

See the example below, which I created for you.

import json
import boto3


def lambda_handler(event, context):
    
    S3_BUCKET_NAME = 'feesummarybucketmmmm'
    KEY = 'fee_summary_report.json'
    s3_client = boto3.client('s3')
    response = s3_client.get_object(Bucket=S3_BUCKET_NAME, Key=KEY)
    data = response['Body'].read()
    print(response)
    print(data)
    fee_summary_report = json.loads(data)
    
    # TODO implement
    return {
        'statusCode': 200,
        'body': fee_summary_report
    }

https://github.com/mmakadiya/public_files/blob/main/read_s3_file.py

I was confused by what `key` was; this example clearly shows what it is. Also remember to call `decode(charset)`, as mentioned in the solution https://stackoverflow.com/a/71035707/530399 by @hc_dev — Bikash Gyawali, Jun 10 '22 at 13:54