99

I keep the following JSON in an S3 bucket named 'test':

{
  'Details' : "Something" 
}

I am using the following code to read this JSON and print the key 'Details':

import json
import boto3

s3 = boto3.resource('s3',
                    aws_access_key_id=<access_key>,
                    aws_secret_access_key=<secret_key>
                    )
content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(repr(file_content))
print(json_content['Details'])

And I am getting the error 'string indices must be integers'. I don't want to download the file from S3 and then read it.

Nanju
  • Remove the `repr`. – Alex Hall Dec 06 '16 at 12:43
  • @AlexHall Initially I tried removing `repr`, but it didn't work; it gives **ValueError: Expecting property name enclosed in double quotes** – Nanju Dec 07 '16 at 04:37
  • I resolved the problem. JSON should have attributes enclosed in double quotes. I changed my JSON format. – Nanju Dec 07 '16 at 07:06
  • Which line are you getting an error on? Split up that line. `file_content = content_object...` is 4 steps in one line. For now, split that up into 4 separate lines with 4 intermediate variables, then see which line fails (see the sketch after these comments). – falsePockets May 24 '17 at 01:22
  • All I needed for my issue was `.read().decode('utf-8')`, so thank you for asking (-; – soBusted Jan 13 '21 at 17:33
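
A minimal sketch of falsePockets' debugging suggestion, splitting the chained call from the question into intermediate steps (same objects as above, no new APIs):

response = content_object.get()           # dict with 'Body' plus metadata
body = response['Body']                   # botocore StreamingBody
raw_bytes = body.read()                   # raw bytes of the object
file_content = raw_bytes.decode('utf-8')  # decoded str

Whichever line raises the error points to the failing step.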

7 Answers

106

As mentioned in the comments above, the `repr` has to be removed and the JSON file has to use double quotes for attribute names. Using this file on AWS S3:

{
  "Details" : "Something"
}

and the following Python code, it works:

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
# >> Something
bastelflp
  • Note to others: https://boto3.readthedocs.io/en/latest/reference/services/s3.html#object `s3.Object('bucketName', 'keyName')` so an example to get the file `s3://foobarBucketName/folderA/folderB/myFile.json` would be `s3.Object('foobarBucketName', 'folderA/folderB/myFile.json')` – Kyle Bridenstine Jun 22 '18 at 18:24
47

The following worked for me.

# read_s3.py

from json import loads
from boto3 import client

BUCKET = 'MY_S3_BUCKET_NAME'
FILE_TO_READ = 'FOLDER_NAME/my_file.json'
client = client('s3',
                aws_access_key_id='MY_AWS_KEY_ID',
                aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
                )
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result["Body"].read().decode()
json_content = loads(text)      # parse the JSON string into a dict first
print(json_content['Details'])  # use your desired JSON key for your value

Further Improvement

Let's call the above code snippet read_s3.py.

It is not a good idea to hard-code the AWS key ID and secret key directly. As a best practice, consider either of the following:

(1) Read your AWS credentials from a JSON file (aws_cred.json) stored in your local storage:

from json import load
from boto3 import client
...
with open('local_fold/aws_cred.json') as cred_file:
    credentials = load(cred_file)

client = client('s3',
                aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY']
                )
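
For this to work, aws_cred.json would need to hold those two keys; a minimal example with placeholder values (not real credentials):

{
  "MY_AWS_KEY_ID": "YOUR_AWS_ACCESS_KEY_ID",
  "MY_AWS_SECRET_ACCESS_KEY": "YOUR_AWS_SECRET_ACCESS_KEY"
}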

(2) Read from your environment variables (my preferred option for deployment):

from os import environ
from boto3 import client

client = client('s3',
                aws_access_key_id=environ['MY_AWS_KEY_ID'],
                aws_secret_access_key=environ['MY_AWS_SECRET_ACCESS_KEY']
                )

Let's prepare a shell script called read_s3_using_env.sh that sets the environment variables and runs our Python script (read_s3.py), as follows:

# read_s3_using_env.sh
export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
# execute the python file containing your code as stated above that reads from s3
python read_s3.py # will execute the python script to read from s3

Now execute the shell script in a terminal as follows:

sh read_s3_using_env.sh
Hafizur Rahman
  • You don't need to specify credentials on the client initialization; it's automatically handled by boto3 and the other AWS SDKs, which allow users to authenticate however they choose (it could be IAM roles instead); see the sketch after these comments. – Pedro Aug 26 '21 at 07:31
  • @HafizurRahman The variable `text` here is a string, so `print(text['Details'])` will not work. I believe you would have to update the code snippet accordingly. – Varun Mar 09 '22 at 17:13
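
A minimal sketch of Pedro's point: when credentials come from the environment, a shared credentials file, or an attached IAM role, the client needs no explicit keys:

from boto3 import client

# boto3 resolves credentials automatically from its standard chain
# (environment variables, ~/.aws/credentials, or an IAM role)
s3_client = client('s3')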
38

Wanted to add that botocore.response.StreamingBody works well with json.load:

import json
import boto3

s3 = boto3.resource('s3')

# bucket and key name your S3 object; json.load accepts the
# file-like StreamingBody returned by get() directly
obj = s3.Object(bucket, key)
data = json.load(obj.get()['Body'])
adamarla
alukach
5

You can use the code below in an AWS Lambda function to read a JSON file from an S3 bucket and process it with Python.

import json
import boto3
import logging

# logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

VERSION = 1.0

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = 'my_project_bucket'
    key = 'sample_payload.json'

    response = s3.get_object(Bucket=bucket, Key=key)
    content = response['Body']
    json_object = json.loads(content.read())
    print(json_object)
Piyush Singhal
4

I was stuck for a bit, as the decoding didn't work for me (my S3 objects are gzipped).

Found this discussion which helped me: Python gzip: is there a way to decompress from a string?

import boto3
import zlib

# inside a Lambda handler: 'event' is the S3 event payload
S3_RESOURCE = boto3.resource('s3')

key = event["Records"][0]["s3"]["object"]["key"]
bucket_name = event["Records"][0]["s3"]["bucket"]["name"]

s3_object = S3_RESOURCE.Object(bucket_name, key).get()['Body'].read()

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header
jsonData = zlib.decompress(s3_object, 16 + zlib.MAX_WBITS)

If you print jsonData, you'll see your desired JSON file! If you are running the test in AWS itself, be sure to check the CloudWatch logs, as Lambda won't output the full JSON file if it's too long.
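
If you then want it as a Python dict, a minimal follow-up step (assuming the decompressed bytes are UTF-8-encoded JSON):

import json

data = json.loads(jsonData)  # jsonData holds the decompressed JSON bytes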

Cerberussian
1

This is easy to do with cloudpathlib, which supports S3 as well as Google Cloud Storage and Azure Blob Storage. Here's a sample:

import json
from cloudpathlib import CloudPath


# first, we'll write some json data so then we can later read it
CloudPath("s3://mybucket/asdf.json").write_text('{"field": "value"}')
#> 18


# read data from S3
data = json.loads(
    CloudPath("s3://mybucket/asdf.json").read_text()
)

# look at the data
data
#> {'field': 'value'}

# access it now that it is loaded in Python
data["field"] == "value"
#> True

This comes with a few added benefits: you can set particular options or use different authentication mechanisms, and it keeps a persistent cache so you don't always need to redownload from S3.
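
As a rough sketch of that, cloudpathlib exposes an S3Client whose options include a local cache directory (the parameter names here follow the cloudpathlib docs, but check them for your version):

from cloudpathlib import CloudPath, S3Client

# a client with an explicit persistent cache directory; CloudPaths
# created after this reuse downloaded files across reads
client = S3Client(local_cache_dir="/tmp/cloudpath-cache")
client.set_as_default_client()

data = CloudPath("s3://mybucket/asdf.json").read_text()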

hume
0

If your JSON file looks like this:

{
    "test": "test123"
}

you can read it into a dict like this:

import json
import boto3

client = boto3.client("s3")
BUCKET = "Bucket123"

def get_json_from_s3(key: str):
    """
    Retrieves the JSON file containing responses from S3. Returns a dict.

    Args:
        key (str): file path to the JSON file

    Returns:
        dict: JSON-style dict
    """
    data = client.get_object(Bucket=BUCKET, Key=key)
    json_text = data["Body"].read().decode("utf-8")
    json_text_object = json.loads(json_text)
    return json_text_object

test_dict = get_json_from_s3(key="test.json")
print(test_dict["test"])
Wesley Cheek
  • The variable names are misleading here. `json_text_bytes` contains the JSON text, while `json_text` contains the JSON object. – Tomer Mar 16 '23 at 11:33