45

I just started learning and using S3 and have read the docs, but I couldn't find a way to fetch a file into an in-memory object instead of downloading it from S3 to disk. Is this possible, or am I missing something?

I want to avoid the additional I/O of reading the file back after downloading it.

Carl G
Bruce_Wayne
  • Does GetObject (see https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingNetSDK.html ) help? – sgmoore May 07 '16 at 11:34

3 Answers

80

You might be looking for the get_object() method of the boto3 S3 client:

http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.get_object

This returns a response dictionary whose Body member is a StreamingBody object; you can treat it like a regular file and call its .read() method. To read the entire content of the S3 object into memory, you would do something like this:

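import boto3

s3_client = boto3.client('s3')
# BUCKET_NAME_STRING and FILE_NAME_STRING are placeholders for your bucket name and object key
s3_response_object = s3_client.get_object(Bucket=BUCKET_NAME_STRING, Key=FILE_NAME_STRING)
object_content = s3_response_object['Body'].read()  # the whole object, as bytes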
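Since `.read()` returns raw bytes, text content needs an explicit decode. Here is a minimal sketch of that step. The helper names `read_s3_bytes` and `read_s3_text` are made up for illustration, UTF-8 is an assumed default, and `s3_client` is expected to be something like `boto3.client('s3')`:

```python
def read_s3_bytes(s3_client, bucket, key):
    """Return the whole S3 object as bytes, without writing to local disk."""
    # get_object returns a dict; its 'Body' entry is a file-like StreamingBody.
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def read_s3_text(s3_client, bucket, key, encoding="utf-8"):
    """Return the S3 object decoded to str (the encoding is an assumption)."""
    return read_s3_bytes(s3_client, bucket, key).decode(encoding)
```

With a real client this would be called as `read_s3_text(boto3.client('s3'), 'my-bucket', 'notes.txt')`, where the bucket and key are placeholders.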
grepe
  • Can you tell us how to save this to the local machine? I want to get the MIME type of the file, and for that I have to save it. – ashraf minhaj Jan 03 '23 at 05:42
  • For saving to the local machine there's a different S3 API, `download_file`. The file can be saved to any absolute path you specify. – Rohan Kumar Aug 10 '23 at 18:14
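On the MIME question in the comments: saving the file may not be necessary, because S3 stores a Content-Type with each object and returns it from `head_object` (and from `get_object`). A rough sketch of both options follows. The helper names are hypothetical, and the returned type is whatever was set at upload time (often a generic default), not something detected from the file's contents:

```python
def get_s3_content_type(s3_client, bucket, key):
    """Return the Content-Type stored with the object, or None if absent."""
    # head_object fetches only the object's metadata, not its body.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response.get("ContentType")


def save_s3_object(s3_client, bucket, key, local_path):
    """Save the object to local_path using the client's download_file."""
    s3_client.download_file(bucket, key, local_path)
```

Here `s3_client` is assumed to be `boto3.client('s3')`. If the stored Content-Type is not trustworthy, sniffing the downloaded bytes would need a separate tool.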
32

I prefer this approach, equivalent to a previous answer:

import boto3
s3 = boto3.resource('s3')
def read_s3_contents(bucket_name, key):
    response = s3.Object(bucket_name, key).get()
    return response['Body'].read()

Another approach is to download the object into an in-memory buffer:

import io
import boto3
s3 = boto3.resource('s3')
def read_s3_contents_with_download(bucket_name, key):
    # Python 3: S3 data is bytes, so use io.BytesIO (the Python 2 version used StringIO.StringIO)
    string_io = io.BytesIO()
    s3.Object(bucket_name, key).download_fileobj(string_io)
    return string_io.getvalue()
Carl G
  • What is `return` here? Is this a function? Can you please post the whole working example? – Joe Jul 07 '18 at 20:03
  • Hi @Joe, I was using the `return` keyword here loosely to indicate the thing that a programmer wants. I wrapped the statements in function definitions to make it more clear. – Carl G Jul 07 '18 at 21:21
  • 1
    Thanks. How would you read a Parquet file from S3 into the variable `string_io`? I tried the above code and got the error: `TypeError: string argument expected, got 'bytes'`. – Joe Jul 07 '18 at 23:25
  • 2
    Hi @Joe, Python 3 has `BytesIO`, which you can try using instead of `StringIO`. If that doesn't help, you might need to ask a new question. – Carl G Jul 08 '18 at 06:46
  • @Joe I have a solution to that problem here https://stackoverflow.com/questions/55732615/how-do-i-read-a-gzipped-parquet-file-from-s3-into-python-using-boto3/55732616 using the tip of `BytesIO` suggested by @Carl G – Corey Levinson Apr 17 '19 at 16:58
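To make the `BytesIO` tip from the comments concrete, here is a minimal sketch. The helper name `download_to_bytes_buffer` is made up, and `s3_object` is assumed to be something like `boto3.resource('s3').Object(bucket_name, key)`. The buffer is rewound before being returned because most readers start at the current position:

```python
import io


def download_to_bytes_buffer(s3_object):
    """Download an S3 object into an in-memory bytes buffer and rewind it."""
    buffer = io.BytesIO()
    # download_fileobj writes the object's bytes into any binary file-like object.
    s3_object.download_fileobj(buffer)
    buffer.seek(0)  # rewind so the next reader starts at the first byte
    return buffer
```

A binary format reader that accepts file-like objects can then consume the buffer directly. For Parquet that might be `pandas.read_parquet(buffer)`, which is untested here and needs a Parquet engine such as pyarrow installed.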
24

You could use StringIO and get the file content from S3 using get_contents_as_string from the legacy boto library (the predecessor of boto3), like this:

import pandas as pd
from io import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'XXXXXXDDDDDD'
AWS_SECRET = 'pweqory83743rywiuedq'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('YOUR_BUCKET')

fileName = "test.csv"

content = bucket.get_key(fileName).get_contents_as_string()
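# StringIO was imported directly from io, so call StringIO(...), and decode the content to text first
reader = pd.read_csv(StringIO(content.decode('utf-8')))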
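The snippet above relies on the older boto package. For comparison, here is a rough boto3-style sketch of the same idea. It uses the standard library's `csv` module instead of pandas to stay dependency-free, the helper name `read_s3_csv_rows` is hypothetical, and UTF-8 is an assumed encoding:

```python
import csv
import io


def read_s3_csv_rows(s3_client, bucket, key, encoding="utf-8"):
    """Read a CSV object from S3 into a list of dicts, entirely in memory."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    text = body.read().decode(encoding)  # bytes -> str
    # csv.DictReader uses the first row as column names.
    return list(csv.DictReader(io.StringIO(text)))
```

Here `s3_client` is assumed to be `boto3.client('s3')`, which picks up credentials from the environment or config files rather than from constants in the source. The same `io.StringIO(text)` could be handed to `pd.read_csv` if a DataFrame is wanted.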
Resigned June 2023
ar-ms