
I'm working in a Python web environment and I can simply upload a file from the filesystem to S3 using boto's key.set_contents_from_filename(path/to/file). However, I'd like to upload an image that is already on the web (say https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg:large).

Should I somehow download the image to the filesystem, and then upload it to S3 using boto as usual, then delete the image?

Ideally, boto's key.set_contents_from_file (or some other call) would accept a URL and neatly stream the image to S3 without my having to explicitly download a copy of the file to my server.

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = "test"
        k.set_contents_from_file(url)
        k.make_public()
        return "Success?"
    except Exception, e:
        return e

Using set_contents_from_file, as above, I get a "string object has no attribute 'tell'" error. Using set_contents_from_filename with the URL, I get a "No such file or directory" error. The boto storage documentation leaves off at uploading local files and does not mention uploading files stored remotely.

– dgh
  • Are you just trying to avoid writing to disk? Or are you trying to avoid transferring the file to your machine at all? – Emily Jan 15 '13 at 21:00
  • Well, ideally, a URL could be passed to S3 so that my server does not have to write to disk or load in memory at all. I think this is not a reasonable expectation of the S3 service though. If my server must handle this, I'd prefer not to write to disk. – dgh Jan 15 '13 at 21:08

10 Answers


Here is how I did it with requests, the key being to set stream=True when initially making the request and to upload to S3 using the upload_fileobj() method:

import requests
import boto3

url = "https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg"
r = requests.get(url, stream=True)

session = boto3.Session()
s3 = session.resource('s3')

bucket_name = 'your-bucket-name'
key = 'your-key-name'  # the key is the name the file will have in your bucket

bucket = s3.Bucket(bucket_name)
bucket.upload_fileobj(r.raw, key)
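
One caveat worth noting with this approach (this is based on how requests exposes the underlying urllib3 stream, so verify it for your setup): r.raw yields the body exactly as it came over the wire, so a gzip- or deflate-encoded response would be stored compressed. Asking the raw stream to decode as it is read avoids that:

r = requests.get(url, stream=True)
r.raw.decode_content = True  # have urllib3 decode gzip/deflate transparently while streaming
bucket.upload_fileobj(r.raw, key)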
– blaklaybul
  • I'm just learning boto and getting more familiar with AWS. Could you tell me in layman's terms why you can't just do `s3 = boto3.resource('s3')`? Isn't a default session started? – heartmo Oct 14 '18 at 07:20
  • @heartmo the discussion here provides a great overview of the differences between client, session, and resource. https://stackoverflow.com/questions/42809096/difference-in-boto3-between-resource-client-and-session – blaklaybul Oct 15 '18 at 13:05
  • Worked. Thanks a lot. – Geshan Ravindu Dec 11 '21 at 07:22

OK, per @garnaat, it doesn't sound like S3 currently allows uploads by URL. I managed to upload remote images to S3 by reading them into memory only. This works:

import boto
import urllib2
import StringIO
from boto.s3.key import Key

def upload(url):
    try:
        conn = boto.connect_s3(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
        bucket_name = settings.AWS_STORAGE_BUCKET_NAME
        bucket = conn.get_bucket(bucket_name)
        k = Key(bucket)
        k.key = url.split('/')[::-1][0]              # In my situation, ids at the end are unique
        file_object = urllib2.urlopen(url)           # 'Like' a file object
        fp = StringIO.StringIO(file_object.read())   # Wrap object
        k.set_contents_from_file(fp)
        return "Success"
    except Exception, e:
        return e

Also thanks to How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

– dgh
  • I'm not 100% sure, but I believe `url.split('/')[::-1][0]` could simply be rewritten as `url.split('/')[-1]`. I mean, I can't think of any cases where the result would be different. – Jordan Reiter Mar 25 '16 at 21:15

For a 2017-relevant answer to this question which uses the official 'boto3' package (instead of the old 'boto' package from the original answer):

Python 3.5

If you're on a clean Python install, pip install both packages first:

pip install boto3

pip install requests

import boto3
import requests

# Uses the creds in ~/.aws/credentials
s3 = boto3.resource('s3')
bucket_name_to_upload_image_to = 'photos'
s3_image_filename = 'test_s3_image.png'
internet_image_url = 'https://docs.python.org/3.7/_static/py.png'


# Do this as a quick and easy check to make sure your S3 access is OK
good_to_go = False
for bucket in s3.buckets.all():
    if bucket.name == bucket_name_to_upload_image_to:
        print('Good to go. Found the bucket to upload the image into.')
        good_to_go = True

if not good_to_go:
    print('Not seeing your s3 bucket, might want to double check permissions in IAM')

# Given an Internet-accessible URL, download the image and upload it to S3,
# without needing to persist the image to disk locally
req_for_image = requests.get(internet_image_url, stream=True)
file_object_from_req = req_for_image.raw
req_data = file_object_from_req.read()

# Do the actual upload to s3
s3.Bucket(bucket_name_to_upload_image_to).put_object(Key=s3_image_filename, Body=req_data)
– GISD
  • I am getting an exception with the above approach: S3 uploading Exception: _send_request() takes 5 positional arguments but 6 were given – ifti May 22 '17 at 12:38
  • @ifti Looks like you might have run into this bug - https://github.com/boto/botocore/issues/1079 It looks like it's been fixed now. – GISD Oct 26 '17 at 22:11

Unfortunately, there really isn't any way to do this. At least not at the moment. We could add a method to boto, say set_contents_from_url, but that method would still have to download the file to the local machine and then upload it. It might still be a convenient method but it wouldn't save you anything.

In order to do what you really want to do, we would need some capability on the S3 service itself that lets us pass it a URL and have it fetch the content into a bucket for us. That sounds like a pretty useful feature. You might want to post it as a feature request on the S3 forums.
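
For what it's worth, such a convenience wrapper is easy to sketch on top of the existing boto API. This is purely hypothetical (set_contents_from_url is not a real boto method, and the urllib2/StringIO plumbing is just one way to do the download); the bytes still travel through your machine, they simply never touch the local disk:

import urllib2
from cStringIO import StringIO

def set_contents_from_url(key, url):
    # Hypothetical helper: pull the URL into memory, then hand the
    # file-like buffer to boto's existing upload method.
    fp = StringIO(urllib2.urlopen(url).read())
    key.set_contents_from_file(fp)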

– garnaat
  • Thanks, glad to know I'm not missing out on a potentially useful S3 feature. I recorded a feature request in the forums. – dgh Jan 15 '13 at 21:59
  • This can be done by streaming the contents of the request with `stream=True`, using boto3's `upload_fileobj()`. See my answer above for details. – blaklaybul Oct 11 '18 at 12:53

A simple 3-line implementation that works on a Lambda out of the box:

import boto3
import requests

s3_object = boto3.resource('s3').Object(bucket_name, object_key)

with requests.get(url, stream=True) as r:
    s3_object.put(Body=r.content)

The source for the .get part comes straight from the requests documentation

– Filippo Vitale

import boto3
import requests
from io import BytesIO

def send_image_to_s3(url, name):
    print("sending image")
    bucket_name = 'XXX'
    AWS_SECRET_ACCESS_KEY = "XXX"
    AWS_ACCESS_KEY_ID = "XXX"

    s3 = boto3.client('s3', aws_access_key_id=AWS_ACCESS_KEY_ID,
                      aws_secret_access_key=AWS_SECRET_ACCESS_KEY)

    response = requests.get(url)
    img = BytesIO(response.content)

    file_name = f'path/{name}'
    print('sending {}'.format(file_name))
    s3.upload_fileobj(img, bucket_name, file_name)

    s3_path = 'path/' + name
    return s3_path
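
A quick usage sketch (the URL and file name here are just illustrative; it assumes the placeholder credentials and bucket above have been filled in):

s3_path = send_image_to_s3("https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg", "example.jpg")
print(s3_path)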
– Antonio

I have tried the following with boto3 and it works for me:

import boto3
import contextlib
import requests
from io import BytesIO

s3 = boto3.resource('s3')
s3Client = boto3.client('s3')
for bucket in s3.buckets.all():
    print(bucket.name)


url = "@resource url"
with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
    # Set up file stream from response content.
    fp = BytesIO(response.content)
    # Upload data to S3
    s3Client.upload_fileobj(fp, 'aws-books', 'reviews_Electronics_5.json.gz')
– yunus kula

Using the boto3 upload_fileobj method, you can stream a file to an S3 bucket, without saving to disk. Here is my function:

import boto3
import StringIO
import contextlib
import requests

def upload(url):
    # Get the service client
    s3 = boto3.client('s3')

    # Remember to set stream=True.
    with contextlib.closing(requests.get(url, stream=True, verify=False)) as response:
        # Set up file stream from response content.
        fp = StringIO.StringIO(response.content)
        # Upload data to S3
        s3.upload_fileobj(fp, 'my-bucket', 'my-dir/' + url.split('/')[-1])
– x00x70

It seems S3 doesn't support remote uploads as of now. You can use the class below to upload an image to S3. The upload method first downloads the image and keeps it in memory until it has been uploaded. To be able to connect to S3, you will have to install the AWS CLI with pip install awscli, then enter a few credentials with aws configure:

import boto3
import urllib3
import uuid
from pathlib import Path
from io import BytesIO
from errors import custom_exceptions as cex

BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"


class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH

    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()

    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed

        extension = Path(url).suffix
        id = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + id
        try:
            self.client.upload_fileobj(image_file,
                                       self.bucket_name,
                                       final_path
                                       )
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed

        return CLOUDFRONT_BASE_URL + id
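
A minimal usage sketch (assuming AWS credentials have been configured via aws configure and the placeholder constants above have been filled in; the image URL is just illustrative):

s3 = S3()
cdn_url = s3.upload_image("https://pbs.twimg.com/media/A9h_htACIAAaCf6.jpg")
print(cdn_url)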
– Prateek Bhuwania

import boto
from boto.s3.key import Key
from boto.s3.connection import OrdinaryCallingFormat
from urllib import urlopen


def upload_images_s3(img_url):
    try:
        connection = boto.connect_s3('access_key', 'secret_key', calling_format=OrdinaryCallingFormat())       
        bucket = connection.get_bucket('boto-demo-1519388451')
        file_obj = Key(bucket)
        file_obj.key = img_url.split('/')[::-1][0]
        fp = urlopen(img_url)
        result = file_obj.set_contents_from_string(fp.read())
    except Exception, e:
        return e