TL;DR
In terms of speed, both methods will perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).
- Use `upload_file()` when writing code that only handles uploading files from disk.
- Use `upload_fileobj()` when writing generic code to handle S3 uploads that may be reused later for more than the file-from-disk use case.
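A minimal sketch of the two calls side by side, assuming a boto3 S3 client and placeholder bucket/key/path names:

```python
import boto3

s3 = boto3.client('s3')

# upload_file() takes a path on disk (bucket, key and path are placeholders)
s3.upload_file('/tmp/report.csv', 'my-bucket', 'reports/report.csv')

# upload_fileobj() takes any binary file-like object opened for reading
with open('/tmp/report.csv', 'rb') as f:
    s3.upload_fileobj(f, 'my-bucket', 'reports/report.csv')
```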
What is a fileobj anyway?
There is a convention in multiple places, including the Python standard library, that the term fileobj means a file-like object.
Some libraries even expose functions that accept either a file path (str) or a fileobj (file-like object) as the same parameter.
When using a file object, your code is not limited to disk. For example:
- you can copy data from one S3 object into another in a streaming fashion (without using disk space or slowing the process down with read/write I/O to disk), as sketched below
- you can (de)compress or decrypt data on the fly when writing objects to S3
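A streaming S3-to-S3 copy might look like the following sketch; the bucket and key names are placeholders, and it assumes a boto3 S3 client:

```python
import boto3

s3 = boto3.client('s3')

# get_object() returns a StreamingBody, which is a readable file-like object
src = s3.get_object(Bucket='my-bucket', Key='big/input.bin')

# upload_fileobj() reads from it in chunks, so the data never touches local disk
s3.upload_fileobj(src['Body'], 'my-bucket', 'big/input-copy.bin')
```

(For a plain copy, S3's server-side copy is usually the better tool; the point here is that any readable object works as a fileobj.)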
Example using the Python `gzip` module with a file-like object in a generic way:
```python
import gzip, io

def gzip_greet_file(fileobj):
    """Write a gzipped hello message to a file."""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

# using an already opened file object
gzip_greet_file(open('/tmp/a.gz', 'wb'))

# using a filename on disk
gzip_greet_file('/tmp/b.gz')

# using an in-memory io buffer
file = io.BytesIO()
gzip_greet_file(file)
file.seek(0)
print(file.getvalue())
```
`tarfile`, on the other hand, has two separate parameters, `name` and `fileobj`:

```python
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
```
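A small sketch of the `fileobj` route, assuming the `/tmp/b.gz` file created above exists:

```python
import io, tarfile

# build a .tar.gz archive entirely in memory via the fileobj parameter
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    tar.add('/tmp/b.gz', arcname='b.gz')

buf.seek(0)
print(len(buf.getvalue()), 'bytes of tar.gz data')
```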
Example: compression on the fly with s3.upload_fileobj()
```python
import gzip, io, boto3

s3 = boto3.client('s3')  # upload_fileobj() lives on the S3 client

def upload_file(fileobj, bucket, key, compress=False):
    if compress:
        # gzip-compress the data into an in-memory buffer (no temp file on disk)
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            gz.write(fileobj.read())
        buf.seek(0)
        fileobj = buf
        key = key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)
```
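A quick usage sketch, with placeholder bucket, key and file names:

```python
# compress a local log file while uploading it (names are placeholders)
with open('/tmp/app.log', 'rb') as f:
    upload_file(f, 'my-bucket', 'logs/app.log', compress=True)
```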