TL;DR
In terms of speed, both methods will perform roughly the same: both are written in Python, and the bottleneck will be either disk I/O (reading the file from disk) or network I/O (writing to S3).
- Use `upload_file()` when writing code that only handles uploading files from disk.
- Use `upload_fileobj()` when writing generic code to handle S3 uploads that may be reused later for more than the file-from-disk use case.
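A minimal sketch of the two calls side by side, assuming a boto3 S3 client and placeholder bucket/key/path names:

```python
import boto3

s3 = boto3.client('s3')

# upload_file() takes a path on disk (bucket, key and path are placeholders)
s3.upload_file('/tmp/report.csv', 'my-bucket', 'reports/report.csv')

# upload_fileobj() takes any binary file-like object opened for reading
with open('/tmp/report.csv', 'rb') as f:
    s3.upload_fileobj(f, 'my-bucket', 'reports/report.csv')
```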
What is a fileobj anyway?
There is a convention in multiple places, including the Python standard library, that the term fileobj means a file-like object.
Some libraries even expose functions that accept either a file path (str) or a fileobj (file-like object) as the same parameter.
When using a file object, your code is not limited to disk. For example:
- you can copy data from one S3 object into another in a streaming fashion (without using disk space or slowing the process down with read/write I/O to disk), as sketched below
- you can (de)compress or decrypt data on the fly when writing objects to S3
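A streaming S3-to-S3 copy might look like the following sketch; the bucket and key names are placeholders, and it assumes a boto3 S3 client:

```python
import boto3

s3 = boto3.client('s3')

# get_object() returns a StreamingBody, which is a readable file-like object
src = s3.get_object(Bucket='my-bucket', Key='big/input.bin')

# upload_fileobj() reads from it in chunks, so the data never touches local disk
s3.upload_fileobj(src['Body'], 'my-bucket', 'big/input-copy.bin')
```

(For a plain copy, S3's server-side copy is usually the better tool; the point here is that any readable object works as a fileobj.)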
Example using the Python `gzip` module with a file-like object in a generic way:
```python
import gzip, io

def gzip_greet_file(fileobj):
    """Write a gzipped hello message to a file."""
    with gzip.open(filename=fileobj, mode='wb') as fp:
        fp.write(b'hello!')

# using an already opened file object
gzip_greet_file(open('/tmp/a.gz', 'wb'))

# using a filename on disk
gzip_greet_file('/tmp/b.gz')

# using an in-memory io buffer
file = io.BytesIO()
gzip_greet_file(file)
file.seek(0)
print(file.getvalue())
```
`tarfile`, on the other hand, has two separate parameters, `name` and `fileobj`:

```python
tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)
```
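A small sketch of the `fileobj` route, assuming the `/tmp/b.gz` file created above exists:

```python
import io, tarfile

# build a .tar.gz archive entirely in memory via the fileobj parameter
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    tar.add('/tmp/b.gz', arcname='b.gz')

buf.seek(0)
print(len(buf.getvalue()), 'bytes of tar.gz data')
```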
Example: compression on the fly with s3.upload_fileobj()
```python
import gzip, io, boto3

s3 = boto3.client('s3')  # upload_fileobj() lives on the S3 client

def upload_file(fileobj, bucket, key, compress=False):
    if compress:
        # gzip-compress the data into an in-memory buffer (no temp file on disk)
        buf = io.BytesIO()
        with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
            gz.write(fileobj.read())
        buf.seek(0)
        fileobj = buf
        key = key + '.gz'
    s3.upload_fileobj(fileobj, bucket, key)
```
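A quick usage sketch, with placeholder bucket, key and file names:

```python
# compress a local log file while uploading it (names are placeholders)
with open('/tmp/app.log', 'rb') as f:
    upload_file(f, 'my-bucket', 'logs/app.log', compress=True)
```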