
I am testing the throughput of writing to S3 from a Python Glue shell job by using the upload_fileobj function from the boto3 client. The input to this function is described as:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

In order to have the test isolate just the throughput, as opposed to memory or CPU capabilities, I think the best way to use upload_fileobj would be to pass an iterator that produces N bytes of value 0.

In Python, how can a "file-like object" be created from an iterator?

I'm looking for something of the form

from itertools import repeat

number_of_bytes = 1024 * 1024

zero_iterator = repeat(b'\x00', number_of_bytes)  # yields a single zero byte, number_of_bytes times

file_like_object = something(zero_iterator) # fill in 'something'

Which would then be passed to boto3 for writing

session.client('s3').upload_fileobj(file_like_object, Bucket='my_bucket', Key='my_key')

Thank you in advance for your consideration and response.

Ramón J Romero y Vigil
  • did you consider using `/dev/null` or `/dev/urandom` as a source? – Marat Jul 13 '20 at 13:33
  • @Marat I need the test to work on Windows, Linux, and Glue shell jobs, so I cannot assume the same system resources exist universally. Also, my question is broadly concerned with converting an iterator to a file object. – Ramón J Romero y Vigil Jul 13 '20 at 13:34
  • well, one way is to subclass `io.BytesIO`, replacing the `.read` method with your generator (a sketch along these lines, using `io.RawIOBase`, follows these comments). It just seems much more elegant to use the system-provided /dev/null if it were available – Marat Jul 13 '20 at 13:36
  • Relevant: https://stackoverflow.com/questions/12593576/adapt-an-iterator-to-behave-like-a-file-like-object-in-python – Vanni Totaro Jul 13 '20 at 15:33
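
A minimal sketch of the wrapper idea raised in the comments and the linked question: subclass io.RawIOBase (rather than io.BytesIO), implement readinto, and let io.BufferedReader supply the read method boto3 needs. This is only an illustration, it assumes the iterator yields non-empty bytes chunks, and the names IterStream and number_of_bytes are placeholders:

import io
from itertools import repeat

class IterStream(io.RawIOBase):
    """Expose an iterator of bytes chunks as a readable raw stream."""
    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._leftover = b''

    def readable(self):
        return True

    def readinto(self, buffer):
        try:
            chunk = self._leftover or next(self._iterator)
        except StopIteration:
            return 0  # returning 0 signals end-of-file
        taken, self._leftover = chunk[:len(buffer)], chunk[len(buffer):]
        buffer[:len(taken)] = taken
        return len(taken)

# io.BufferedReader layers the read(size) interface boto3 expects on top of readinto
number_of_bytes = 1024 * 1024
file_like_object = io.BufferedReader(IterStream(repeat(b'\x00', number_of_bytes)))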

1 Answer


This is a simplified version of the answer at https://stackoverflow.com/a/70547492/1319998. Since here we only need to deal with bytes, it should be suitable for boto3's upload_fileobj.

def to_file_like_obj(iterable):
    chunk = b''
    offset = 0
    it = iter(iterable)

    def up_to_iter(size):
        # Yield pieces of the underlying chunks until `size` bytes have been
        # produced, or the iterable is exhausted
        nonlocal chunk, offset

        while size:
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            # A negative or None size means "read everything", as with real files
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()

If you have an iterable that yields bytes, say my_iterable, it can be used with boto3 as follows:

import boto3

target_obj = boto3.Session().resource('s3').Bucket('my-target-bucket').Object('my/target/key')
target_obj.upload_fileobj(to_file_like_obj(my_iterable))
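
For the throughput test in the question, the same helper can wrap the zero-byte iterator directly; the bucket name, key, and chunk size below are placeholders:

from itertools import repeat
import boto3

chunk_size = 64 * 1024                 # yield reasonably large chunks so each read() stays cheap
number_of_chunks = 16                  # 16 * 64 KiB = 1 MiB in total
zero_iterable = repeat(b'\x00' * chunk_size, number_of_chunks)

s3 = boto3.Session().client('s3')
s3.upload_fileobj(to_file_like_obj(zero_iterable), Bucket='my_bucket', Key='my_key')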
Michal Charemza