8

I'm using S3.Client.upload_fileobj() with a BytesIO stream as input to upload a file to S3 from a stream. My function should not return before the upload is finished, so I need a way to wait for it.
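
For reference, the call in question looks roughly like this (bucket and key names are placeholders):

```python
import io

import boto3

s3 = boto3.client("s3")

# Placeholder data; in my real code the stream is produced elsewhere.
stream = io.BytesIO(b"some data produced upstream")
s3.upload_fileobj(stream, "my-bucket", "my-key")
# Is the upload guaranteed to be finished at this point?
```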

From the documentation there is no obvious way to wait for the transfer to finish, but there are some hints of what could work:

  1. Use the Callback arg to wait until progress reaches 100% (see the sketch after this list). In JavaScript this would be trivial using callbacks or promises, but in Python I'm not so sure.
  2. Use an S3.Waiter object that checks whether the object exists (also sketched below). But it does so by polling every 5 s and seems very inefficient. Also I'm not sure it would wait until the object is complete.
  3. There's a class S3.MultipartUpload with a .complete() method, but I doubt that does what I want.
  4. Do a loop that checks if the object is completely uploaded and if not, sleeps for a bit. But how do I check if the object is complete?
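
To make options 1 and 2 concrete, here is a minimal sketch of what I imagine they would look like (bucket/key names are placeholders, and I'm not sure either approach is actually needed):

```python
import io
import threading

import boto3

s3 = boto3.client("s3")

# Placeholder data; in my real code the stream comes from a pipe,
# so the total size isn't actually known up front.
data = b"some data"
stream = io.BytesIO(data)
total = len(data)

# Option 1: a Callback that signals an Event once all bytes have been reported.
done = threading.Event()
seen = 0

def on_progress(bytes_transferred):
    global seen
    seen += bytes_transferred
    if seen >= total:
        done.set()

s3.upload_fileobj(stream, "my-bucket", "my-key", Callback=on_progress)
done.wait()

# Option 2: a waiter that polls HeadObject (every 5 s by default) until the key exists.
s3.get_waiter("object_exists").wait(Bucket="my-bucket", Key="my-key")
```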

I've been googling but it seems nobody is asking the same question. Also, most results talking about related issues are using a different API (I believe upload_fileobj() is rather new).

EDIT: I found out about S3.Client.put_object(), which also accepts a file-like object and blocks until the server has responded. But would that work in combination with streams? I'm not sure how Python multithreading works here. The stream originally comes from an S3.Client.download_fileobj() call, gets piped through a subprocess.Popen(), and is then supposed to be uploaded back to S3. Both the download and the subprocess run in parallel threads/processes as far as I can tell.
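
Roughly, the pipeline looks like this (bucket/key names and the subprocess command are placeholders):

```python
import subprocess
import threading

import boto3

s3 = boto3.client("s3")

# Placeholder command; in reality it's some filter that reads stdin and writes stdout.
proc = subprocess.Popen(
    ["gzip", "-c"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
)

def feed():
    # Stream the source object into the subprocess, then signal EOF.
    s3.download_fileobj("my-bucket", "input-key", proc.stdin)
    proc.stdin.close()

feeder = threading.Thread(target=feed)
feeder.start()

# Upload whatever the subprocess writes to stdout back to S3.
# The question is whether this call is guaranteed to block until the upload is done.
s3.upload_fileobj(proc.stdout, "my-bucket", "output-key")

feeder.join()
proc.wait()
```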

cpury

1 Answer

5

The upload_file/upload_fileobj methods take care of what you're looking for, i.e. they return only once the object/file upload has completed.

I don't suggest the 1st or 4th option. There's no need to use an S3 waiter either, as the upload_file/upload_fileobj methods return only after the upload job is done.

Note that upload_file/upload_fileobj automatically handle reading/writing files as well as performing multipart uploads in parallel for large files, so there's no need to manage multipart uploads yourself, irrespective of file size.
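
In other words, once the call returns you can rely on the object being there. A minimal sketch (bucket/key names are placeholders):

```python
import io

import boto3

s3 = boto3.client("s3")

stream = io.BytesIO(b"some data")
s3.upload_fileobj(stream, "my-bucket", "my-key")

# upload_fileobj has already waited on the transfer internally,
# so the object is available here without any extra waiting/polling.
response = s3.head_object(Bucket="my-bucket", Key="my-key")
print(response["ContentLength"])
```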

Venkatesh Wadawadagi
    Thanks, Venkatesh. So, your reply did not actually answer my question directly, but it still helped. Judging from your code, I saw that you assume that the upload is finished when the method returns. I was doubtful of this at first, but then checked the source code. It turns out `upload_fileobj` actually gets a `future` from the transfer manager and then waits for its completion, so you are right! Once the method returns, the upload is done. This helps a lot! If you can edit your answer to be more concise and explicit, I will accept it. Thanks! – cpury Feb 22 '17 at 15:21
  • @cpury Glad to know that my answer helped you in a way! I'm happy that you found out/realised what I intended to convey. I edited my answer to be more concise and explicit. I hope you can accept it now. P.S.: The only reason for posting the ready-made/example code was so that you could try it out and see the results for yourself, though posting the entire code was not necessary. – Venkatesh Wadawadagi Feb 23 '17 at 11:55
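
For reference, the mechanism mentioned in the comments above (the method waiting on a transfer-manager future) looks roughly like the following; this is a simplified paraphrase of boto3's injected method, not its exact source:

```python
from boto3.s3.transfer import TransferConfig, create_transfer_manager

def upload_fileobj_sketch(client, fileobj, bucket, key):
    # Simplified paraphrase of what Client.upload_fileobj does internally:
    # submit the upload to a transfer manager and block on the resulting future.
    config = TransferConfig()
    with create_transfer_manager(client, config) as manager:
        future = manager.upload(fileobj=fileobj, bucket=bucket, key=key)
        return future.result()  # blocks until the upload has finished
```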