95

I'm working on a machine with limited memory, and I'd like to upload a dynamically generated (not-from-disk) file to S3 in a streaming manner. In other words, I don't know the file size when I start the upload, but I'll know it by the end. Normally a PUT request has a Content-Length header, but perhaps there is a way around this, such as using multipart uploads or chunked transfer encoding.

S3 can support streaming uploads. For example, see here:

http://blog.odonnell.nu/posts/streaming-uploads-s3-python-and-poster/

My question is, can I accomplish the same thing without having to specify the file length at the start of the upload?

Tyler
  • The [smart_open](https://github.com/piskvorky/smart_open) Python library does that for you (streamed read and write). – Radim Jan 26 '15 at 08:43
  • 10 years later & the AWS S3 SDKs *still* don't have a managed way to do this - as someone who is hugely invested in the AWS ecosystem, it's very disappointing to see this in comparison to object management SDKs provided by other cloud providers. This is a core feature missing. – Ermiya Eskandary Mar 14 '22 at 14:42
  • @ErmiyaEskandary Actually the Go SDK has it, but both v1 and v2 have memory leak issues with the multipart upload method (uploader.Upload). – Nikolay Dimitrov Aug 31 '23 at 01:21

6 Answers

92

You have to upload your file in 5MiB+ chunks via S3's multipart upload API. Each of those chunks requires a Content-Length, but you can avoid loading huge amounts of data (100MiB+) into memory.

  • Initiate S3 Multipart Upload.
  • Gather data into a buffer until that buffer reaches S3's lower chunk-size limit (5MiB). Generate MD5 checksum while building up the buffer.
  • Upload that buffer as a Part, store the ETag (read the docs on that one).
  • Once you reach EOF of your data, upload the last chunk (which can be smaller than 5MiB).
  • Finalize the Multipart Upload.
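
A minimal sketch of these steps in Python, assuming boto3's low-level client (create_multipart_upload / upload_part / complete_multipart_upload); the bucket, key and data source are placeholders, and the optional MD5 step is omitted for brevity:

import boto3

MIN_PART_SIZE = 5 * 1024 * 1024  # S3's minimum part size (except for the last part)


def stream_to_s3(data_source, bucket, key):
    """Upload an iterable of bytes of unknown total length via a multipart upload."""
    s3 = boto3.client("s3")
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts, buffer, part_number = [], bytearray(), 1

    def flush(data, number):
        # Upload one buffered part and remember its ETag for the final call.
        etag = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=upload_id,
            PartNumber=number, Body=data,
        )["ETag"]
        parts.append({"PartNumber": number, "ETag": etag})

    try:
        for chunk in data_source:
            buffer.extend(chunk)
            while len(buffer) >= MIN_PART_SIZE:                    # gather until 5MiB
                flush(bytes(buffer[:MIN_PART_SIZE]), part_number)  # upload as a Part
                del buffer[:MIN_PART_SIZE]
                part_number += 1
        if buffer:                                   # last part may be smaller than 5MiB
            flush(bytes(buffer), part_number)
        s3.complete_multipart_upload(                # finalize the multipart upload
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abort so the unfinished parts don't keep accruing storage charges.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise


# Example usage with a placeholder generator of unknown total length:
# stream_to_s3((f"row {i}\n".encode() for i in range(10_000_000)),
#              "my-bucket", "streamed-object")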

S3 allows up to 10,000 parts. So by choosing a part-size of 5MiB you will be able to upload dynamic files of up to 50GiB. Should be enough for most use-cases.

However: If you need more, you have to increase your part-size, either by using a higher fixed part-size (10MiB for example) or by increasing it during the upload, for example:

First 25 parts:   5MiB (total:  125MiB)
Next 25 parts:   10MiB (total:  375MiB)
Next 25 parts:   25MiB (total:    1GiB)
Next 25 parts:   50MiB (total: 2.25GiB)
After that:     100MiB

This will allow you to upload files of up to 1TB (S3's limit for a single file is 5TB right now) without wasting memory unnecessarily.
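
One way to express such an escalating schedule in code (a sketch; the breakpoints simply mirror the example table above):

def part_size(part_number):
    """Escalating part-size schedule from the table above (part numbers are 1-based)."""
    MiB = 1024 * 1024
    if part_number <= 25:
        return 5 * MiB
    if part_number <= 50:
        return 10 * MiB
    if part_number <= 75:
        return 25 * MiB
    if part_number <= 100:
        return 50 * MiB
    return 100 * MiB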


A note on your link to Sean O'Donnell's blog:

His problem is different from yours - he knows and uses the Content-Length before the upload. He wants to improve on this situation: Many libraries handle uploads by loading all data from a file into memory. In pseudo-code that would be something like this:

data = File.read(file_name)
request = new S3::PutFileRequest()
request.setHeader('Content-Length', data.size)
request.setBody(data)
request.send()

His solution does it by getting the Content-Length via the filesystem-API. He then streams the data from disk into the request-stream. In pseudo-code:

upload = new S3::PutFileRequestStream()
upload.writeHeader('Content-Length', File.getSize(file_name))
upload.flushHeader()

input = File.open(file_name, File::READONLY_FLAG)

while (data = input.read())
  upload.write(data)
end

upload.flush()
upload.close()
Marcel Jackwerth
  • A Java implementation of this in the form of an OutputStream exists in s3distcp: https://github.com/libin/s3distcp/blob/master/src/main/java/com/amazon/external/elasticmapreduce/s3distcp/MultipartUploadOutputStream.java – sigget Dec 02 '14 at 23:11
  • I've created an open source library dedicated to this at https://github.com/alexmojaki/s3-stream-upload – Alex Hall Oct 22 '15 at 14:13
  • Where did you find the 5MiB limit? – Landon Kuhn Jan 18 '17 at 20:36
  • Looks like you can also use the CLI now with a pipe - https://github.com/aws/aws-cli/pull/903 – chrismarx Oct 30 '18 at 15:32
  • @AlexHall Any Python implementation? – Tushar Kolhe May 09 '20 at 10:03
  • @TusharKolhe Googling "python stream multipart upload s3" I found https://stackoverflow.com/questions/31031463/can-you-upload-to-s3-using-a-stream-rather-than-a-local-file and https://stackoverflow.com/questions/52825430/stream-large-string-to-s3-using-boto3, and it looks like there were more results – Alex Hall May 09 '20 at 10:08
  • @AlexHall Thanks, I figured out a way. This is the actual problem I'm trying to solve: https://stackoverflow.com/questions/61696155/python-boto3-multipart-upload-video-to-aws-s3. With a file already on disk I'm able to do this, but I want to upload streaming frames. – Tushar Kolhe May 09 '20 at 12:24
9

Putting this answer here for others in case it helps:

If you don't know the length of the data you are streaming up to S3, you can use the S3FileInfo class (from the AWS SDK for .NET) and its OpenWrite() method to write arbitrary data into S3.

var fileInfo = new S3FileInfo(amazonS3Client, "MyBucket", "streamed-file.txt");

using (var outputStream = fileInfo.OpenWrite())
{
    using (var streamWriter = new StreamWriter(outputStream))
    {
        streamWriter.WriteLine("Hello world");
        // You can do as many writes as you want here
    }
}
mwrichardson
7

You can use the gof3r command-line tool to stream data through Linux pipes:

$ tar -czf - <my_dir/> | gof3r put --bucket <s3_bucket> --key <s3_object>
webwurst
  • is there a way to just do `tar -czf - | aws s3 --something-or-other` ? –  Aug 01 '19 at 23:07
2

If you are using Node.js, you can use a plugin like s3-streaming-upload to accomplish this quite easily.

nathanpeck
1

Reference: https://github.com/aws/aws-cli/pull/903

Here is a synopsis. To upload a stream from stdin to S3, use:

aws s3 cp - s3://my-bucket/stream

To download an S3 object as a stdout stream, use:

aws s3 cp s3://my-bucket/stream -

So, for example, if I had the object s3://my-bucket/stream, I could run:

aws s3 cp s3://my-bucket/stream - | aws s3 cp - s3://my-bucket/new-stream

My command:

echo "ccc" | aws --endpoint-url=http://172.22.222.245:80 --no-verify-ssl s3 cp - s3://test-bucket/ccc

Drawn Yang
1

Read up on HTTP multipart entity requests. You can send a file as chunks of data to the target.

Kris