
I'm trying to use the upload method of S3 with a ReadableStream from the fs module.

The documentation says that a ReadableStream can be used as the Body param:

Body — (Buffer, Typed Array, Blob, String, ReadableStream) Object data.

Also the upload method description is:

Uploads an arbitrarily sized buffer, blob, or stream, using intelligent concurrent handling of parts if the payload is large enough.

Also, here: Upload pdf generated to AWS S3 using nodejs aws sdk, @shivendra says he can use a ReadableStream and it works.

This is my code:

const fs = require('fs')
const S3 = require('aws-sdk/clients/s3')

const s3 = new S3()

const send = async () => {
  const rs = fs.createReadStream('/home/osman/Downloads/input.txt')
  rs.on('open', () => {
    console.log('OPEN')
  })
  rs.on('end', () => {
    console.log('END')
  })
  rs.on('close', () => {
    console.log('CLOSE')
  })
  rs.on('data', (chunk) => {
    console.log('DATA: ', chunk)
  })

  console.log('START UPLOAD')

  const response = await s3.upload({
    Bucket: 'test-bucket',
    Key: 'output.txt',
    Body: rs,
  }).promise()

  console.log('response:')
  console.log(response)
}

send().catch(err => { console.log(err) })

I'm getting this output:

START UPLOAD
OPEN
DATA: <Buffer 73 6f 6d 65 74 68 69 6e 67>
END
CLOSE
response:
{ ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
  Location: 'https://test-bucket.s3.amazonaws.com/output.txt',
  key: 'output.txt',
  Key: 'output.txt',
  Bucket: 'test-bucket' }

The problem is that my file generated at S3 (output.txt) has 0 Bytes.

Does anyone know what I'm doing wrong?

If I pass a Buffer as the Body it works:

Body: Buffer.alloc(8 * 1024 * 1024, 'something'), 

But that's not what I want to do. I'd like to generate a file with a stream and pipe that stream to S3 as I generate it.
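
Something like this is what I'm aiming for (just a sketch of the idea, reusing the s3 client from above; the PassThrough stream and the 'generated.txt' key are placeholders for my real generator):

const { PassThrough } = require('stream')

const pass = new PassThrough()

// start the upload while the stream is still being written to
const uploading = s3.upload({
  Bucket: 'test-bucket',
  Key: 'generated.txt',
  Body: pass,
}).promise()

// generate the content and push it into the stream as it is produced
pass.write('first generated chunk\n')
pass.write('second generated chunk\n')
pass.end()

uploading.then(response => console.log(response))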

osmanpontes

2 Answers


It's an API interface issue with Node.js ReadableStreams. Commenting out the code that listens to the 'data' event solves the problem:

const fs = require('fs')
const S3 = require('aws-sdk/clients/s3')

const s3 = new S3()

const send = async () => {
  const rs = fs.createReadStream('/home/osman/Downloads/input.txt')
  rs.on('open', () => {
    console.log('OPEN')
  })
  rs.on('end', () => {
    console.log('END')
  })
  rs.on('close', () => {
    console.log('CLOSE')
  })
  // rs.on('data', (chunk) => {
  //   console.log('DATA: ', chunk)
  // })

  console.log('START UPLOAD')

  const response = await s3.upload({
    Bucket: 'test-bucket',
    Key: 'output.txt',
    Body: rs,
  }).promise()

  console.log('response:')
  console.log(response)
}

send().catch(err => { console.log(err) })

Though it's a strange API, when we listen to the 'data' event the ReadableStream switches into flowing mode (listening to an event changing the publisher/EventEmitter state? Yes, very error prone...). For some reason S3 needs a paused ReadableStream. If we put rs.on('data', ...) after await s3.upload(...), it works. If we put rs.pause() after rs.on('data', ...) and before await s3.upload(...), it works too.
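
For instance, this variant of send() (a sketch of the second workaround, reusing the fs and s3 from the code above) also uploads the whole file:

const send = async () => {
  const rs = fs.createReadStream('/home/osman/Downloads/input.txt')

  rs.on('data', (chunk) => {
    console.log('DATA: ', chunk)  // this flips the stream into flowing mode
  })

  rs.pause()  // switch back to paused mode before handing the stream to the SDK

  const response = await s3.upload({
    Bucket: 'test-bucket',
    Key: 'output.txt',
    Body: rs,
  }).promise()

  console.log('response:')
  console.log(response)
}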

Now, why does this happen? I don't know yet...

But the problem is solved, even if it isn't completely explained.

osmanpontes
  1. Check that the file /home/osman/Downloads/input.txt actually exists and is accessible by the node.js process
  2. Consider using the putObject method

Example:

const fs = require('fs');
const S3 = require('aws-sdk/clients/s3');

const s3 = new S3();

s3.putObject({
  Bucket: 'test-bucket',
  Key: 'output.txt',
  Body: fs.createReadStream('/home/osman/Downloads/input.txt'),
}, (err, response) => {
  if (err) {
    throw err;
  }
  console.log('response:')
  console.log(response)
});

Not sure how this will work with async .. await; better to make the upload to AWS S3 work first, then change the flow.
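
If you do want to keep the async .. await flow, the same request should be awaitable via .promise() (an untested sketch, assuming the same s3 client and file as above, called from inside an async function):

// inside an async function, reusing the s3 client from above
const response = await s3.putObject({
  Bucket: 'test-bucket',
  Key: 'output.txt',
  Body: fs.createReadStream('/home/osman/Downloads/input.txt'),
}).promise();

console.log('response:');
console.log(response);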


UPDATE: Try implementing the upload directly via ManagedUpload

const fs = require('fs');
const S3 = require('aws-sdk/clients/s3');

const s3 = new S3();

const upload = new S3.ManagedUpload({
  service: s3,
  params: {
    Bucket: 'test-bucket',
    Key: 'output.txt',
    Body: fs.createReadStream('/home/osman/Downloads/input.txt')
  }
});

upload.send((err, response) => {
  if (err) {
    throw err;
  }
  console.log('response:')
  console.log(response)
});
dr.dimitru
  • The file exists and is accessible. The `DATA:` line shows that it was read. `putObject` sends the data in one HTTP request, it doesn't stream it to S3. `async .. await` is not a problem here. Thank you! – osmanpontes May 23 '17 at 23:52
  • @osmanpontes I wouldn't argue with that, you're right. Have you tried the code I suggested, changing the `putObject` method to `upload`? – dr.dimitru May 24 '17 at 00:03
  • @osmanpontes anyway, I'm suggesting trying `putObject` for testing purposes – dr.dimitru May 24 '17 at 00:04
  • 1
    I tried this and it worked as expected. ;]. Do you have some more insight? – osmanpontes May 24 '17 at 00:36
  • The `.upload()` method was initially created for the browser. This might be a reason for the misbehavior, though it should work on node.js too - we always use the `.putObject()` method. It has only one limitation - a 5GB max file size. And I haven't found evidence in the SDK sources that the file is sent as a single HTTP request; [looks like `.putObject()` uses `.upload()`](https://github.com/aws/aws-sdk-js/blob/05b12ab01a319643a6320879991f224d1fc93f0e/lib/services/s3.js#L989) as the underlying code. I recommend taking a look at the [source code](https://github.com/aws/aws-sdk-js/search?p=1&q=putObject&type=&utf8=✓) – dr.dimitru May 24 '17 at 00:50
  • Where did you see that `.upload` was created for the browser? The problem with `.putObject` is that I need to generate files bigger than 5GB. I misspoke when I said `.putObject` uses one single request; the point is that it doesn't fit my case, as you mentioned, because of the max file size. It's a little strange that `.putObject` uses `.upload`, since the first has a 5GB limit and [the other doesn't](https://github.com/aws/aws-sdk-js/blob/05b12ab01a319643a6320879991f224d1fc93f0e/lib/services/s3.js#L998). – osmanpontes May 24 '17 at 01:22
  • @osmanpontes sorry, I can't find info in the [changelog](https://github.com/aws/aws-sdk-js/blob/master/CHANGELOG.md) earlier than 2.4.8. So you may disregard that it was initially made for the browser; it should work for node.js anyway according to the documentation. Try implementing the upload directly via [`AWS.S3.ManagedUpload`](http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3/ManagedUpload.html#constructor-property), see my updated answer – dr.dimitru May 24 '17 at 02:59
  • Thanks a lot for your help, but I found the problem. It's very tricky. I'm gonna write the answer now. – osmanpontes May 24 '17 at 14:15