I have to transfer a file from an API endpoint to two different buckets. The original upload is made using:

curl -X PUT -F "data=@sample" "http://localhost:3000/upload/1/1"

The endpoint where the file is uploaded:

const PassThrough = require('stream').PassThrough;

async function uploadFile (req, res) {
  try {
    const firstS3Stream = new PassThrough();
    const secondS3Stream = new PassThrough();
    req.pipe(firstS3Stream);
    req.pipe(secondS3Stream);

    await Promise.all([
      uploadToFirstS3(firstS3Stream),
      uploadToSecondS3(secondS3Stream),
    ]);
    return res.end();
  } catch (err) {
    console.log(err)
    return res.status(500).send({ error: 'Unexpected error during file upload' });
  }
}

As you can see, I use two PassThrough streams to duplicate the request stream into two readable streams, as suggested in this SO thread.

This piece of code remains unchanged; what is interesting here are the uploadToFirstS3 and uploadToSecondS3 functions. In this minimal example both do exactly the same thing with a different configuration, so I will expand on only one here.

What works well:

const aws = require('aws-sdk');

const s3 = new aws.S3({
  accessKeyId: S3_API_KEY,
  secretAccessKey: S3_API_SECRET,
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.upload(uploadParams, (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

This piece of code (based on the aws-sdk package) works fine. My issue is that I want it to run with the @aws-sdk/client-s3 package in order to reduce the size of the project.

What doesn't work:

I first tried to use S3Client.send(PutObjectCommand):

const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.send(new PutObjectCommand(uploadParams), (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

Then I tried S3.putObject(PutObjectCommandInput):

const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (stream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: stream,
  };
  s3.putObject(uploadParams, (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

The last two examples both give me a 501 - Not Implemented error with a Transfer-Encoding header. I checked req.headers and there is no Transfer-Encoding in it, so I guess the SDK adds it to the request to S3?

Since the first example (based on aws-sdk) works fine, I'm sure the error is not due to an empty body in the request as suggested in this SO thread.

Still, I thought maybe the stream wasn't readable yet when triggering the upload, so I wrapped the calls to uploadToFirstS3 and uploadToSecondS3 in a callback triggered by the req.on('readable', callback) event, but nothing changed.
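
Roughly, that attempt looked like the following (a simplified sketch, not the exact code):

// Simplified sketch of the attempt: defer both uploads until the request
// stream emits 'readable', then run them exactly as before.
req.on('readable', () => {
  Promise.all([
    uploadToFirstS3(firstS3Stream),
    uploadToSecondS3(secondS3Stream),
  ])
    .then(() => res.end())
    .catch((err) => {
      console.log(err);
      res.status(500).send({ error: 'Unexpected error during file upload' });
    });
});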

I would like to process the files in memory, without storing them on disk at any time. Is there a way to achieve this using the @aws-sdk/client-s3 package?


2 Answers

In v3 you can use the Upload class from @aws-sdk/lib-storage to do multipart uploads. Unfortunately, there seems to be no mention of this on the docs site for @aws-sdk/client-s3.

It's mentioned in the upgrade guide here: https://github.com/aws/aws-sdk-js-v3/blob/main/UPGRADING.md#s3-multipart-upload

Here's a corrected version of the example provided in https://github.com/aws/aws-sdk-js-v3/tree/main/lib/lib-storage:

  import { Upload } from "@aws-sdk/lib-storage";
  import { S3Client } from "@aws-sdk/client-s3";

  const target = { Bucket, Key, Body };
  try {
    const parallelUploads3 = new Upload({
      client: new S3Client({}),
      tags: [...], // optional tags
      queueSize: 4, // optional concurrency configuration
      leavePartsOnError: false, // optional manually handle dropped parts
      params: target,
    });

    parallelUploads3.on("httpUploadProgress", (progress) => {
      console.log(progress);
    });

    await parallelUploads3.done();
  } catch (e) {
    console.log(e);
  }
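
Applied to the question, uploadToFirstS3 could look roughly like this (a sketch reusing the S3_* constants and the PassThrough stream from the question; Upload accepts a stream Body whose length isn't known up front):

  const { S3Client } = require('@aws-sdk/client-s3');
  const { Upload } = require('@aws-sdk/lib-storage');

  const s3 = new S3Client({
    credentials: {
      accessKeyId: S3_API_KEY,
      secretAccessKey: S3_API_SECRET,
    },
    region: S3_REGION,
  });

  // Sketch: `stream` is one of the PassThrough streams piped from the request,
  // exactly as in the question's uploadFile handler.
  const uploadToFirstS3 = (stream) =>
    new Upload({
      client: s3,
      params: {
        Bucket: S3_BUCKET_NAME,
        Key: 'some-key',
        Body: stream,
      },
    }).done();
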
  • Hey, thanks for sharing, why `new S3({}) || new S3Client({})`? – Can Rau Feb 11 '22 at 23:49
  • Good question, I just copied their example verbatim. That's bizarre... I got it to work using `S3Client` in my code so I'll just update the example to use that. – Andy Feb 12 '22 at 01:21
  • yeah I'm also successfully just using S3Client, but hoped you might have an answer to their code – Can Rau Feb 12 '22 at 22:40
  • Nope. It's illogical because `new S3({})` is always truthy. Maybe they were trying to illustrate that you can use either one (not sure if you can?), but that would be a semantically weird way to do so – Andy Feb 14 '22 at 06:46
  • Without the `parallelUploads3.on("httpUploadProgress" ....` line, the stream upload never starts / finishes. How can I start or finish the stream upload without listening to `httpUploadProgress` and without printing the progress? – Kid_Learning_C Aug 31 '22 at 09:34
  • seems strange, I don't recall having that problem myself so I don't know...all I can say is if you're sure you can consistently reproduce that, you could file an issue at https://github.com/aws/aws-sdk-js-v3 – Andy Aug 31 '22 at 21:27
  • Hi @Andy, what should the Body be? Can it be a JavaScript File object? Since I want to use it directly in the browser. – huan feng Oct 13 '22 at 06:52
  • @huanfeng that would be worth asking as a separate question, I don't have experience using the AWS SDK in the browser – Andy Jan 20 '23 at 16:38

I came across the same error that you faced. It seems this is a known issue that they haven't yet documented accurately:

The error is indeed caused by stream length remaining unknown. We need to improve the error message and the documentation

In order to fix this issue, you just need to specify the ContentLength property in the PutObjectCommand input.

Here is the updated snippet:

const { S3 } = require('@aws-sdk/client-s3');

const s3 = new S3({
  credentials: {
    accessKeyId: S3_API_KEY,
    secretAccessKey: S3_API_SECRET,
  },
  region: S3_REGION,
  signatureVersion: 'v4',
});

const uploadToFirstS3 = (passThroughStream) => (new Promise((resolve, reject) => {
  const uploadParams = {
    Bucket: S3_BUCKET_NAME,
    Key: 'some-key',
    Body: passThroughStream,
    ContentLength: passThroughStream.readableLength, // include this new field!!
  };
  s3.putObject(uploadParams, (err) => {
    if (err) return reject(err);
    resolve(true);
  });
}));

Hope it helps!