Read file from aws s3 bucket using node fs

Question

I am attempting to read a file that is in an AWS S3 bucket using:

fs.readFile(file, function (err, contents) {
  var myLines = contents.Body.toString().split('\n')
})

I've been able to download and upload a file using the node aws-sdk, but I am at a loss as to how to simply read it and parse the contents.

Here is an example of how I am reading the file from s3:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myKey.csv'}
var s3file = s3.getObject(params)

contents.Body.toString() instead of contents.Body – Jason Apr 20 '16 at 00:00

dug · Accepted Answer · 2021-07-12T18:06:23.053

120

You have a couple of options. You can pass a callback as the second argument; it will be invoked with any error and the response object. This example is straight from the AWS documentation:

s3.getObject(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else     console.log(data);           // successful response
});

Alternatively, you can convert the output to a stream. There's also an example in the AWS documentation:

var s3 = new AWS.S3({apiVersion: '2006-03-01'});
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');
s3.getObject(params).createReadStream().pipe(file);
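
If you need to know when the stream example above has finished writing the file, one option is Node's promise-based `pipeline`. This is only a sketch, assuming Node 15 or later (for `stream/promises`); the helper name `saveStreamToFile` is made up for illustration and is not part of the AWS SDK.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises'); // assumes Node 15+

// Hypothetical helper, not part of the AWS SDK. `source` is any readable
// stream, for example s3.getObject(params).createReadStream().
async function saveStreamToFile(source, destPath) {
  // pipeline() resolves once the data has been fully written to the file
  // and rejects if either the source or the file stream emits an error.
  await pipeline(source, fs.createWriteStream(destPath));
  return destPath;
}
```

Usage would look like `await saveStreamToFile(s3.getObject(params).createReadStream(), '/path/to/file.jpg')`, after which it should be safe to open the local file.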

edited Jul 12 '21 at 18:06

answered Jan 16 '15 at 22:09

dug


What if I also wish to use a Promise for better overall async handling? – verveguy Aug 24 '16 at 03:17

@verveguy You can use the following: `new Promise((resolve, reject) => {s3.getObject(params).createReadStream().on('end', () => { return resolve(); }).on('error', (error) => { return reject(error); }).pipe(file)});` – Gustavo Straube Sep 29 '16 at 13:37

@verveguy Depending on which version of Node you are running, aws-sdk versions above 2.3.0 will use native promises. You can also explicitly configure which promise library you would like to use: `if (typeof Promise === 'undefined') { console.log("Using Bluebird for Promises"); AWS.config.setPromisesDependency(require('bluebird')); }` – alexhb Nov 23 '16 at 19:29
How can we know when pipe() has finished, so that we can do another task on the file after writing it locally? – Muhammad Usama Mashkoor Jul 15 '18 at 07:45

score 61 · Answer 2 · edited Apr 20 '16 at 01:52


This will do it:

new AWS.S3().getObject({ Bucket: this.awsBucketName, Key: keyName }, function(err, data)
{
    if (!err)
        console.log(data.Body.toString());
});

edited Apr 20 '16 at 01:52

Jason


answered May 12 '15 at 08:59

Lai Xue


For me, this should be the selected answer. Short and sweet! – Adrian Smith Jul 14 '22 at 09:29

score 28 · Answer 3 · edited Mar 14 '17 at 12:25


Since you seem to want to process an S3 text file line by line, here is a Node version that uses the standard readline module and the AWS SDK's createReadStream():

const readline = require('readline');

const rl = readline.createInterface({
    input: s3.getObject(params).createReadStream()
});

rl.on('line', function(line) {
    console.log(line);
})
.on('close', function() {
});
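
If you would rather collect the lines and continue once the whole file has been read, readline interfaces can also be consumed with `for await`. A minimal sketch, assuming a Node version where readline supports async iteration (roughly 11.4 or later); `collectLines` is a hypothetical name, not something the SDK provides.

```javascript
const readline = require('readline');

// Hypothetical helper: `input` is any readable stream, for example
// s3.getObject(params).createReadStream().
async function collectLines(input) {
  // crlfDelay: Infinity treats "\r\n" as a single line break.
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return lines;
}
```

This still reads the object as a stream, but the returned array holds every line in memory, so it mainly makes sense for files of moderate size.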

edited Mar 14 '17 at 12:25

JSuar


answered Apr 20 '16 at 00:07

Jason



I think the `end` event is called `close` instead. https://nodejs.org/api/readline.html#readline_event_close – Jonathan Morales Vélez Jan 13 '17 at 20:56

If you want to handle gzipped source files, you can use `s3.getObject(params).createReadStream().pipe(zlib.createGunzip())` as InputStream as well... – Tobi Jan 17 '18 at 20:55
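
A rough sketch of the gzip variant described in the comment above, using the same event style as the answer. The helper name `forEachGzippedLine` and its `onLine` callback are made up for illustration. Since `.pipe()` does not forward errors, the sketch listens for them on both streams.

```javascript
const readline = require('readline');
const zlib = require('zlib');

// Hypothetical helper: `gzippedInput` is a readable stream of gzip data,
// for example s3.getObject(params).createReadStream().
function forEachGzippedLine(gzippedInput, onLine) {
  return new Promise((resolve, reject) => {
    const gunzip = zlib.createGunzip();
    // pipe() does not propagate errors, so reject on either stream failing.
    gzippedInput.on('error', reject);
    gunzip.on('error', reject);

    const rl = readline.createInterface({
      input: gzippedInput.pipe(gunzip),
      crlfDelay: Infinity
    });
    rl.on('line', onLine);
    rl.on('close', resolve); // emitted after the decompressed input has ended
  });
}
```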

score 15 · Answer 4 · edited Jul 22 '21 at 13:42


If you want to avoid callbacks, you can use the SDK's .promise() method like this:

const s3 = new AWS.S3();
const params = {Bucket: 'myBucket', Key: 'myKey.csv'}
const response = await s3.getObject(params).promise() // await the promise (must run inside an async function)
const fileContent = response.Body.toString('utf-8'); // can also do 'base64' here if desired

I'm sure the other ways mentioned here have their advantages but this works great for me. Sourced from this thread (see the last response from AWS): https://forums.aws.amazon.com/thread.jspa?threadID=116788
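
To get from there to what the question asks for (an array of lines), something like the following should do. It is only a sketch: `readS3Lines` is a hypothetical helper, `s3` is assumed to be an AWS SDK v2 client (`new AWS.S3()`), and the whole object is loaded into memory, so it suits small to medium files.

```javascript
// Hypothetical helper. `s3` is assumed to be an AWS SDK v2 client and
// `params` an object like { Bucket: 'myBucket', Key: 'myKey.csv' }.
async function readS3Lines(s3, params) {
  const response = await s3.getObject(params).promise();
  const text = response.Body.toString('utf-8');

  // Split on LF or CRLF, then drop the empty entry left by a trailing newline.
  const lines = text.split(/\r?\n/);
  if (lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}
```

Passing the client in as an argument also makes the function easy to try out with a stub object instead of a real bucket.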

edited Jul 22 '21 at 13:42

Alexander Santos


answered Mar 29 '20 at 15:39

ryandb



What is getObjectResult in the last line? – Felipe Deveza Nov 03 '20 at 20:54
Gold! But indeed, line 4 should be ```const fileContent = response.Body.toString('utf-8');```. – Adam Marsh May 24 '21 at 23:04

score 10 · Answer 5 · answered Sep 27 '16 at 04:42


Here is the example I used to retrieve and parse JSON data from S3:

    var params = {Bucket: BUCKET_NAME, Key: KEY_NAME};
    new AWS.S3().getObject(params, function(err, json_data)
    {
      if (!err) {
        var json = JSON.parse(Buffer.from(json_data.Body).toString("utf8")); // Buffer.from replaces the deprecated new Buffer()

        // PROCESS JSON DATA
        ......
      }
    });
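
One thing to watch: JSON.parse throws on malformed input, and an exception thrown inside the getObject callback is easy to miss. A small sketch of one way to guard against that; `parseJsonBody` is a made-up helper name, and returning a `{ value, error }` pair is just one possible convention.

```javascript
// Hypothetical helper: converts an S3 object body (Buffer or string) into a
// parsed JavaScript value without throwing.
function parseJsonBody(body) {
  try {
    const text = Buffer.from(body).toString('utf8');
    return { value: JSON.parse(text), error: null };
  } catch (error) {
    // Malformed JSON ends up here as a SyntaxError.
    return { value: null, error };
  }
}
```

Inside the callback you could then call `parseJsonBody(json_data.Body)` and check `error` before touching `value`.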

answered Sep 27 '16 at 04:42

devendra


i think you need to write down how to process the json data as well – Alan Yong Apr 06 '21 at 03:57
after calling JSON.parse in line 5, you'll have a regular js object. If your json is `"{"name": "John", "id": 1}"` on line 8 you can just call `json.name` – Mateusgf Apr 18 '21 at 20:52

Gustavo Straube · Answer 6 · 2017-05-25T10:55:30.100

I haven't figured out why yet, but the createReadStream/pipe approach didn't work for me. I was trying to download a large CSV file (300MB+) and I got duplicated lines. The issue seemed random: the final file size varied with each download attempt.

I ended up using another way, based on AWS JS SDK examples:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');

s3.getObject(params).
    on('httpData', function(chunk) { file.write(chunk); }).
    on('httpDone', function() { file.end(); }).
    send();

This way, it worked like a charm.

score 8 · Answer 7 · answered Aug 06 '18 at 10:10

I prefer Buffer.from(data.Body).toString('utf8'). It supports encoding parameters. With other AWS services (ex. Kinesis Streams) someone may want to replace 'utf8' encoding with 'base64'.

new AWS.S3().getObject(
  { Bucket: this.awsBucketName, Key: keyName }, 
  function(err, data) {
    if (!err) {
      const body = Buffer.from(data.Body).toString('utf8');
      console.log(body);
    }
  }
);
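
To illustrate the encoding parameter without needing a bucket, here is a tiny sketch. `decodeBody` is a hypothetical helper, and the Buffer in the usage note stands in for `data.Body`.

```javascript
// Hypothetical helper: decodes a body (Buffer or string) with the given
// encoding, defaulting to UTF-8.
function decodeBody(body, encoding = 'utf8') {
  return Buffer.from(body).toString(encoding);
}
```

For example, `decodeBody(Buffer.from('hello'))` gives `'hello'`, while `decodeBody(Buffer.from('hello'), 'base64')` gives `'aGVsbG8='`.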

score 7 · Answer 8 · edited May 13 '20 at 12:17

With the new version of the SDK, the accepted answer does not work: it does not wait for the object to be downloaded. The following snippet works with the new version:

// dependencies
const AWS = require('aws-sdk');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {
    var bucket = "TestBucket";
    var key = "TestKey";

    try {
        // the original referenced undefined Bucket/Key variables here
        const params = {
            Bucket: bucket,
            Key: key
        };

        // wait for the object to be downloaded before continuing
        var theObject = await s3.getObject(params).promise();
        // theObject.Body now holds the file contents
    } catch (error) {
        console.log(error);
        return;
    }
};

var theObject = await s3.getObject(params).promise() This is the correct way. Thanks — jolly, Aug 14 '20 at 04:51

score 6 · Answer 9 · answered May 25 '17 at 11:01

I had exactly the same issue when downloading very large files from S3.

The example solution from AWS docs just does not work:

var file = fs.createWriteStream(options.filePath);
file.on('close', function() {
    if (self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath);
    return callback(null, done);
});
// the handler parameter is named `error` to match its use in the body
s3.getObject({ Key: documentKey }).createReadStream().on('error', function(error) {
    if (self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
    return callback(error);
}).pipe(file);

While this solution will work:

    var file = fs.createWriteStream(options.filePath);
    s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
    .on('error', function(error) {
        if(self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
        return callback(error);
    })
    .on('httpData', function(chunk) { file.write(chunk); })
    .on('httpDone', function() { 
        file.end(); 
        if(self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath );
        return callback(null,done);
    })
    .send();

The createReadStream attempt just does not fire the end, close or error callback for some reason. See here about this.

I'm also using that solution to download and decompress gzip archives, since the first one (the AWS example) does not work in this case either:

        var gunzip = zlib.createGunzip();
        var file = fs.createWriteStream( options.filePath );

        s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
        .on('error', function (error) {
            if(self.logger) self.logger.error("%@",error);
            return callback(error);
        })
        .on('httpData', function (chunk) {
            file.write(chunk);
        })
        .on('httpDone', function () {

            file.end();

            if(self.logger) self.logger.info("downloadArchive downloaded %s", options.filePath);

            fs.createReadStream( options.filePath )
            .on('error', (error) => {
                return callback(error);
            })
            .on('end', () => {
                if(self.logger) self.logger.info("downloadArchive unarchived %s", options.fileDest);
                return callback(null, options.fileDest);
            })
            .pipe(gunzip)
            .pipe(fs.createWriteStream(options.fileDest))
        })
        .send();

score 4 · Answer 10 · answered Mar 21 '18 at 09:26

If you want to save memory and want to obtain each row as a json object, then you can use fast-csv to create readstream and can read each row as a json object as follows:

const csv = require('fast-csv');
const AWS = require('aws-sdk');

const credentials = new AWS.Credentials("ACCESSKEY", "SECRETEKEY", "SESSIONTOKEN");
AWS.config.update({
    credentials: credentials, // credentials required for local execution
    region: 'your_region'
});
const dynamoS3Bucket = new AWS.S3();
const stream = dynamoS3Bucket.getObject({ Bucket: 'your_bucket', Key: 'example.csv' }).createReadStream();

var parser = csv.fromStream(stream, { headers: true }).on("data", function (data) {
    parser.pause();  //can pause reading using this at a particular row
    parser.resume(); // to continue reading
    console.log(data);
}).on("end", function () {
    console.log('process finished');
});

score 1 · Answer 11 · answered Aug 22 '20 at 13:53

var fileStream = fs.createWriteStream('/path/to/file.jpg');
var s3Stream = s3.getObject({Bucket: 'myBucket', Key: 'myImageFile.jpg'}).createReadStream();

// Listen for errors returned by the service
s3Stream.on('error', function(err) {
    // NoSuchKey: The specified key does not exist
    console.error(err);
});

s3Stream.pipe(fileStream).on('error', function(err) {
    // capture any errors that occur when writing data to the file
    console.error('File Stream:', err);
}).on('close', function() {
    console.log('Done.');
});

Reference: https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/requests-using-stream-objects.html
