
I am trying to download multiple files from an AWS S3 bucket and then merge their contents.

For example, I have the following files:

my-bucket/mainfile1.json.gz
my-bucket/mainfile2.json.gz
my-bucket/mainfile3.json.gz

Currently I am accessing a single file like this:

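const AWS = require('aws-sdk');
const zlib = require('zlib');

// Load the credentials and create the client once, not on every call.
AWS.config.loadFromPath(process.env["PWD"] + '/private/awss3/s3_config.json');
const s3 = new AWS.S3();

// Download one gzipped object and resolve with its decompressed text.
const unzipFromS3 = (key, bucket) => {
  return new Promise((resolve, reject) => {
    // Use the arguments instead of hard-coded bucket and key values.
    s3.getObject({ Bucket: bucket, Key: key }, (err, res) => {
      if (err) return reject(err);

      resolve(zlib.unzipSync(res.Body).toString());
    });
  });
};

unzipFromS3('mainfile1.json.gz', 'my-bucket').then((result) => {
  console.dir(result);
});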

This works perfectly for a single file, but how can I do the same with multiple files, for example to merge the data from three separate files?

StormTrooper
  • What's preventing you doing this multiple times and then, once all complete, concatenating or otherwise merging the resulting gz files? – jarmod Nov 18 '21 at 20:09
  • I am unable to find anything in the AWS docs about fetching multiple keys – StormTrooper Nov 19 '21 at 06:47
  • There is no SDK function to get multiple objects in the same call. You would simply make multiple downloads, one per object. When [all complete](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/all), move on to your merge. – jarmod Nov 19 '21 at 13:14
  • @jarmod I am really not good with JavaScript. Can you please post an answer that uses Promise.all? – StormTrooper Nov 19 '21 at 15:06

1 Answer


Here's an initial idea of how to read the gzipped JSON files from S3, unzip them, then merge the resulting JavaScript objects, and finally gzip and write the merged results back to S3.

const aws = require('aws-sdk');
const zlib = require('zlib');
const s3 = new aws.S3();

const BUCKET = 'mybucket';
const PREFIX = '';
const FILES = ['test1.json.gz', 'test2.json.gz', 'test3.json.gz'];

(async () => {
  const promises = [];

  try {
    for (let ii = 0; ii < FILES.length; ii++) {
      const params = {
        Bucket: BUCKET,
        Key: `${PREFIX}${FILES[ii]}`,
      };
      console.log('Get:', params.Key, 'from:', params.Bucket);
      promises.push(s3.getObject(params).promise());
    }

    const results = await Promise.all(promises);
    const buffers = results.map(result => result.Body);
    const content = buffers.map(buffer => JSON.parse(zlib.unzipSync(buffer).toString()));
    console.log('Read OK', JSON.stringify(content));

    const merged = Object.assign({}, ...content);
    console.log('Merged content', JSON.stringify(merged));

    const params = {
      Bucket: BUCKET,
      Key: `${PREFIX}result/test.json.gz`,
      Body: zlib.gzipSync(JSON.stringify(merged)),
    };

    console.log('Put:', params.Key, 'to:', params.Bucket);
    await s3.putObject(params).promise();
  } catch (err) {
    console.log(err, err.stack);
    throw err;
  }
})();
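
One thing to check before relying on the merge step: `Object.assign` copies top-level keys only, so when two files share a key, the value from the later file replaces the earlier one. Whether that is what you want depends on the shape of the JSON files, which the question does not show. The sketch below illustrates the difference; `mergeConcatArrays` is a hypothetical helper, not part of any library, and it assumes arrays under the same top-level key should be concatenated.

```javascript
// Object.assign is a shallow merge: for a shared key, the last value wins.
const shallow = Object.assign({}, { a: 1, list: [1] }, { b: 2, list: [2] });

// Hypothetical alternative: concatenate arrays stored under the same
// top-level key, and fall back to "last one wins" for everything else.
function mergeConcatArrays(objects) {
  const merged = {};
  for (const obj of objects) {
    for (const [key, value] of Object.entries(obj)) {
      if (Array.isArray(merged[key]) && Array.isArray(value)) {
        // concat returns a new array, so the inputs are not mutated.
        merged[key] = merged[key].concat(value);
      } else {
        merged[key] = value;
      }
    }
  }
  return merged;
}
```

If that behaviour fits the data, `mergeConcatArrays(content)` could stand in for `Object.assign({}, ...content)` in the code above.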
jarmod
  • If a file is not found, it crashes on this line: `const results = await Promise.all(promises);` – StormTrooper Dec 06 '21 at 18:35
  • Yes, you would need to add [error handling](https://stackoverflow.com/questions/30362733/handling-errors-in-promise-all) in case a promise is rejected. – jarmod Dec 06 '21 at 18:39
  • Can you please edit your answer to include that error handler as well? That would be great. – StormTrooper Dec 06 '21 at 19:20
  • The code does actually already have an exception handler. It prints the exception with a stack trace and then re-throws the exception. You could change this handler to do whatever custom error handling you need to. – jarmod Dec 06 '21 at 20:19
  • Actually, I don't want it to crash. I have the files in an array, so if a file is missing I want to skip it; in the current code a missing file breaks the script. – StormTrooper Dec 06 '21 at 20:44
  • Then you can use `Promise.allSettled(promises)` ([docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/allSettled)). You will then need to filter out the failed downloads (they'll have a `reason`), but that's easy enough. – jarmod Dec 06 '21 at 20:48
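
The `Promise.allSettled` suggestion in the last comment could look roughly like the sketch below. The names `keepFulfilled` and `downloadAll` are hypothetical, and `fetchOne` is an assumed parameter: a function that takes a key and returns a promise for the gzipped Buffer. Passing it in keeps the sketch runnable without AWS credentials. `Promise.allSettled` needs Node.js 12.9 or later.

```javascript
const zlib = require('zlib');

// Hypothetical helper: split Promise.allSettled results into the values that
// were fulfilled and the keys that failed (rejected entries carry a `reason`).
function keepFulfilled(keys, settled) {
  const bodies = [];
  const skipped = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      bodies.push(outcome.value);
    } else {
      skipped.push({ key: keys[i], reason: outcome.reason });
    }
  });
  return { bodies, skipped };
}

// Hypothetical helper: download every key, skip the ones that fail, and
// return the parsed JSON of the rest. `fetchOne(key)` must return a promise
// for the gzipped Buffer of that object.
async function downloadAll(keys, fetchOne) {
  const settled = await Promise.allSettled(keys.map(key => fetchOne(key)));
  const { bodies, skipped } = keepFulfilled(keys, settled);
  skipped.forEach(s => console.warn('Skipped:', s.key, String(s.reason)));
  return bodies.map(body => JSON.parse(zlib.unzipSync(body).toString()));
}
```

With the SDK v2 client from the answer, `fetchOne` could be `key => s3.getObject({ Bucket: BUCKET, Key: key }).promise().then(res => res.Body)`, and the array returned by `downloadAll(FILES, fetchOne)` would take the place of `content` in the merge step.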