I have the following use case to solve. I need to ingest data from an S3 bucket using a Lambda function (Node.js 12). The Lambda function will be triggered when a new file is created. The file is a gz archive and can contain multiple TSV (tab-separated) files. For each row, an API call will be made from the Lambda function. Questions:
1 - Does it have to be a two-step process: decompress the archive into the /tmp folder and then read the TSV files? Or can the content of the archive be streamed directly? (A rough sketch of the two-step variant I'd like to avoid is below.)
2 - Do you have a snippet of code you could share that shows how to stream a GZ file from an S3 bucket and parse its content (TSV)? I've found a few examples, but only for plain Node.js, not for Lambda/S3.
Thanks a lot for your help.
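To make question 1 concrete, this is the two-step variant I'd like to avoid, written out as a rough sketch (the helper name, file paths, and the single-gzipped-TSV assumption are mine, not working code):

const fs = require('fs');
const zlib = require('zlib');
const aws = require('aws-sdk');
const s3 = new aws.S3();

// Hypothetical helper: download the whole archive into /tmp, then decompress
// and read it back from disk. Paths below are placeholders.
async function twoStepIngest(bucket, objectKey) {
    const archivePath = '/tmp/input.tsv.gz';
    const tsvPath = '/tmp/input.tsv';

    // Step 1: download the gz archive into the Lambda /tmp storage.
    const obj = await s3.getObject({ Bucket: bucket, Key: objectKey }).promise();
    fs.writeFileSync(archivePath, obj.Body);

    // Step 2: decompress to disk and read the TSV rows back.
    fs.writeFileSync(tsvPath, zlib.gunzipSync(fs.readFileSync(archivePath)));
    return fs.readFileSync(tsvPath, 'utf8').split('\n');
}

My concern with this route is the /tmp size limit and holding the whole file in memory at once, hence the question about streaming.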
Adding a snippet of code from my first test; it doesn't work, and no data is logged to the console:
const csv = require('csv-parser');
const aws = require('aws-sdk');
const s3 = new aws.S3();

exports.handler = async (event, context, callback) => {
    const bucket = event.Records[0].s3.bucket.name;
    const objectKey = event.Records[0].s3.object.key;
    const params = { Bucket: bucket, Key: objectKey };
    var results = [];

    console.log("My File: " + objectKey + "\n");
    console.log("My Bucket: " + bucket + "\n");

    var otherOptions = {
        columns: true,
        auto_parse: true,
        escape: '\\',
        trim: true,
    };

    s3.getObject(params).createReadStream()
        .pipe(csv({ separator: '|' }))
        .on('data', (data) => results.push(data))
        .on('end', () => {
            console.log("My data: " + results);
        });

    return await results;
};
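For reference, here is a rough sketch of the streaming version I'm aiming for. It is untested and based on assumptions: that the object is a single gzipped TSV (a .tar.gz holding several files would additionally need a tar extractor), that csv-parser accepts '\t' as the separator, and that wrapping the stream in a Promise makes the async handler wait for the 'end' event instead of returning right away:

const zlib = require('zlib');
const csv = require('csv-parser');
const aws = require('aws-sdk');
const s3 = new aws.S3();

exports.handler = async (event) => {
    const bucket = event.Records[0].s3.bucket.name;
    // S3 event notifications deliver the object key URL-encoded.
    const objectKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const params = { Bucket: bucket, Key: objectKey };

    // Wrap the stream in a Promise so the handler only returns after 'end'.
    const rows = await new Promise((resolve, reject) => {
        const results = [];
        s3.getObject(params).createReadStream()
            .on('error', reject)
            .pipe(zlib.createGunzip())          // decompress on the fly, no /tmp needed
            .on('error', reject)
            .pipe(csv({ separator: '\t' }))     // parse tab-separated rows
            .on('data', (row) => results.push(row))
            .on('error', reject)
            .on('end', () => resolve(results));
    });

    console.log("Parsed " + rows.length + " rows from " + objectKey);
    // Each row would then drive one API call (sequentially or in batches).
    return rows.length;
};

If this is roughly the right direction, my remaining doubt is whether createGunzip() is enough for an archive that really contains multiple TSV files, or whether that case forces the /tmp route after all.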