I am trying to stream data from a large CSV file in S3 through readline. I tried piping the read stream from S3 straight into the readline input, but I ran into an error because S3 only allows a connection to stay open for a limited amount of time.
I am creating the stream from S3 like so:
import * as AWS from 'aws-sdk';
import {Readable} from 'stream';
import {s3Env} from '../config';

export default async function createAWSStream(): Promise<Readable> {
    return new Promise((resolve, reject) => {
        const params = {
            Bucket: s3Env.bucket,
            Key: s3Env.key
        };

        try {
            const s3 = new AWS.S3({
                accessKeyId: s3Env.accessKey,
                secretAccessKey: s3Env.secret
            });

            // Confirm the object exists before opening the read stream
            s3.headObject(params, (error) => {
                if (error) {
                    reject(error);
                    return;
                }
                const stream = s3.getObject(params).createReadStream();
                resolve(stream);
            });
        } catch (error) {
            reject(error);
        }
    });
}
Then I am passing it into readline as the input stream:
import * as readline from 'readline';
import createAWSStream from './createAWSStream';

export const readCSVFile = async function(): Promise<void> {
    const rStream = await createAWSStream();

    const lineReader = readline.createInterface({
        input: rStream,
        crlfDelay: Infinity // treat \r\n as a single line break
    });

    for await (const line of lineReader) {
        // process line
    }
}
I found that the timeout for S3 connections is set to 120000 ms (2 minutes) by default. I tried simply raising that timeout, but then I ran into further timeout issues from the underlying HTTPS connection.
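For reference, raising the timeout looked roughly like this (the exact values here are placeholders, not what I settled on):

import * as AWS from 'aws-sdk';
import {s3Env} from '../config';

// Sketch of the timeout workaround I tried: bumping the SDK's HTTP
// options when constructing the client. Even with very large values,
// the HTTPS connection eventually timed out on big files.
const s3 = new AWS.S3({
    accessKeyId: s3Env.accessKey,
    secretAccessKey: s3Env.secret,
    httpOptions: {
        connectTimeout: 5000, // ms allowed to establish the connection
        timeout: 600000       // ms of socket inactivity before termination (default 120000)
    }
});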
How can I stream data from AWS S3 the right way, without setting a bunch of timeouts to some extremely large value?