
I am working on a large download feature. The requirement is to read through 100k+ gzip-compressed JSON files on S3, filter them with S3 Select, and stream the filtered data to the client as a download.
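For context, the S3 Interactor issues roughly the following call per file (a simplified boto3 sketch; the bucket, key, and SQL expression are placeholders, and the files are assumed to be JSON lines):

```python
import boto3

s3 = boto3.client("s3")

def select_records(bucket: str, key: str):
    """Run S3 Select against one gzip JSON-lines file and yield filtered bytes."""
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression="SELECT * FROM S3Object s",  # placeholder filter
        InputSerialization={"JSON": {"Type": "LINES"}, "CompressionType": "GZIP"},
        OutputSerialization={"JSON": {}},
    )
    # The Payload is an event stream; 'Records' events carry the result bytes.
    for event in response["Payload"]:
        if "Records" in event:
            yield event["Records"]["Payload"]
```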

I have written two services:

  1. Client interaction (Controller)
  2. S3 interaction (S3 Interactor)

When the client clicks the download button, the controller calls the S3 Interactor for data, but after a few minutes the connection between the services breaks. I am not sure how to keep the connection alive for, say, 30 minutes, because the data can run into terabytes.
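What I am trying to achieve on the controller side is roughly the following (a sketch using Flask purely for illustration, reusing `select_records` from the sketch above; the bucket name is a placeholder). Streaming a chunked response like this should, in principle, keep the connection alive as long as bytes keep flowing:

```python
import boto3
from flask import Flask, Response, stream_with_context

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "my-bucket"  # placeholder

def list_keys(bucket: str):
    """Enumerate every key in the bucket (paginated, so 100k+ keys are fine)."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            yield obj["Key"]

@app.route("/download")
def download():
    def generate():
        # Chunked transfer encoding: the connection stays open while data flows.
        for key in list_keys(BUCKET):
            yield from select_records(BUCKET, key)

    return Response(
        stream_with_context(generate()),
        mimetype="application/json",
        headers={"Content-Disposition": 'attachment; filename="export.json"'},
    )
```

Even with streaming like this, the connection still breaks after a few minutes.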

Anonymous Coward
  • I had a similar problem where the connection would time out. I ended up grabbing byte ranges in consecutive requests instead of trying to dump the entire file – [this might help you](https://stackoverflow.com/questions/70625366/streaming-files-from-aws-s3-with-nodejs) (a sketch of this byte-range approach appears after these comments). – about14sheep Apr 09 '23 at 02:20
  • Are all the files in the same format? I wonder whether Amazon Athena would be a suitable alternative to S3 Select, since it can scan multiple files simultaneously and run SQL across them. However, 100k+ files might be too much for Athena. – John Rotenstein Apr 09 '23 at 08:31
  • @JohnRotenstein The files are all in the same format. I tried Athena; it works fine most of the time with 20k files but breaks at around 40k. That is why I went with the standard approach. – Taslim Arif Apr 10 '23 at 03:57
  • @about14sheep I think S3 Select does not support byte-range reads for gzip-compressed JSON data. – Taslim Arif Apr 10 '23 at 04:05
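The byte-range approach about14sheep describes would look roughly like this, using a plain `GetObject` rather than S3 Select (a minimal sketch; the 8 MB chunk size is an arbitrary choice). Each range request is short-lived, so no single connection has to stay open for the whole transfer; the caveat, as noted above, is that the ranges return raw gzip bytes rather than S3 Select-filtered records:

```python
import boto3

s3 = boto3.client("s3")

def stream_in_ranges(bucket: str, key: str, chunk_size: int = 8 * 1024 * 1024):
    """Fetch an object in consecutive byte ranges so no single request runs long."""
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    start = 0
    while start < size:
        end = min(start + chunk_size, size) - 1
        resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
        yield resp["Body"].read()
        start = end + 1
```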
