I need to read S3 objects in Lambda functions as native files without downloading them first. As soon as a program requests one of those files from the filesystem, the read should start streaming from the bucket, while the program stays unaware of that and treats it as a native file.

My issue is that I'm spawning a program (from a binary) that reads all of its several hundred input URLs synchronously, so the HTTP connection latency is paid once per file and adds up to a very significant delay. If the URLs pointed to local files, I'd save minutes on the HTTP overhead alone, so I'm looking for a solution that opens all the connections asynchronously, which the program can then read from on demand without delay.

Perhaps there is a way to mount a file on the Linux filesystem that consumes from a Node.js stream object? It would not be written to disk or held in an in-memory buffer, but it would be available for consumption as a stream.
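
One shape this idea could take is a named pipe (FIFO): a path on the filesystem whose contents come from whatever process writes into it. The sketch below is an assumption, not something confirmed to work in Lambda. `runWithFifo` is a hypothetical helper, it assumes a POSIX `mkfifo` binary is on the PATH, and it only suits programs that read their input sequentially, since a FIFO cannot be seeked.

```javascript
const { execFileSync, spawn } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pipeline } = require('stream');

// Hypothetical helper: exposes `source` (any Readable) at a FIFO path,
// passes that path to `command` as its last argument, and resolves with
// the command's stdout once it exits successfully.
function runWithFifo(source, command, args = []) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'fifo-'));
  const fifoPath = path.join(dir, 'input');
  execFileSync('mkfifo', [fifoPath]); // assumption: mkfifo is available

  return new Promise((resolve, reject) => {
    const child = spawn(command, [...args, fifoPath]);
    const chunks = [];
    child.stdout.on('data', (chunk) => chunks.push(chunk));
    child.on('error', reject);
    child.on('close', (code) => {
      fs.rmSync(dir, { recursive: true, force: true });
      if (code === 0) resolve(Buffer.concat(chunks));
      else reject(new Error(`${command} exited with code ${code}`));
    });
    // Opening the FIFO for writing waits until the child opens it for reading.
    pipeline(source, fs.createWriteStream(fifoPath), (err) => {
      if (err) reject(err);
    });
  });
}
```

Nothing is stored on disk here; the data only passes through the kernel's pipe buffer. With hundreds of inputs, each pending FIFO open would occupy a libuv threadpool thread until the program reaches that file, so the approach would need care beyond this single-file sketch.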

salmore
  • https://stackoverflow.com/questions/27299139/read-file-from-aws-s3-bucket-using-node-fs – stdunbar Jan 13 '18 at 21:08
  • I'm aware of `s3.getObject(params).createReadStream().pipe`; I've used it, and I'm familiar with streams and pipes. The reason it won't help me is that the program I need to provide the URLs to is not running in Node; it's a binary I'm spawning from Node. Downloading the files won't work for me either, as they're too large for the Lambda disk space. Currently I have it set up so that the program reads from the URLs and writes its output to STDOUT, which I pipe through as a stream to `s3.upload()`. So large files are coming in and going out without needing the unavailable disk space or RAM. – salmore Jan 13 '18 at 21:24
  • It's conceivable that you could emulate a local filesystem and bridge the calls to S3 with some kind of streaming structure using something like https://www.npmjs.com/package/fuse-bindings but you'd have to build a lot of scaffolding and fundamentally it still fails because HTTP is the *only* mechanism by which S3 exposes files... `s3.getObject(params).createReadStream()...` is still, at its root, an HTTP request. – Michael - sqlbot Jan 14 '18 at 01:42
  • @Michael-sqlbot Thanks! That can still help me as I can initialize all the http connections asynchronously. – salmore Jan 14 '18 at 08:35
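
The idea in the last comment, initializing all the HTTP connections asynchronously, could be sketched roughly as follows. `openAll` and its `open` callback are hypothetical names; in the setup described above, `open` might wrap the `s3.getObject(params).createReadStream()` call mentioned in the comments, but that wiring is an assumption and is not shown.

```javascript
// Hypothetical helper: calls `open(key)` for every key up front, so all
// requests are in flight together instead of one after another.
// Results come back in the same order as `keys`.
function openAll(keys, open) {
  return Promise.all(keys.map((key) => open(key)));
}
```

With this shape the total wait is roughly the latency of the slowest request, not the sum of all of them. A real version would probably cap the number of simultaneous requests.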

0 Answers