3

Folks, I've set up an SFTP server on an EC2 instance to receive files from remote customers, each of whom needs to send 3 files several times throughout the day (each customer connects multiple times a day, each time transferring the 3 files, which keep their names but change their contents). This works fine as long as the number of customers connecting simultaneously is kept under control; however, I cannot control exactly when each customer will connect (they have automated the connection process on their end). I am anticipating a bottleneck if too many people try to upload files at the same time, and have been looking for alternatives to the whole process ("distributed file transfer" of some sort). That's when I stumbled upon AWS S3, which is distributed by definition, and was wondering if I could do something like the following:

  • Create a bucket called "incoming-files"
  • Create several folders inside this bucket, one for each customer
  • Set up a file transfer mechanism (I believe I'd have to use S3's SDK somehow)
  • Provide a client application for each customer, so that they can run it at their side to upload the files to their specific folders inside the bucket

This last point is easy with SFTP, since you can set a "root" folder for each user so that when the user connects to the server they automatically land in their appropriate folder. I'm not sure whether something of this sort can be worked out on S3. Also, the file transfer mechanism would have to provide not only credentials to access the bucket, but also "sub-credentials" to access the folder.
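
For illustration, this is roughly what I imagine the customer-side client could look like (just a sketch, assuming a bucket called "incoming-files", per-customer access keys that are only allowed to write under that customer's prefix, and boto3 as the SDK; all names are placeholders):

```python
# Sketch of a customer-side uploader (illustrative names only).
# Assumes the customer's keys are restricted to their own prefix,
# e.g. s3://incoming-files/customer-a/
import boto3

BUCKET = "incoming-files"          # hypothetical bucket from the plan above
CUSTOMER_PREFIX = "customer-a/"    # the customer's "folder" (really a key prefix)
FILES = ["file1.csv", "file2.csv", "file3.csv"]

s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",          # per-customer credentials (placeholder)
    aws_secret_access_key="...",          # placeholder
)

for name in FILES:
    # The keys keep the same names each run; S3 simply overwrites the objects.
    s3.upload_file(name, BUCKET, CUSTOMER_PREFIX + name)
```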

I have been digging into S3 but couldn't quite figure out whether this whole idea is (a) feasible and (b) practical. The other limitation of my original SFTP solution is that, by definition, an SFTP server is a single point of failure, which I'd be glad to avoid. I'd be thrilled if someone could shed some light on this (by the way, other solutions are also welcome).

Note that I am trying to eliminate the SFTP server altogether, and not mount an S3 bucket as the "root folder" for the SFTP server.

Thank you

Marcio Buss
    Possible duplicate of [FTP/SFTP access to an Amazon S3 Bucket](http://stackoverflow.com/questions/23939179/ftp-sftp-access-to-an-amazon-s3-bucket) – Hackerman Dec 27 '16 at 15:38
  • Hello Hackerman. It is not a duplicate question in that sense, because I am not trying to mount an S3 bucket as the root folder for an SFTP server. I am trying to eliminate the SFTP server altogether, i.e., having users upload files directly into S3 "folders" inside a bucket. Thanks! – Marcio Buss Dec 27 '16 at 17:03
  • Hi. Please select an answer for the question if you feel one of them has met your needs. – rumdrums Dec 31 '16 at 01:12

3 Answers

1

You can create an S3 policy that grants access only to a certain prefix (a "folder" in your plan). The only thing your customers need is permission to make PUT requests. For each customer you will also need to create a set of access keys.
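
Roughly, the per-customer setup could look like this (a sketch using boto3; the bucket and user names are placeholders):

```python
# Sketch: one IAM user per customer, with an inline policy that only
# allows uploads under that customer's prefix. Names are placeholders.
import json
import boto3

iam = boto3.client("iam")
customer = "customer-a"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:PutObject",
        "Resource": f"arn:aws:s3:::incoming-files/{customer}/*",
    }],
}

iam.create_user(UserName=customer)
iam.put_user_policy(
    UserName=customer,
    PolicyName="upload-own-prefix-only",
    PolicyDocument=json.dumps(policy),
)
# Hand these keys to the customer for their upload client.
keys = iam.create_access_key(UserName=customer)["AccessKey"]
print(keys["AccessKeyId"], keys["SecretAccessKey"])
```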

It seems you're overcomplicating things. If SFTP is a bottleneck and is not redundant, you can always create a scaling group (with an ELB or DNS round-robin in front of it) and mount S3 on the EC2 instances with s3fs or goofys. If cost is not an issue here, you can even mount EFS as an NFS share.

Sergey Kovalev
  • Thanks Sergey. As I mentioned in my comments to rumdrums, I am trying at all costs to minimize maintenance overhead in the future as well as the time to set up the whole environment. I also need some mechanism to process the incoming files upon arrival, which I believe is something inherent in S3's model (I am currently listening to the file system and invoking stored procedures on a database to import the incoming files, which also has limitations on the number of simultaneous file system events). – Marcio Buss Dec 27 '16 at 17:13
  • If your clients are smart enough to use the S3 API or the AWS CLI tools, S3 + Lambda for processing is the way to go. But if you need to create custom software for them, that definitely wouldn't be a maintenance-free solution. In that case, I would mount S3 as a filesystem for uploading and create S3 events that fire Lambda functions. Since automatic processing is required and you don't want to watch filesystem events, EFS is out of the question. – Sergey Kovalev Dec 27 '16 at 17:25
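
For the processing side discussed in these comments, a minimal sketch of a Lambda handler subscribed to the bucket's `s3:ObjectCreated:*` notifications might look like this (the actual CSV import logic is yours to fill in):

```python
# Illustrative Lambda handler for S3 object-created notifications.
# The event parsing follows the standard S3 event record shape.
import urllib.parse
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read()
        # ... parse the CSV and load it into the database here ...
        print(f"received {len(body)} bytes for {bucket}/{key}")
```
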
1

AWS has an example configuration here that seems like it may meet your needs pretty well.

I think you're definitely right to consider S3 over a traditional SFTP setup. If you do go with a server-based approach, I agree with Sergey's answer -- an auto-scaling group of servers backed by shared EFS storage. You will, of course, have to own maintenance of those servers, which may or may not be an issue depending on your expertise and desire to do so.

A pure S3 solution, however, will almost certainly be cheaper and require less maintenance in the long run.

rumdrums
  • Thanks rumdrums. You are right, if I can find a solution that minimizes maintenance in the long run, that would be highly preferable (both due to lack of expertise and because I cannot afford the time). Also because I need to process the incoming CSV file upon arrival (which I do today by listening to the file system and importing into a database). With an S3 solution I believe I could trigger SQS events and run, e.g., a Data Pipeline or even a Lambda function that would process the file. Have you stumbled upon any client application that I can quickly adapt and provide to the users? Tks! – Marcio Buss Dec 27 '16 at 16:59
  • If your clients are fairly technical, I would think they would be happy automating their own process using the AWS CLI or one of the available SDKs -- they would only need a set of API keys provided by you to enable that. While I don't know for sure, I believe there are freely available SFTP clients that support connecting to S3. Also, I don't know of one offhand, but I have seen projects on GitHub that provide a pure client-side web interface for uploading to S3 -- you could "easily" (give or take) use such a project and serve it out of a web-enabled S3 bucket. – rumdrums Dec 27 '16 at 17:21
  • Agreed with the above. Create an IAM user for each customer, with permissions to upload only to a specific path within the Amazon S3 bucket. Have them use the [AWS Command-Line Interface (CLI)](http://aws.amazon.com/cli/) to upload files -- give them a script to run it. You could use `aws s3 sync` to automatically sync files from a local directory to the S3 bucket. If you're scaling to dozens of users, your application could generate temporary credentials using the AWS Security Token Service, rather than creating individual IAM Users. – John Rotenstein Dec 28 '16 at 04:27
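
A rough sketch of the STS idea from the last comment, assuming the same hypothetical "incoming-files" bucket and prefix-per-customer layout (your own service would run this and hand the short-lived credentials to the customer's client instead of creating long-lived IAM users):

```python
# Sketch: mint short-lived credentials that can only PUT under one
# customer's prefix. Bucket name and prefix layout are placeholders.
import json
import boto3

sts = boto3.client("sts")

def credentials_for(customer: str) -> dict:
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::incoming-files/{customer}/*",
        }],
    }
    resp = sts.get_federation_token(
        Name=customer,
        Policy=json.dumps(policy),
        DurationSeconds=3600,  # one hour, plenty for three small files
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return resp["Credentials"]
```
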
0

There is now an AWS managed SFTP service in the AWS Transfer family.

https://aws.amazon.com/blogs/aws/new-aws-transfer-for-sftp-fully-managed-sftp-service-for-amazon-s3/

Today we are launching AWS Transfer for SFTP, a fully-managed, highly-available SFTP service. You simply create a server, set up user accounts, and associate the server with one or more Amazon Simple Storage Service (S3) buckets. You have fine-grained control over user identity, permissions, and keys. You can create users within Transfer for SFTP, or you can make use of an existing identity provider. You can also use IAM policies to control the level of access granted to each user. You can also make use of your existing DNS name and SSH public keys, making it easy for you to migrate to Transfer for SFTP. Your customers and your partners will continue to connect and to make transfers as usual, with no changes to their existing workflows.

Coin Graham