
We have explored Amazon Elastic File System (EFS, somewhat expensive) and S3 storage synchronized across multiple EBS volumes (prone to sync issues). Although we have endless issues with various aspects of s3fs (S3 FUSE) at the scale we use it, our apps need the functionality it provides.

We need adequate file-upload performance for five auto-scaled, memory-optimized EC2 instances (load-balanced web servers) in a VPC that share an identical file system structure for "assets" that are managed by end users and maintained by the application.

A potential solution that we have not explored is to leverage our CloudFront CDN configuration for files that could be as large as 2 GB in edge cases, or as many as 600 files varying from 5 MB to 100 MB each per "page" for users to review/edit. This seems at least as problematic at scale as s3fs is, if not more so (if it's even a viable option at all).

We are leaning towards cost efficiency, but still with serious consideration for performance:

  • is EFS a viable and cost-effective replacement option for S3FS functionality?

  • cost control of massive EBS volumes is a concern; the approach is not ruled out as a viable option, but is the performance going to be worth the added cost?

  • is CloudFront capable of handling such a load as described?

  • is there a better option that we have been missing that we may want to explore?

K8sN0v1c3
  • Side-note: Instead of S3FS, you could investigate [AWS Storage Gateway – File Gateway](https://aws.amazon.com/storagegateway/file/). It provides a mountable S3 interface. – John Rotenstein Jun 19 '18 at 00:38
  • It isn't fully clear how your application is really using S3 (5-100 meg files... 600 per "page"... to "edit"), but the seemingly obvious solution is to use S3 natively and directly, rather than through an abstraction layer like s3fs. The s3fs solution is useful -- I use it myself for my SFTP servers and in fact, you might find some helpful configuration options in [my answer here about it](https://stackoverflow.com/a/23946418/1695906) -- but it is fundamentally a hack, because S3 is an object store, not a filesystem. The best use of S3 is obtained by using it natively, via API/SDK. – Michael - sqlbot Jun 19 '18 at 07:08
  • Thanks @JohnRotenstein - Those are the types of suggestions I was hoping to drum up from this question. Do you have more details about how a sample/hypothetical AWS Storage Gateway configuration "mocks" the feature set and functionality of s3fs? For example, how does AWS Storage Gateway "mount" to an EC2 instance? Can the same S3 buckets configured for the AWS Storage Gateway - File Gateway be mounted to multiple EC2 instances at the same time? And so on. There seems to be a lot less info about these details than even s3fs, but perhaps Bing just hasn't revealed the right path to follow (yet)? – K8sN0v1c3 Jun 19 '18 at 11:49
  • @Michael-sqlbot your link to the answer is very, very good. In fact, you have touched on options that I have never seen before for use in S3 FUSE, even though I've lived in the VERY weak documentation for s3fs for over a month now, trying to keep an application that is growing wildly in usage running on the very troublesome s3fs-mounted file system. I completely agree that s3fs is a "hack" that is very handy and useful in a lot of different use cases, but it's certainly not a viable option for high-traffic throughput, tiered user ACLs, etc. Do you have experience "replacing" s3fs functionality? – K8sN0v1c3 Jun 19 '18 at 11:53
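
As a rough illustration of the native-SDK approach Michael suggests above, here is a minimal boto3 sketch (the bucket name, keys, and file paths are hypothetical) showing an upload straight to S3 plus a presigned download URL, so the web servers never have to stream file contents through a mounted filesystem:

```python
# Minimal sketch of using S3 natively via the SDK instead of an s3fs mount.
# Assumes boto3 is installed and credentials come from an instance profile.
# The bucket name and object keys below are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-assets-bucket"  # hypothetical bucket name


def upload_asset(local_path: str, key: str) -> None:
    """Upload a file directly to S3; upload_file handles multipart for large files."""
    s3.upload_file(local_path, BUCKET, key)


def presigned_download_url(key: str, expires_seconds: int = 3600) -> str:
    """Return a time-limited URL so the browser fetches the object directly
    from S3, bypassing the web servers entirely."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires_seconds,
    )


if __name__ == "__main__":
    upload_asset("/tmp/report.pdf", "user-123/page-42/report.pdf")
    print(presigned_download_url("user-123/page-42/report.pdf"))
```

With CloudFront in front of the bucket, the same pattern can use signed CloudFront URLs instead of S3 presigned URLs, which is one way to offload the large per-"page" downloads described in the question.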

1 Answer


Quite clearly, the best fit would be Amazon EFS, since it is designed as a filesystem that can be mounted across multiple Amazon EC2 instances, across multiple Availability Zones.
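
To make the multi-AZ mounting concrete, here is a minimal boto3 sketch (subnet and security-group IDs are hypothetical placeholders) that creates an EFS file system with one mount target per Availability Zone; each auto-scaled instance would then mount it over NFS:

```python
# Sketch only: provisioning an EFS file system plus one mount target per AZ so
# every auto-scaled instance can mount the same shared filesystem.
# All IDs below are hypothetical placeholders.
import boto3

efs = boto3.client("efs")

fs = efs.create_file_system(
    CreationToken="shared-assets-fs",     # idempotency token (hypothetical)
    PerformanceMode="generalPurpose",
)
fs_id = fs["FileSystemId"]

# One mount target per AZ/subnet used by the auto-scaling group.
for subnet_id in ["subnet-aaaa1111", "subnet-bbbb2222"]:  # hypothetical subnets
    efs.create_mount_target(
        FileSystemId=fs_id,
        SubnetId=subnet_id,
        SecurityGroups=["sg-0123456789abcdef0"],  # must allow NFS (TCP 2049)
    )

# Each instance then mounts the filesystem, e.g. with amazon-efs-utils:
#   sudo mount -t efs <fs_id>:/ /mnt/assets
```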

Any solution that uses S3 for storage is not a true filesystem because it merely 'presents' S3 as a filesystem, which it is not. Therefore, there will be some overhead and potential for problems.

If your only issue is price, then hopefully you'll find the performance and seamless usage worthwhile. Any other solution has the additional hidden cost of maintaining an extra system where things can go wrong. In the long run, a system that works and is simple to maintain is often the best value.

You can try the solutions for yourself (eg File Gateway) and see if they meet your needs.
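
If you do experiment with File Gateway, the rough boto3 sketch below shows the shape of the API for exposing an S3 bucket as an NFS share through an already-activated gateway; all ARNs, bucket names, and CIDR ranges are hypothetical placeholders:

```python
# Sketch: exposing an S3 bucket as an NFS share via Storage Gateway (File
# Gateway). Assumes a gateway appliance is already deployed and activated.
# Every ARN, bucket name, and CIDR range below is a hypothetical placeholder.
import uuid
import boto3

sgw = boto3.client("storagegateway")

share = sgw.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),  # idempotency token
    GatewayARN="arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE",
    Role="arn:aws:iam::123456789012:role/FileGatewayS3Access",
    LocationARN="arn:aws:s3:::example-assets-bucket",  # backing S3 bucket
    ClientList=["10.0.0.0/16"],  # VPC CIDR allowed to mount the share
)
print(share["FileShareARN"])

# EC2 instances in the VPC can then mount the share over NFS, for example:
#   sudo mount -t nfs -o nolock,hard <gateway-ip>:/example-assets-bucket /mnt/assets
```

A share like this can be mounted by multiple NFS clients at once, which speaks to the multi-instance question raised in the comments, though concurrent writers to the same objects still need application-level coordination.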

John Rotenstein
  • That is my personal feeling about the circumstances: EFS is the best solution and the easiest to maintain. That said, increasing costs over S3 by 10x for maybe 2-3 times the overall performance is a pretty tough sell. We need/want to quickly access any size of file library for any arbitrary user at any given moment from any EC2 instance / VM without breaking the bank. EFS increases existing costs AND also increases future costs at a rate of 10x what S3FS + AWS S3 does for us. We have workarounds in place, but are looking for the cheapest options that are easiest to maintain (how typical!) – K8sN0v1c3 Jun 19 '18 at 13:45
  • You could always just do a network share from one machine to the others, using the native sharing capabilities of the OS (eg Windows SMB or Linux NFS). That's low-cost! – John Rotenstein Jun 19 '18 at 21:49
  • Ha, now you are getting the budget I am working with! That's a great point, though it's a bit cumbersome for the ... er... "resources" that we have to work with. But this is actually a great selling point for something like GlusterFS, though the issue of performance might come into play for multi-zone setups. Still not a bad idea at all and totally within the scope of possible solutions. :) – K8sN0v1c3 Jun 20 '18 at 15:42