
I have a media-based web application running on AWS (Windows EC2 instances), and I'm trying to achieve scalability by putting the app and web servers in an Auto Scaling group.

My problem is that I need to move the media storage to S3 so that I can share it between different app server clusters. But I also have to move these media files from S3 to different FTP servers. For that I currently have to download the files from S3 to an app server and then do the FTP upload, which takes too much time. Note that I am using ColdFusion as the application server.

Now I have two options to solve this:

  1. Mount the S3 bucket to the EC2 instances (I know that is not recommended, and I'm not sure it would improve the speed of the FTP upload).
  2. Use the Lambda service to upload files directly from S3 to the FTP servers.

I cannot use a separate EBS volume for each EC2 instance because:

  1. The storage volume is huge and it would result in high cost.
  2. I would need to sync the media storage across the different EBS volumes attached to the EC2 instances.

EFS is not an option as I'm on Windows (EFS does not support Windows instances).

Can anyone suggest a better solution?

Lajin
  • FWIW, none of this has anything to do with ColdFusion from what I can tell. You might ask over on Server Fault regarding your storage issues. – Adrian J. Moreno Feb 13 '19 at 23:57
  • AWS announced something a few months back that lets you use S3 in place of an FTP server. Have you looked into that? https://aws.amazon.com/blogs/aws/new-aws-transfer-for-sftp-fully-managed-sftp-service-for-amazon-s3/ – Matthew Pope Feb 16 '19 at 21:38

1 Answer


That is pretty easy with Python:

from ftplib import FTP
from socket import _GLOBAL_DEFAULT_TIMEOUT
import urllib.request

class FtpCopier(FTP):

    source_address = None
    timeout = _GLOBAL_DEFAULT_TIMEOUT

    # host     → FTP host name / IP
    # user     → FTP login user
    # password → FTP password
    # port     → FTP port
    # encoding → FTP server encoding
    def __init__(self, host, user, password, port = 21, encoding = 'utf-8'):

        self.host = host
        self.user = user
        self.password = password
        self.port = port
        # set the encoding before logging in, so credentials are encoded correctly
        self.encoding = encoding
        self.connect(self.host, self.port)
        self.login(self.user, self.password, '')

    # url           → any web URL (for example S3)
    # to_path       → full path on the FTP server (make sure the destination folders exist)
    # chunk_size_mb → read chunk size in megabytes
    def transfer(self, url, to_path, chunk_size_mb = 10):

        blocksize = chunk_size_mb * 1024 * 1024
        # stream the file from the URL straight into the FTP upload,
        # without writing it to local disk first
        with urllib.request.urlopen(url) as file_handle:
            self.storbinary("STOR %s" % to_path, file_handle, blocksize)

Usage example:

ftp = FtpCopier("some_host.com", "user", "p@ssw0rd")
ftp.transfer("https://bucket.s3.ap-northeast-2.amazonaws.com/path/file.jpg", "/path/new_file.jpg")

But remember that Lambda execution time is limited to 15 minutes, so a timeout may occur before the file transfer completes. I recommend using ECS Fargate instead of Lambda; it allows you to keep a process running as long as you want.
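
For illustration, here is a minimal sketch of how the class above could be wired into a Lambda handler, assuming the function is triggered by an S3 ObjectCreated event and the object is publicly readable (otherwise see the presigned URL note below). The FTP_HOST / FTP_USER / FTP_PASSWORD environment variable names and the /media/ destination directory are placeholders, not part of the original answer:

import os
import urllib.parse

# Sketch: Lambda entry point for an S3 ObjectCreated trigger.
# Assumes FtpCopier (above) is defined in the same module; the FTP_*
# environment variables and the /media/ destination are placeholders.
def lambda_handler(event, context):
    record = event['Records'][0]
    bucket = record['s3']['bucket']['name']
    region = record['awsRegion']
    key = urllib.parse.unquote_plus(record['s3']['object']['key'])

    # public object URL; swap in a presigned URL if the bucket is private
    url = "https://%s.s3.%s.amazonaws.com/%s" % (bucket, region, urllib.parse.quote(key))

    ftp = FtpCopier(os.environ['FTP_HOST'], os.environ['FTP_USER'], os.environ['FTP_PASSWORD'])
    ftp.transfer(url, "/media/" + key.split("/")[-1])
    ftp.quit()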

If the S3 file is not public, use presigned URLs to access it via urllib:

aws s3 presign s3://bucket/path/file.jpg --expires-in 604800
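
The same presigned URL can also be generated from Python with boto3 (a sketch; the bucket and key names are placeholders):

import boto3

# Generate a presigned GET URL for a private object; 'bucket' and
# 'path/file.jpg' are placeholders. Requires credentials with
# s3:GetObject permission on the object.
s3 = boto3.client('s3')
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'bucket', 'Key': 'path/file.jpg'},
    ExpiresIn=604800  # 7 days, the maximum allowed
)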
rzlvmp
  • Hey there and thanks for your answer. Can you elaborate a little bit on what you are doing with `timeout = _GLOBAL_DEFAULT_TIMEOUT` ? – daudprobst May 04 '21 at 09:45
  • As you can see in the parent `FTP` class https://github.com/python/cpython/blob/main/Lib/ftplib.py , the timeout is set when the constructor is called. My `__init__` method doesn't have a timeout parameter, so I need to set it manually inside the child `FtpCopier` class. Actually, I don't remember why I overrode the constructor; it looks like the standard one does the same thing. – rzlvmp May 05 '21 at 04:11
  • Makes sense! Thanks for your answer. – daudprobst May 05 '21 at 10:24