
Consider our current architecture:

         +---------------+                             
         |    Clients    |                             
         |    (API)      |                             
         +-------+-------+                             
                 ∧                                     
                 ∨                                     
         +-------+-------+    +-----------------------+
         | Load Balancer |    |   Nginx               |
         | (AWS - ELB)   +<-->+   (Service Routing)   |
         +---------------+    +-----------------------+
                                          ∧            
                                          ∨            
                              +-----------------------+
                              |   Nginx               |
                              |   (Backend layer)     |
                              +-----------+-----------+
                                          ∧            
                                          ∨            
         -----------------    +-----------+-----------+
           File Storage       |       Gunicorn        |
           (AWS - S3)     <-->+       (Django)        |
         -----------------    +-----------------------+

When a client, mobile or web, tries to upload a large file (more than a GB) to our servers, it often faces an idle connection timeout, either from its client library (on iOS, for example) or from our load balancer.

While the file is actually being uploaded by the client, no timeout occurs because the connection isn't "idle": bytes are being transferred. But I think that once the file has been transferred to the Nginx backend layer and Django starts uploading it to S3, the connection between the client and our server stays idle until that upload is completed.

Is there a way to prevent this from happening, and at which layer should I tackle this issue?

Laurent Jalbert Simard
  • Did you set client_max_body_size in NGINX conf? – Zulfugar Ismayilzadeh Sep 21 '16 at 21:31
  • What system is firing the timeout? ELB or something else? ELB defaults to 60s but it's configurable. – Michael - sqlbot Sep 22 '16 at 02:24
  • In this case, it's the client that is timing out – Marc-Alexandre Bérubé Sep 22 '16 at 13:03
  • Can you list all the timeout related settings you already adjusted at all levels? – serg Sep 23 '16 at 02:49
  • @Michael-sqlbot I've already increased that value to 20 minutes, but I think it's hackish, since waiting for a large file to get uploaded to S3 from our server should not be considered "idle". Moreover, I can't control idle timeouts on the client side, so this wouldn't solve the issue entirely. Thanks – Laurent Jalbert Simard Sep 23 '16 at 13:54
  • @serg I have set a 20-minute idle connection timeout across all levels, which allows 99% of uploads from a web browser to go through. However, I don't think increasing this timeout for the remaining 1% is the proper way to solve this. And as I just wrote above, I can't control the idle connection timeouts of the devices that upload large files to our service. Thanks for helping out. – Laurent Jalbert Simard Sep 23 '16 at 14:01
  • @ZulfugarIsmayilzadeh thanks for reminding me of this one :) it was set to "only" 2GB. However, I can get an idle connection timeout from a tablet when uploading a 1.2GB file, so, sadly, this isn't the issue here. – Laurent Jalbert Simard Sep 23 '16 at 14:12
  • You're reaching the limits of HTTP. Maybe you should upload the file from Django to AWS S3 asynchronously and then push a notification to the client with a websocket. Or poll from the client every X seconds to check whether the upload is done, if you want to avoid the burden of websockets. – Antoine Fontaine Sep 26 '16 at 14:30
  • @AntoineFontaine I thought about it, but I'm running multiple stateless web servers, so once I start polling, I won't hit the web server doing the S3 upload every time. If I only poll S3 to see whether the file exists, I won't have any way to check if the upload to S3 failed and I'll be waiting forever. All in all, it's still a better solution than what I have now, so I'll consider it if no one comes up with something cleaner. Merci! – Laurent Jalbert Simard Sep 26 '16 at 14:59
  • Maybe you can take a look at Channels; it's now an official Django package for managing asynchronous tasks and especially websockets. It needs some configuration, but after that it's quite easy to handle your problem. – Antoine Fontaine Sep 26 '16 at 15:44
  • @AntoineFontaine Wow, this looks very promising! As of now, it does not play well with Django Rest Framework, which we rely on, but they say it's in the works. So I'm really looking forward to this. Thanks for pointing it out! – Laurent Jalbert Simard Sep 26 '16 at 19:58
  • I'm not confident I totally understand where the issue lies, but I'll take a shot. I had a similar problem with uploading large files received by a Django app. My bottleneck was exhausting memory from reading too many large files into it. I solved that with multipart uploads to S3 (http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html). This discussion on streaming uploads with boto3 might also help (https://github.com/boto/boto3/issues/256). – Taylor D. Edmiston Sep 29 '16 at 23:46
  • While the upload to the webserver and the upload to S3 are two parts of a single HTTP request/response cycle, the client and webserver are both locked and dependent on client bandwidth and S3 bandwidth. Have you considered another approach, described here: https://stackoverflow.com/questions/44371643/nginx-php-failing-with-large-file-uploads-over-6-gb/44751210#44751210 – Anatoly Jun 28 '17 at 09:37

3 Answers


I have faced the same issue and fixed it by using django-queued-storage on top of django-storages. When a file is received, django-queued-storage creates a Celery task to upload it to the remote storage such as S3; in the meantime, if the file is accessed by anyone and it is not yet available on S3, it is served from the local file system. This way you don't have to wait for the file to be uploaded to S3 before sending a response back to the client.

Since your application is behind a load balancer, you might want to use a shared file system such as Amazon EFS in order to use the above approach.
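
A minimal sketch of the wiring, assuming a model with a FileField (the storage class paths, field name and model below are placeholders; check the django-queued-storage docs for your exact versions):

from django.db import models
from queued_storage.backends import QueuedStorage

# The local backend answers requests until the Celery task has pushed the
# file to S3; afterwards the remote backend takes over transparently.
queued_s3_storage = QueuedStorage(
    local='django.core.files.storage.FileSystemStorage',
    remote='storages.backends.s3boto.S3BotoStorage')

class Upload(models.Model):
    # The HTTP response can be returned as soon as the local write finishes.
    file = models.FileField(upload_to='uploads/', storage=queued_s3_storage)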

Aamir Rind

You can try to skip uploading the file to your server and upload it to S3 directly, then pass only the resulting URL back to your application.

There is an app for that: django-s3direct. You can give it a try.
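
If you prefer to wire the direct upload yourself instead of using django-s3direct, the underlying idea is the same: have Django hand the client a presigned S3 POST and let the browser or mobile app upload straight to the bucket, so only the resulting key ever reaches your API. A rough sketch with boto3 (the bucket name, key prefix, size limit and expiry below are placeholder assumptions, not part of django-s3direct):

import uuid
import boto3

s3 = boto3.client('s3')

def presigned_upload(filename):
    # Returns the URL and form fields the client needs in order to POST the
    # file directly to S3, bypassing the ELB/Nginx/Django stack entirely.
    key = 'uploads/{}_{}'.format(uuid.uuid4(), filename)
    return s3.generate_presigned_post(
        Bucket='my-upload-bucket',                                # placeholder
        Key=key,
        Conditions=[['content-length-range', 0, 5 * 1024 ** 3]],  # up to ~5 GB
        ExpiresIn=3600,                                           # valid 1 hour
    )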

Todor

You can create an upload handler to upload the file directly to S3. This way you shouldn't encounter connection timeouts.

https://docs.djangoproject.com/en/1.10/ref/files/uploads/#writing-custom-upload-handlers

I did some tests and it works perfectly in my case.

You have to start a new multipart upload with boto, for example, and send the chunks progressively.

Don't forget to validate the chunk size: 5 MB is the minimum part size if your file contains more than one part (an S3 limitation).

I think this is the best alternative to django-queued-storage if you really want to upload directly to S3 and avoid connection timeouts.

You'll probably also need to create your own file field to manage the file correctly and avoid sending it a second time.

The following example is with S3BotoStorage.

import uuid
from StringIO import StringIO  # boto / Python 2 era code

from django.core.files.storage import default_storage
from django.core.files.uploadhandler import FileUploadHandler
from storages.utils import setting  # settings helper shipped with django-storages

# S3 requires every part of a multipart upload except the last one to be at least 5 MB.
S3_MINIMUM_PART_SIZE = 5242880


class S3FileUploadHandler(FileUploadHandler):
    """Streams the incoming request body to S3 as a multipart upload."""
    chunk_size = setting('S3_FILE_UPLOAD_HANDLER_BUFFER_SIZE', S3_MINIMUM_PART_SIZE)

    def __init__(self, request=None):
        super(S3FileUploadHandler, self).__init__(request)
        self.file = None
        self.part_num = 1
        self.last_chunk = None
        self.multipart_upload = None

    def new_file(self, field_name, file_name, content_type, content_length, charset=None, content_type_extra=None):
        super(S3FileUploadHandler, self).new_file(field_name, file_name, content_type, content_length, charset, content_type_extra)
        # Keep the client's name around and prefix the S3 key with a UUID to avoid collisions.
        self.original_filename = file_name
        self.file_name = "{}_{}".format(uuid.uuid4(), file_name)

        default_storage.bucket.new_key(self.file_name)

        self.multipart_upload = default_storage.bucket.initiate_multipart_upload(self.file_name)

    def receive_data_chunk(self, raw_data, start):
        # Hold one chunk back so the final, possibly undersized, chunk can be
        # merged with the previous one (every part but the last must be >= 5 MB).
        if self.last_chunk:
            file_part = self.last_chunk

            if len(raw_data) < S3_MINIMUM_PART_SIZE:
                file_part += raw_data
                self.last_chunk = None
            else:
                self.last_chunk = raw_data

            self.upload_part(part=file_part)
        else:
            self.last_chunk = raw_data

    def upload_part(self, part):
        self.multipart_upload.upload_part_from_file(
            fp=StringIO(part),
            part_num=self.part_num,
            size=len(part)
        )
        self.part_num += 1

    def file_complete(self, file_size):
        # Flush whatever is still buffered, then finalize the multipart upload.
        if self.last_chunk:
            self.upload_part(part=self.last_chunk)

        self.multipart_upload.complete_upload()
        self.file = default_storage.open(self.file_name)
        self.file.original_filename = self.original_filename

        return self.file
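
To actually use the handler, it has to be installed before Django parses the request body, either globally via the FILE_UPLOAD_HANDLERS setting or per view. A sketch of the per-view variant (the view names and the "file" field are placeholder assumptions); Django needs the csrf_exempt/csrf_protect pair because request.upload_handlers must be modified before the POST data is accessed:

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt, csrf_protect

@csrf_exempt
def upload_view(request):
    # Swap in the S3 handler before request.POST / request.FILES are touched.
    request.upload_handlers = [S3FileUploadHandler(request)]
    return _upload_view(request)

@csrf_protect
def _upload_view(request):
    uploaded = request.FILES['file']  # already streamed to S3 by the handler
    return JsonResponse({'key': uploaded.name})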