I have an application that uploads files in the background while the main task is being created, similar to Gmail while you are composing an email.

The user uploads a file, and the backend parses it to check whether it has any problems.

After the user creates the task through the app, the backend takes the files referenced in the session (stored in /tmp) and uploads them to Amazon Simple Storage Service (AWS S3).

Behind a load balancer, however, each upload can land on a different server, and each server has its own /tmp.

As a result, the server that handles task creation may not find the file in its /tmp, because the file was written on another machine.

To avoid this problem, I ended up enabling sticky sessions on the load balancer. The downside is that the balancer is now underutilized, since it can no longer spread a user's requests across servers, which is not very useful for me.

What's the best way to handle file uploads in this case?

hakre
Tom
  • Why not write the files directly to S3 rather than to the local server instance? If validation fails, overwrite those files when new files are uploaded, or delete them if the task is cancelled. Not entirely sure why you'd want to store them on the physical server during the intermediate stage of the processing - your issue is that you're creating stateful processes. – Jaquarh Aug 12 '23 at 20:24
  • @Jaquarh I don't do it that way because the task can be cancelled, so I'm wasting space on S3 and I don't have an identifier that associates the file uploaded to the task. – Tom Aug 12 '23 at 21:50
  • There is no best way to handle this. While you have chosen sticky sessions with the load balancer to address the original problem of "not sticking" to the same machine rooted `/tmp`, the downside is that the load balancer can't address which machine was already in use (as sticky sessions resolve it with a single data-point by sticking to it). There is always a price you have to pay. // How do you do PHP sessions (if at all) in your distributed setup? And how large are the files you're handling? Have you considered a shared network drive between all instances? Are uploads already GUID-ed? – hakre Aug 12 '23 at 22:42
  • You cannot have stateful data links in a stateless infrastructure. I'm not sure how your current cancellation process works or cleans the local disk without a way to identify the file locations... You could always just change it to delete from S3 instead. – Jaquarh Aug 13 '23 at 08:32
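
One way to combine the suggestions in the comments (write to shared storage, give every upload an identifier) is to stage files under a per-draft prefix and only move them once the task exists. The following is a minimal sketch in Python, not a tested implementation for this setup. Everything named here is hypothetical: `InMemoryStore` stands in for the shared bucket, and the `staging/` and `tasks/` prefixes, the draft ID, and the function names are assumptions for illustration. With real S3, the put/list/move/delete operations would map onto the corresponding S3 object operations (a move being a copy followed by a delete).

```python
import uuid


class InMemoryStore:
    """Stand-in for a shared object store such as an S3 bucket (assumption)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def keys_with_prefix(self, prefix):
        # Return a list so callers can modify the store while iterating.
        return sorted(k for k in self.objects if k.startswith(prefix))

    def move(self, src, dst):
        self.objects[dst] = self.objects.pop(src)

    def delete(self, key):
        self.objects.pop(key, None)


def new_draft_id():
    # Created when the user starts composing a task; kept in the session or form.
    return uuid.uuid4().hex


def stage_upload(store, draft_id, filename, data):
    # Any server behind the balancer can write here; no local /tmp is involved.
    key = f"staging/{draft_id}/{filename}"
    store.put(key, data)
    return key


def commit_draft(store, draft_id, task_id):
    # On task creation, move the staged files under the task's own prefix.
    prefix = f"staging/{draft_id}/"
    moved = []
    for key in store.keys_with_prefix(prefix):
        dst = f"tasks/{task_id}/{key[len(prefix):]}"
        store.move(key, dst)
        moved.append(dst)
    return moved


def cancel_draft(store, draft_id):
    # On cancellation, drop everything staged for this draft.
    for key in store.keys_with_prefix(f"staging/{draft_id}/"):
        store.delete(key)


store = InMemoryStore()
draft = new_draft_id()
stage_upload(store, draft, "a.csv", b"1,2")
stage_upload(store, draft, "b.csv", b"3,4")
print(commit_draft(store, draft, task_id=42))
```

Because the draft ID travels with the session or the form rather than with a machine, sticky sessions are no longer needed for the upload step. Drafts that are abandoned without an explicit cancel would still leave files behind; an expiration rule on the staging prefix (S3 lifecycle rules can expire objects by prefix) is one possible way to bound that wasted space.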

0 Answers