
My yet-to-be-released Delphi 2010 application allows users to upload their files to my servers. Right now I'm using HTTPS POST to send the files. The (simplified) algorithm is basically:

  1. Split File into "slices" (256KB each)
  2. For each slice, POST it to server

i.e., for a 1MB file:

--> Get Slice #1 (256KB)
--> Upload Slice #1 using TIdHTTP.Post()

--> Get Slice #2 (256KB)
--> Upload Slice #2 using TIdHTTP.Post()

--> Get Slice #3 (256KB)
--> Upload Slice #3 using TIdHTTP.Post()

--> Get Slice #4 (256KB)
--> Upload Slice #4 using TIdHTTP.Post()
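For readers who want to see the loop above in code, here is a minimal sketch of the slice-by-slice approach. It is not the asker's actual routine: `SliceLength`, `UploadInSlices`, the URL and the file name are hypothetical, and it assumes an already configured `TIdHTTP` (for `https://` URLs an SSL IOHandler such as `TIdSSLIOHandlerSocketOpenSSL` would have to be assigned as well).

```delphi
program SliceUploadSketch;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, Math, IdHTTP;

const
  SliceSize = 256 * 1024; // 256KB, as in the question

// Hypothetical helper: size of the slice that starts at AOffset.
function SliceLength(AFileSize, AOffset: Int64): Int64;
begin
  Result := Min(Int64(SliceSize), AFileSize - AOffset);
end;

// Hypothetical helper: sends AFileName to AURL, one POST per slice.
procedure UploadInSlices(AHTTP: TIdHTTP; const AURL, AFileName: string);
var
  Source: TFileStream;
  Slice: TMemoryStream;
begin
  Source := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    Slice := TMemoryStream.Create;
    try
      while Source.Position < Source.Size do
      begin
        Slice.Clear;
        // The length is always > 0 here, so CopyFrom copies exactly one slice.
        Slice.CopyFrom(Source, SliceLength(Source.Size, Source.Position));
        Slice.Position := 0;
        AHTTP.Post(AURL, Slice); // one full request/response round trip per slice
      end;
    finally
      Slice.Free;
    end;
  finally
    Source.Free;
  end;
end;

var
  HTTP: TIdHTTP;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // Placeholder URL and file name, for illustration only.
    UploadInSlices(HTTP, 'http://example.com/upload.php', 'C:\data\file.bin');
  finally
    HTTP.Free;
  end;
end.
```

Each pass through the loop waits for the server's response before the next slice is sent, which is one plausible reason why grouping slices into fewer requests helps.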

I'm using Indy 10. I (ab)used my profiler over and over, and there is not much left to optimize except the upload routine itself.

I'm also using multi-threading, and even though I did my best to optimize my code, my benchmarks still tell me I can do better (other well-optimized software achieves much better timings, almost twice as fast as my upload routine!)

I know it's not my server's fault. Here are the ideas that I still need to explore:

  1. I tried grouping slices in a single POST. Naturally, this resulted in a performance boost (20-35%), but resuming capability is now reduced.

  2. I also thought about using SFTP / SSH, but I'm not sure whether it's fast.

  3. Use web sockets to implement resumable upload (like this component). I'm not sure about the speed here either.

Now my question is: is there something I can do to speed up my upload? I'm open to any suggestion that I can implement, including command-line tools (if the license allows me to ship them with my application), provided that:

  1. Resumable upload is supported
  2. Fast!
  3. Reasonable memory usage
  4. Secure & allow login/user authentication

Also, because of major security concerns, FTP is not something I'd want to implement.

Thanks a lot!

Artjom B.
TheDude
  • Does the transfer use data compression/decompression? – mjn Feb 28 '12 at 07:56
  • @mjn: yes (slices are already zipped before being uploaded + I use Indy's TIdCompressorZLib) – TheDude Feb 28 '12 at 15:39
  • @kobik: fairly straightforward php code (move_uploaded_file() + md5 checking + simple sql insert), I measured the php timing, it's definitely not the bottleneck. – TheDude Feb 28 '12 at 15:42

1 Answer


I would suggest doing a single TIdHTTP.Post() for the entire file without chunking it at all. You can use the TIdHTTP.OnWork... events to keep track of how many bytes were sent to the server, so you know where to resume from if needed. When resuming, you can use the TIdHTTP.Request.CustomHeaders property to include a custom header that tells the server where you are resuming from, so it can roll back its previous file to the specified offset before accepting the new data.
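A minimal sketch of this idea follows, not tested against a real server. The `TUploadTracker` class, the `PostWithTracking` procedure, the `X-Resuming-From` header name, the URL and the file name are all hypothetical. The handler signature assumes Indy 10's `TWorkEvent` with an `Int64` count. The stream passed in is assumed to hold only the data still to be sent.

```delphi
program TrackedUploadSketch;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, IdComponent, IdHTTP;

type
  // Hypothetical tracker class; not part of Indy.
  TUploadTracker = class
  private
    FBytesSent: Int64;
  public
    // Matches Indy 10's TWorkEvent; older Indy versions use a different signature.
    procedure HTTPWork(ASender: TObject; AWorkMode: TWorkMode; AWorkCount: Int64);
    property BytesSent: Int64 read FBytesSent;
  end;

procedure TUploadTracker.HTTPWork(ASender: TObject; AWorkMode: TWorkMode;
  AWorkCount: Int64);
begin
  if AWorkMode = wmWrite then
    FBytesSent := AWorkCount; // running total written during this transfer
end;

// ASource must hold only the data still to be sent. AResumeFrom is the offset
// of that data in the original file (0 for a fresh upload).
procedure PostWithTracking(AHTTP: TIdHTTP; ATracker: TUploadTracker;
  const AURL: string; ASource: TStream; AResumeFrom: Int64);
begin
  AHTTP.OnWork := ATracker.HTTPWork;
  if AResumeFrom > 0 then
    // Hypothetical header name; the server script has to look for it.
    AHTTP.Request.CustomHeaders.Values['X-Resuming-From'] := IntToStr(AResumeFrom);
  AHTTP.Post(AURL, ASource);
end;

var
  HTTP: TIdHTTP;
  Tracker: TUploadTracker;
  Source: TFileStream;
begin
  Source := nil;
  HTTP := TIdHTTP.Create(nil);
  Tracker := TUploadTracker.Create;
  try
    // Placeholder file name and URL, for illustration only.
    Source := TFileStream.Create('C:\data\file.bin', fmOpenRead or fmShareDenyWrite);
    try
      PostWithTracking(HTTP, Tracker, 'http://example.com/upload.php', Source, 0);
    except
      on E: Exception do
        Writeln('Upload failed after about ', Tracker.BytesSent, ' bytes: ', E.Message);
    end;
  finally
    Source.Free;
    Tracker.Free;
    HTTP.Free;
  end;
end.
```

The count reported by `OnWork` is what the client wrote to the socket, not necessarily what the server stored, so it is probably best treated as an estimate and confirmed with the server before resuming.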

Remy Lebeau
  • That's great, I didn't know I could resume a POST. Let me see if I got this right: in the PHP code, I add this --> header('Accept-Ranges: bytes'); and in Delphi if I add this (just an example): IdHTTP.Request.CustomHeaders.Add('Range: bytes=5000-'); the HTTP POST will automatically discard extra bytes (roll back) & pick up from the 5000th byte, is that correct? – TheDude Feb 29 '12 at 03:11
  • To resume a previous `POST`, you can pass in a `TStream` that has just the remaining data in it. But the server has to support the resume and append the new data to the existing file, not overwrite the file fresh. Upload resuming is not part of the standard HTTP protocol. The `Accept-Ranges` response header and the `Range` request header are only for **downloads**, not **uploads**. When I mentioned a custom header, I was referring to a custom `X-...` header of your own design that your PHP code can look for, e.g.: `X-Resuming-From: ...`. – Remy Lebeau Feb 29 '12 at 08:55
  • Or the `Content-Range` header, though RFC 2616 suggests that it is usually only used in responses, not in requests. – Remy Lebeau Feb 29 '12 at 09:06
  • You could alternatively write separate scripts, one to `POST` to when sending data for a new upload, and another one to `POST` remaining data to when resuming a previous upload. Then you don't have to use custom request headers. You could send a `HEAD` request to determine how many bytes the server actually has available before starting a resume. – Remy Lebeau Feb 29 '12 at 09:09
  • Thank you Remy, but I'm a bit confused: in order to send the `HEAD` request I need to know the file path/reference, but if I'm not mistaken the PHP script only starts its execution *after* the file upload has been completed, or am I missing something here? Can you elaborate a little bit? Thanks!! – TheDude Feb 29 '12 at 13:19
  • You are probably thinking of `PUT`, or maybe `$_FILES`. Either one of those is meant for working with complete files only. The response to `PUT` tells the client the URL of the file that was created so it can be accessed later. A generic `POST`, on the other hand, is just arbitrary data; the receiving script decides what to do with that data. You can use `$_POST`, `$HTTP_RAW_POST_DATA`, or `fopen("php://input", "rb")` to access the raw data and do whatever you want with it. Just be careful because `$_POST` and `$HTTP_RAW_POST_DATA` are limited by php.ini directives, but `php://input` is not. – Remy Lebeau Feb 29 '12 at 17:35
  • Thank you Remy, just to make sure I understood you correctly, let's say I have this [Delphi code](http://pastebin.com/bVcc62mV) and this [old PHP code](http://pastebin.com/E98AnM2Y), you mean I'd have to change the PHP code to [something like this](http://pastebin.com/A9KezmeD)? – TheDude Feb 29 '12 at 18:43
  • Thank you Remy and sorry if I had to *torture* you with my stupid questions over and over! For those interested in the solution, please take a look at the [linked question](http://stackoverflow.com/questions/9598273/resume-http-post-upload-with-indy/9600133), and the [chat messages here](http://chat.stackoverflow.com/rooms/8801/discussion-between-remy-lebeau-and-gdhami). **Delphi** code is [here](http://pastebin.com/6haHLAe3) and **PHP** code is [here](http://pastebin.com/7bhP7HwB) – TheDude Mar 13 '12 at 05:28
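
The resume flow discussed in the comments above (ask the server how much it already has, then `POST` only the rest) could look roughly like the sketch below. This is not the code from the linked pastes. `RemainingBytes`, `CreateTailStream`, `ResumeUpload`, both URLs and the file name are hypothetical, and the sketch assumes the server answers a `HEAD` request with a `Content-Length` equal to the number of bytes it has stored so far.

```delphi
program ResumeUploadSketch;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, Math, IdHTTP;

// Hypothetical helper: bytes still to send, never negative.
function RemainingBytes(AFileSize, AServerHas: Int64): Int64;
begin
  Result := Max(Int64(0), AFileSize - AServerHas);
end;

// Hypothetical helper: copies everything from AOffset onward into a new stream.
// It holds the whole tail in memory, which is acceptable for a sketch only.
function CreateTailStream(const AFileName: string; AOffset: Int64): TStream;
var
  Source: TFileStream;
  Count: Int64;
begin
  Result := TMemoryStream.Create;
  try
    Source := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
    try
      Count := RemainingBytes(Source.Size, AOffset);
      if Count > 0 then // CopyFrom with a count of 0 would copy the whole file
      begin
        Source.Position := AOffset;
        Result.CopyFrom(Source, Count);
      end;
      Result.Position := 0;
    finally
      Source.Free;
    end;
  except
    Result.Free;
    raise;
  end;
end;

procedure ResumeUpload(AHTTP: TIdHTTP; const AStatusURL, AResumeURL,
  AFileName: string);
var
  ServerHas: Int64;
  Tail: TStream;
begin
  // Assumption: Content-Length of the HEAD response = bytes stored so far.
  AHTTP.Head(AStatusURL);
  ServerHas := AHTTP.Response.ContentLength;
  if ServerHas < 0 then
    ServerHas := 0; // header missing, so start from the beginning
  Tail := CreateTailStream(AFileName, ServerHas);
  try
    if Tail.Size > 0 then
      AHTTP.Post(AResumeURL, Tail); // the server script must append, not overwrite
  finally
    Tail.Free;
  end;
end;

var
  HTTP: TIdHTTP;
begin
  HTTP := TIdHTTP.Create(nil);
  try
    // Placeholder URLs and file name, for illustration only.
    ResumeUpload(HTTP, 'http://example.com/status.php?id=123',
      'http://example.com/resume.php?id=123', 'C:\data\file.bin');
  finally
    HTTP.Free;
  end;
end.
```

For large files, a `TStream` descendant that exposes only the tail of the file would avoid copying the remaining data into memory.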