26

I'm trying to figure out the best way to transfer large amounts of data over a network between two systems. I am currently looking into either FTP, HTTP, or RSync, and I am wondering which one is the fastest. I've looked online for some answers and found the following sites:

The problem is that these are old and talk more about the theoretical differences between how the protocols communicate. I am more interested in actual benchmarks that can say, for a specific setup, that when transferring files of varying sizes one protocol is x% faster than the others.

Has anyone tested these and posted the results somewhere?

oneself
  • 38,641
  • 34
  • 96
  • 120
  • 4
    FTP is always awfully slow on many small files. – kirilloid Mar 14 '12 at 18:29
  • If you have ssh, a tar + gzip pipe is a fast and simple solution: ``tar -cf - somedir | gzip | ssh user@host tar -xzvf -``. Much faster with lots of small files than ftp. At least if you're transferring from Linux to Linux; I had issues with bsdtar in the past. – Gellweiler Jun 02 '19 at 19:52
  • FASP protocol is the fastest way to transfer files. Check IBM Aspera project. https://pacgenesis.com/tcp-vs-udp-vs-fasp-which-is-the-fastest-protocol/#:~:text=FASP%C2%AE%20%E2%80%93%20which%20standards%20for,your%20big%20data%20transferring%20needs. – Amin Pial Apr 10 '21 at 16:55

5 Answers

43

Alright, so I setup the following test:

  • Hardware: 2 desktops, each with an Intel Core Duo CPU @ 2.33GHz and 4 GB of RAM.
  • OS: Ubuntu 11.10 on both machines
  • Network: a dedicated 100 Mb switch; both machines are connected to it.
  • Software:

I uploaded the following groups of files to each server (one way to generate a comparable set is sketched after the list):

  1. 1 100M file.
  2. 10 10M files.
  3. 100 1M files.
  4. 1,000 100K files.
  5. 10,000 10K files.
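
This is only a sketch of how such a set could be generated; the directory and file names are made up, and /dev/urandom is used so that compressible data doesn't skew any results:

    # Each group totals ~100 MB, split into progressively smaller files.
    mkdir -p group1 group2 group3 group4 group5
    dd if=/dev/urandom of=group1/file bs=1M count=100                                          # 1 x 100M
    for i in $(seq 1 10);    do dd if=/dev/urandom of=group2/file$i bs=1M   count=10; done     # 10 x 10M
    for i in $(seq 1 100);   do dd if=/dev/urandom of=group3/file$i bs=1M   count=1;  done     # 100 x 1M
    for i in $(seq 1 1000);  do dd if=/dev/urandom of=group4/file$i bs=100K count=1;  done     # 1,000 x 100K
    for i in $(seq 1 10000); do dd if=/dev/urandom of=group5/file$i bs=10K  count=1;  done     # 10,000 x 10K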

I got the following average results over multiple runs (numbers in seconds):

|-----------+---------+----------|
| File Size | FTP (s) | HTTP (s) |
|-----------+---------+----------|
|      100M |       8 |        9 |
|       10M |       8 |        9 |
|        1M |       8 |        9 |
|      100K |      14 |       12 |
|       10K |      46 |       41 |
|-----------+---------+----------| 

So, it seems that FTP is slightly faster with large files, and HTTP is a little faster with many small files. All in all, I think they are comparable, and the server implementation matters much more than the protocol.
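
If someone wants to repeat the measurement, here is a minimal sketch of one way to time the transfers; wget and curl are just one option, and the host address and paths are assumptions:

    # Time a whole group over HTTP and over FTP; run several times and average,
    # discarding the first run to avoid cold-cache effects.
    time wget -q -r -np -nH -P /tmp/http-test http://192.168.0.2/group3/
    time wget -q -r -np -nH -P /tmp/ftp-test  ftp://192.168.0.2/group3/

    # For a single large file, curl's progress meter also reports average throughput.
    curl -o /dev/null http://192.168.0.2/group1/file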

oneself
  • 38,641
  • 34
  • 96
  • 120
  • 9
    would be nice to see scp and a couple of rsync variants (with/without compression, --inplace, etc... :) – ashwoods Oct 16 '12 at 09:27
  • Also remember what was said above about the protocols: if you can set something up with less protocol overhead (UDP, for example) and you have reliable network transfer, it can go much faster that way. Here is the StackOverflow discussion on it: http://stackoverflow.com/questions/47903/udp-vs-tcp-how-much-faster-is-it – Mandrake Oct 18 '12 at 18:29
  • Many years later, thank you for this answer and your effort. – rath Jul 31 '17 at 14:59
9

If the machines at each end are reasonably powerful (i.e. not netbooks, NAS boxes, toasters, etc.), then I would expect all protocols which work over TCP to be much the same speed at transferring bulk data. The application protocol's job is really just to fill a buffer for TCP to transfer, so as long as they can keep it full, TCP will set the pace.

Protocols which do compression or encryption may bottleneck at the CPU on less powerful machines. My netbook does FTP much faster than SCP.

rsync does clever things to transmit incremental changes quickly, but for bulk transfers it has no advantage over dumber protocols.
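
One way to sanity-check this in your own setup is to measure the raw TCP ceiling first and then compare each protocol against it; a minimal sketch using iperf3 (the host name is an assumption):

    # On the receiving machine: start an iperf3 server.
    iperf3 -s

    # On the sending machine: run a 30-second TCP throughput test against it.
    iperf3 -c receiver.example.com -t 30

If FTP, HTTP and rsync all land close to that number, the choice of protocol really is a wash for bulk data.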

Tom Anderson
  • 46,189
  • 17
  • 92
  • 133
6

Another utility to consider is bbcp: http://www.slac.stanford.edu/~abh/bbcp/.

A good, but dated, tutorial on using it is here: http://pcbunn.cithep.caltech.edu/bbcp/using_bbcp.htm. I have found that bbcp is extremely good at transferring large files (multiple GBs). In my experience, it is faster than rsync on average.
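
For the curious, bbcp's basic invocation is scp-like. This is only a sketch based on the linked tutorial; the host, paths and stream count are assumptions, so check `bbcp --help` for your build:

    # Basic copy, scp-style syntax (paths and host are made up).
    bbcp bigfile.dat user@remotehost:/data/

    # The linked tutorial describes using multiple parallel TCP streams,
    # which is where bbcp tends to win on high-bandwidth links.
    bbcp -s 16 bigfile.dat user@remotehost:/data/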

dcherian
  • 186
  • 1
  • 1
  • 1
    I couldn't add this extra link earlier because I didn't have enough reputation. This is where I found out about it: http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html . This link describes a bunch of different programs and their advantages / disadvantages with respect to each other. – dcherian May 07 '14 at 18:37
4

rsync optionally compresses its data. That typically makes the transfer go much faster. See rsync -z.

You didn't mention scp, but scp -C also compresses.

Do note that compression might make the transfer go faster or slower, depending upon the speed of your CPU and of your network link. (Slower links and faster CPU make compression a good idea; faster links and slower CPU make compression a bad idea.) As with any optimization, measure the results in your own environment.
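
A quick way to follow that advice and measure it yourself; the host and paths are assumptions, and each run writes to a separate destination so an earlier transfer doesn't turn the next one into a no-op:

    # Same data, with and without compression.
    time rsync -a  /data/testset/ user@host:/data/testset-plain/
    time rsync -az /data/testset/ user@host:/data/testset-z/

    # scp with compression enabled, for comparison.
    time scp -r -C /data/testset user@host:/data/testset-scp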

Robᵩ
  • 163,533
  • 20
  • 239
  • 308
  • 2
    Tell me more about how FTP optionally compresses the data. I'm unfamiliar with that. – Robᵩ Mar 15 '12 at 15:09
  • Thanks for telling me about that. I didn't know about MODE Z. But, since it isn't standardized, isn't supported by either of the FTP clients I use, and isn't supported by the FTP servers I connect to, I'll stand by my recommendation to use `rsync -z`. – Robᵩ Mar 15 '12 at 18:24
  • Using rsync -z can significantly lower transfer rates; it might make sense on a slow link, but in a local network it actually slows things down, depending on the host machine. – ashwoods Oct 15 '12 at 07:09
  • *depending on the cpu - usually that's not the case if your connection is 100MBits/s – france1 Jul 30 '22 at 15:55
3

I'm afraid that if you want to know the answer for your needs and setup, you either have to be more specific or do your own performance (and reliability) tests. It does help to have at least a rudimentary understanding of the protocols in question and how they communicate, so I'd consider the articles you've been quoting a helpful resource. It also helps to know which restrictions the early inventors of these protocols faced - was their aim to keep network impact low, were they memory-starved, or did they have to count their CPU cycles? Here are a few things to consider or answer if you want to get an answer tailored to your situation:

  • OS/File System related:
    • are you copying between the same OS/FS combination or do you have to worry about incompatibilities, such as file types without matching equivalent at the receiving end?
    • I.e. do you have anything special to transport? Metadata, resource forks, extended attributes, file permissions might either just not be transported by the protocol/tool of your choice, or be meaningless at the receiving end.
    • The same goes for sparse files, which might end up being bloated to full size at the other end of the copy, ruining all plans you may have had about sizing.
  • Physical constraints related:
    • Network impact
    • CPU load: nowadays, compression is much "cheaper", since modern CPUs are less challenged by the compression than those back when most transfer protocols were designed.
    • failure tolerance - do you need to be able to pick up where an interrupted transfer left you, or do you prefer to start anew?
    • incremental transfers, or full transfers? Does an incremental transfer pose any big savings for you, or do you have full transfers by design of your task anyway? In the latter case, the added latency and memory impact to build the transfer list before starting the transfer would be a less desirable tradeoff.
    • How good is the protocol at utilizing the MTU offered by your underlying network protocol?
    • Do you need to maintain a steady stream of data, for example to keep a tape drive streaming at the receiving end?

Lots of things to consider, and I'm sure the listing isn't even complete.
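
For what it's worth, several of the points above map directly onto rsync switches. This is only a sketch, and the exact option set should be checked against the man page of your rsync version:

    rsync -a -H -A -X -S --partial -z /src/ user@host:/dst/
    #  -a         archive mode: permissions, times, symlinks, devices
    #  -H         preserve hard links
    #  -A / -X    preserve ACLs / extended attributes (the metadata concerns above)
    #  -S         handle sparse files instead of bloating them to full size
    #  --partial  keep partially transferred files so an interrupted run can resume
    #  -z         compress: trades CPU for network, helps mainly on slow links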

Tatjana Heuser
  • 964
  • 9
  • 11