3-4 tricks for speeding up rsync.
1. Copying from/to the local network: don't use ssh!
If you're copying one local server to another, there is no need to encrypt data during the transfer!
By default, rsync uses ssh to transfer data over the network. To avoid this, you have to create an rsync server on the target host. You can run the daemon on demand with something like:
rsync --daemon --no-detach --config filename.conf
where a minimal configuration file could look like this (see man rsyncd.conf):
filename.conf
port = 12345
[data]
path = /some/path
use chroot = false
Then:
rsync -ax rsync://remotehost:12345/data/. /path/to/target/.
rsync -ax /path/to/source/. rsync://remotehost:12345/data/.
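To check from the client side that the daemon answers, you can simply ask it for its module list, using the same host and port placeholders as above:
# Should list the "data" module declared in filename.conf
rsync rsync://remotehost:12345/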
1.1. Minimal rsyncd.conf for restricting connections
Regarding jeremyjjbrown's comment about security, here is a minimal config sample using dedicated network interfaces:
Main public server:
eth0: 1.2.3.4/0 Public address Main
eth1: 192.168.123.45/30 Backup network
A /30 network can hold only two hosts.
┏━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━┯━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━┓
┃ Network base│192.168.123.44 │ #0│11000000 10101000 01111011 001011│00┃
┃ Mask │255.255.255.252│/30 │11111111 11111111 11111111 111111│00┃
┃ Broadcast │192.168.123.47 │ #3│11000000 10101000 01111011 001011│11┃
┃ Host/net │2 │Class C │ │ ┃
┠─────────────┼───────────────┼───────────┼─────────────────────────────────┼──┨
┃▸First host │192.168.123.45 │ #1│11000000 10101000 01111011 001011│01┃
┃ Last host │192.168.123.46 │ #2│11000000 10101000 01111011 001011│10┃
┗━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━┛
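If you don't want to work out such a table by hand, a tool like ipcalc (assuming it is installed, and depending on the variant your distribution ships) can print a similar breakdown:
ipcalc 192.168.123.45/30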
Backup server:
eth0: 1.2.3.5/0 Public address Backup
eth1: 192.168.123.46/30 Backup network
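If these dedicated interfaces are not configured yet, they could be brought up (non-persistently) with something like this, using the interface names and addresses assumed above:
# On the main server:
ip addr add 192.168.123.45/30 dev eth1
# On the backup server:
ip addr add 192.168.123.46/30 dev eth1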
cat >/etc/rsyncd.conf <<eof
address 192.168.123.46
[main]
path = /srv/backup/backup0
comment = Backups
read only = false
uid = 0
gid = 0
eof
So rsync will listen only for connections coming to 192.168.123.46, i.e. the second network interface.
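The daemon can then be started on the backup server against this file, here in the foreground as in section 1 (drop --no-detach to let it run in the background):
rsync --daemon --no-detach --config /etc/rsyncd.conf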
Then rsync is run from the main server:
rsync -zaSD --zc zstd --delete --numeric-ids /mnt/. rsync://192.168.123.46/main/.
Of course, adding a rule to your firewall would not hurt either:
iptables -I INPUT -i eth0 -p tcp --dport 873 -j DROP
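Or, going a little further, you could explicitly accept rsync only from the main server on the backup interface (a sketch, to be adapted to your existing firewall rules):
# Accept rsync traffic only from the main server, on the backup interface
iptables -I INPUT -i eth1 -p tcp -s 192.168.123.45 --dport 873 -j ACCEPT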
2. Using zstandard (zstd) for high-speed compression
Zstandard can be up to 8x faster than the common gzip, so using this newer compression algorithm will significantly improve your transfers!
rsync -axz --zc=zstd rsync://remotehost:12345/data/. /path/to/target/.
rsync -axz --zc=zstd /path/to/source/. rsync://remotehost:12345/data/.
possibly with some --exclude directives (see the bottom of this answer!).
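Note that both ends need rsync 3.2.0 or newer for zstd to be negotiated. You can check which algorithms your build supports and, if you wish, tune the compression level with --zl (level 6 here is only an example; higher means smaller but slower):
# Show which compression algorithms this rsync build can negotiate
rsync --version | grep -i compress

# Same transfer as above, with an explicit zstd level
rsync -axz --zc=zstd --zl=6 /path/to/source/. rsync://remotehost:12345/data/.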
3. Multiplexing rsync to reduce inactivity due to browse time
Two important remarks:
As this kind of optimisation is about disk access and filesystem structure, it has nothing to do with the number of CPUs! So it could improve transfers even if your host has a single-core CPU. If you plan to use any parallelizer tool, you have to tell it not to base the number of jobs on the number of physical CPUs (see the xargs sketch right after these remarks).
As the goal is to ensure that a maximum of data uses the bandwidth while other tasks browse the filesystem, the most suitable number of simultaneous processes depends on the number of small files present.
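For example, with a generic parallelizer such as GNU xargs, the number of jobs is given explicitly and is unrelated to the CPU count (a sketch reusing the placeholder daemon from section 1; 6 jobs and the paths are only examples):
# From the source root, run up to 6 rsync jobs at a time, one per first-level sub-directory
cd /path/to/source &&
    printf '%s\0' */ |
    xargs -0 -P 6 -I{} rsync -axz --zc=zstd {}. rsync://remotehost:12345/data/{}.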
3.1 bash script using wait -n -p PID:
Recent bash (5.1+) added a -p option to the wait builtin that stores the PID of the terminated job in a variable. Just what is needed for this kind of job:
#!/bin/bash
maxProc=3                                 # Maximum number of concurrent rsync jobs
source=''
destination='rsync://remotehost:12345/data/'
declare -ai start elap results order      # Integer arrays, indexed by child PID

wait4oneTask() {
    local _i
    wait -np epid                         # Wait for any job; epid = PID of the one that ended
    results[epid]=$?
    elap[epid]=" ${EPOCHREALTIME/.} - ${start[epid]} "   # Elapsed microseconds (integer arithmetic)
    unset "running[$epid]"
    while [ -v elap[${order[0]}] ];do     # Flush finished jobs, in submission order
        _i=${order[0]}
        printf " - %(%a %d %T)T.%06.0f %-36s %4d %12d\n" "${start[_i]:0:-6}" \
            "${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${elap[_i]}"
        order=(${order[@]:1})
    done
}

printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path; do
    rsync -axz --zc zstd "$source$path/." "$destination$path/." &
    lpid=$!
    paths[lpid]="$path"
    start[lpid]=${EPOCHREALTIME/.}        # Start time in microseconds
    running[lpid]=''
    order+=($lpid)
    ((${#running[@]}>=maxProc)) && wait4oneTask
done
while ((${#running[@]})); do              # Drain remaining jobs
    wait4oneTask
done
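Note: as wait -p appeared in bash 5.1, you could add a small guard near the top of the script to fail early on older shells (a sketch):
# Abort if this bash is older than 5.1 (no wait -p there)
(( BASH_VERSINFO[0] > 5 || ( BASH_VERSINFO[0] == 5 && BASH_VERSINFO[1] >= 1 ) )) ||
    { echo "This script needs bash >= 5.1 (wait -p)." >&2; exit 1; }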
Running this script, the output could look like:
myRsyncP.sh files/*/*
Started Path Rslt microseconds
- Fri 03 09:20:44.673637 files/1/343 0 1186903
- Fri 03 09:20:44.673914 files/1/43 0 2276767
- Fri 03 09:20:44.674147 files/1/55 0 2172830
- Fri 03 09:20:45.861041 files/1/772 0 1279463
- Fri 03 09:20:46.847241 files/2/346 0 2363101
- Fri 03 09:20:46.951192 files/2/4242 0 2180573
- Fri 03 09:20:47.140953 files/3/23 0 1789049
- Fri 03 09:20:48.930306 files/3/2545 0 3259273
- Fri 03 09:20:49.132076 files/3/4256 0 2263019
Quick check:
printf "%'d\n" $(( 49132076 + 2263019 - 44673637)) \
$((1186903+2276767+2172830+1279463+2363101+2180573+1789049+3259273+2263019))
6’721’458
18’770’978
So 6.72 seconds elapsed to process 18.77 seconds of work, using up to three subprocesses.
Note: you could use musec2str to improve the output, by replacing the first long printf line with:
musec2str -v elapsed "${elap[_i]}"
printf " - %(%a %d %T)T.%06.0f %-36s %4d %12s\n" "${start[_i]:0:-6}" \
    "${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "$elapsed"
myRsyncP.sh files/*/*
Started Path Rslt Elapsed
- Fri 03 09:27:33.463009 files/1/343 0 18.249400"
- Fri 03 09:27:33.463264 files/1/43 0 18.153972"
- Fri 03 09:27:33.463502 files/1/55 93 10.104106"
- Fri 03 09:27:43.567882 files/1/772 122 14.748798"
- Fri 03 09:27:51.617515 files/2/346 0 19.286811"
- Fri 03 09:27:51.715848 files/2/4242 0 3.292849"
- Fri 03 09:27:55.008983 files/3/23 0 5.325229"
- Fri 03 09:27:58.317356 files/3/2545 0 10.141078"
- Fri 03 09:28:00.334848 files/3/4256 0 15.306145"
Going further: you could add an overall statistics line with a few edits to this script:
#!/bin/bash
maxProc=3 source='' destination='rsync://remotehost:12345/data/'
. musec2str.bash # See https://stackoverflow.com/a/72316403/1765658
declare -ai start elap results order
declare -i sumElap totElap

wait4oneTask() {
    wait -np epid
    results[epid]=$?
    local -i _i crtelap=" ${EPOCHREALTIME/.} - ${start[epid]} "
    elap[epid]=crtelap sumElap+=crtelap
    unset "running[$epid]"
    while [ -v elap[${order[0]}] ];do # Print status lines in command order.
        _i=${order[0]}
        musec2str -v helap ${elap[_i]}
        printf " - %(%a %d %T)T.%06.f %-36s %4d %12s\n" "${start[_i]:0:-6}" \
            "${start[_i]: -6}" "${paths[_i]}" "${results[_i]}" "${helap}"
        order=(${order[@]:1})
    done
}

printf " %-22s %-36s %4s %12s\n" Started Path Rslt 'microseconds'
for path;do
    rsync -axz --zc zstd "$source$path/." "$destination$path/." &
    lpid=$! paths[lpid]="$path" start[lpid]=${EPOCHREALTIME/.}
    running[lpid]='' order+=($lpid)
    ((${#running[@]}>=maxProc)) &&
        wait4oneTask
done
while ((${#running[@]})) ;do
    wait4oneTask
done

totElap=${EPOCHREALTIME/.}
for i in ${!start[@]};do sortstart[${start[i]}]=$i;done
sortstartstr=${!sortstart[*]}
fstarted=${sortstartstr%% *}
totElap+=-fstarted
musec2str -v hTotElap $totElap
musec2str -v hSumElap $sumElap
printf " = %(%a %d %T)T.%06.0f %-41s %12s\n" "${fstarted:0:-6}" \
    "${fstarted: -6}" "Real: $hTotElap, Total:" "$hSumElap"
This could produce:
$ ./parallelRsync Data\ dirs-{1..4}/Sub\ dir{A..D}
Started Path Rslt microseconds
- Sat 10 16:57:46.188195 Data dirs-1/Sub dirA 0 1.69131"
- Sat 10 16:57:46.188337 Data dirs-1/Sub dirB 116 2.256086"
- Sat 10 16:57:46.188473 Data dirs-1/Sub dirC 0 1.1722"
- Sat 10 16:57:47.361047 Data dirs-1/Sub dirD 0 2.222638"
- Sat 10 16:57:47.880674 Data dirs-2/Sub dirA 0 2.193557"
- Sat 10 16:57:48.446484 Data dirs-2/Sub dirB 0 1.615003"
- Sat 10 16:57:49.584670 Data dirs-2/Sub dirC 0 2.201602"
- Sat 10 16:57:50.061832 Data dirs-2/Sub dirD 0 2.176913"
- Sat 10 16:57:50.075178 Data dirs-3/Sub dirA 0 1.952396"
- Sat 10 16:57:51.786967 Data dirs-3/Sub dirB 0 1.123764"
- Sat 10 16:57:52.028138 Data dirs-3/Sub dirC 0 2.531878"
- Sat 10 16:57:52.239866 Data dirs-3/Sub dirD 0 2.297417"
- Sat 10 16:57:52.911924 Data dirs-4/Sub dirA 14 1.290787"
- Sat 10 16:57:54.203172 Data dirs-4/Sub dirB 0 2.236149"
- Sat 10 16:57:54.537597 Data dirs-4/Sub dirC 14 2.125793"
- Sat 10 16:57:54.561454 Data dirs-4/Sub dirD 0 2.49632"
= Sat 10 16:57:46.188195 Real: 10.870221", Total: 31.583813"
Fake rsync for testing this script
Note: to test this, I've used a fake rsync:
## Fake rsync: sleep between 1 and 3 seconds and return a non-zero status roughly 1 time in 10
rsync() { sleep $((RANDOM%2+1)).$RANDOM; exit $((RANDOM%10==3?RANDOM%128:0)); }
export -f rsync
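After pasting this function and the export into your interactive shell, you could exercise the script on a throw-away tree without touching any real data (the directory names are arbitrary):
# Build a small dummy tree, then run the parallel script on it
mkdir -p files/{1..3}/{1..3}
./myRsyncP.sh files/*/*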
4. Important step to speed up the rsync process: avoid slowing it down!!
You may have to spend some time configuring properly what should not be synchronized, to avoid transferring useless data!!
Search the man page for exclude and/or include:
--cvs-exclude, -C auto-ignore files in the same way CVS does
--exclude=PATTERN exclude files matching PATTERN
--exclude-from=FILE read exclude patterns from FILE
--include=PATTERN don't exclude files matching PATTERN
--include-from=FILE read include patterns from FILE
For backing up a user's home directory, I often use:
rsync -axz --delete --zc zstd --exclude .cache --exclude cache source/. target/.
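When the pattern list grows, an --exclude-from file keeps the command line readable (the file name and patterns below are only common examples, adapt them to your own data):
cat >"$HOME/rsync-excludes" <<eof
.cache/
cache/
*.tmp
Trash/
eof
rsync -axz --delete --zc zstd --exclude-from="$HOME/rsync-excludes" source/. target/.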
Read carefully the FILTER RULES section of the man page:
man -P'less +/^FILTER\ RULES' rsync
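For instance, filter rules let you combine excludes and re-includes in a single ordered list where the first matching rule wins (a sketch, not a complete rule set):
# Keep .config/ but skip every other hidden directory
rsync -axz --zc zstd \
      --filter='+ .config/' \
      --filter='- .*/' \
      source/. target/.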
Conclusion:
Take the time to read the man pages!! man rsync and man rsyncd.conf!!