I am running a shell script on machineA which copies files from machineB and machineC to machineA. If a file is not present on machineB, then it is guaranteed to be on machineC. So I first try to copy each file from machineB, and if it is not there, I fall back to machineC and copy it from there.
On machineB and machineC there is a folder like this from which I am supposed to copy the files:
/data/pe_t1_snapshot/20140317
I need to copy around 400 files to machineA from machineB and machineC. Each file is around 3.5 GB, and the network is a 10 Gigabit link with encryption and decryption at both ends.
Earlier, I was copying the files one by one to machineA, which is really slow: it takes around 3 hours. Is there a way I can have 5 different threads, each handling one file at a time, so that only 5 background processes are running at once? I don't want to download all the files in parallel, since 400 parallel transfers will cause packet loss and angry network admins :)
Or should I split the big group of files into sets of five, and download each set of five in parallel until all the files have been copied?
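For illustration, the batching idea could be sketched like this (a minimal sketch only: copy_one is a hypothetical stand-in for the real scp-with-fallback command, and the file-number list is shortened):

```shell
#!/bin/bash
# Sketch: process file numbers in batches of five, waiting for each
# batch to finish before starting the next one.
copy_one() {
    # Hypothetical placeholder for the actual scp transfer with
    # machineB -> machineC fallback.
    echo "copying file number $1"
    sleep 0.1
}

PARTITION=(0 3 5 7 9 11 13)   # shortened stand-in for the real ~400 numbers
batch=0
for el in "${PARTITION[@]}"; do
    copy_one "$el" &                  # start the transfer in the background
    (( ++batch % 5 == 0 )) && wait    # after every 5 jobs, wait for them all
done
wait   # wait for the final partial batch
```

The drawback of this pattern is that each batch only finishes when its slowest transfer finishes, so a new transfer cannot start until the whole batch of five is done.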
Below is my shell script, which copies the files one by one to machineA from machineB and machineC.
#!/bin/bash
readonly PRIMARY=/export/home/david/dist/primary
readonly FILERS_LOCATION=(machineB machineC)
PRIMARY_PARTITION=(0 3 5 7 9 11 13 15 17 19 21 23 25 27 29) # this will have more file numbers, around 400
readonly dir1=/data/pe_t1_snapshot/20140317
# scp options: reuse one SSH connection per host across all transfers
SCP_OPTS=(-o ControlMaster=auto -o 'ControlPath=~/.ssh/control-%r@%h:%p' -o ControlPersist=900)
# delete all the files first
find "$PRIMARY" -mindepth 1 -delete
for el in "${PRIMARY_PARTITION[@]}"
do
    file="$dir1/s5_daily_1980_${el}_200003_5.data"
    # try machineB first; fall back to machineC if the copy fails
    scp "${SCP_OPTS[@]}" "david@${FILERS_LOCATION[0]}:$file" "$PRIMARY/." ||
        scp "${SCP_OPTS[@]}" "david@${FILERS_LOCATION[1]}:$file" "$PRIMARY/."
done
Problem Statement:-
I don't want to download ALL the files in parallel; I just want to limit the number of concurrent transfers to four or five. Our Unix admin suggested this approach to speed up my file transfers, but I am not sure how to enforce that limit in my shell script above, or how to split the big group of file numbers into sets of five and download each set in parallel.
Is this possible? If yes, can anyone provide an example?
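One possible way to enforce such a limit is xargs -P, which keeps a fixed number of workers busy rather than working in fixed batches. This is only an illustrative sketch under assumptions: transfer_one is a hypothetical wrapper around the scp-with-fallback command, and the file-number list is shortened.

```shell
#!/bin/bash
# Sketch: xargs -P 5 runs at most 5 transfers at any moment, starting a
# new one as soon as any running one finishes.
transfer_one() {
    # Hypothetical placeholder for the real scp transfer with fallback.
    echo "transferring file number $1"
}
export -f transfer_one   # make the function visible to the bash -c subshells

PARTITION=(0 3 5 7 9 11)   # shortened stand-in for the real ~400 numbers
printf '%s\n' "${PARTITION[@]}" |
    xargs -n 1 -P 5 bash -c 'transfer_one "$0"'
```

Unlike fixed batches of five, this keeps all five slots full for the whole run, so one slow 3.5 GB file does not hold up the other four slots.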