
I need a shell script that will create a loop to start parallel tasks read in from a file...

Something along the lines of:

#!/bin/bash
mylist=/home/mylist.txt
while read -r i
do
    cp -rp "$i" /destination &
done < "$mylist"
wait

So what I am trying to do is send a bunch of tasks to the background with "&", one for each line in $mylist, and wait for them to finish before exiting.

However, there may be a lot of lines in there so I want to control how many parallel background processes get started; want to be able to max it at say.. 5? 10?

Any ideas?

Thank you

  • sounds like you want [GNU parallel](https://savannah.gnu.org/projects/parallel/) – glenn jackman Jan 14 '15 at 02:21
  • Any way to do this without adding any additional utilities? I know about parallel but I am unable to get the admin to install it. –  Jan 14 '15 at 02:27
  • possible duplicate of [Parallelize Bash Script with maximum number of processes](http://stackoverflow.com/questions/38160/parallelize-bash-script-with-maximum-number-of-processes) – shellter Jan 14 '15 at 02:53
  • Unless you're copying to different devices, you're unlikely to get much of a speed boost copying files, and if you have more than a couple, it's likely to be slower. – Kevin Jan 14 '15 at 17:16
  • @exxoid What makes you think that you need your admin to install GNU Parallel? – Ole Tange Jan 14 '15 at 22:34

2 Answers

3

Your task manager will make it seem like you can run many parallel jobs, but how many you can actually run with maximum efficiency depends on your processor. Overall you don't have to worry about starting too many processes, because your system's scheduler will handle that for you. If you want to limit them anyway, because the number could get absurdly high, you could use something like this (provided you execute a cp command every time):

...
while ...; do
    jobs=$(pgrep 'cp' | wc -l)
    while [[ $jobs -gt 50 ]]; do
        sleep 1
        jobs=$(pgrep 'cp' | wc -l)
    done
    ...
done

The number of running cp commands is stored in the jobs variable, and before starting a new copy the inner loop waits until there are no longer too many of them already running. Alternatively you could use wait.
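Putting the pieces together, here is a self-contained sketch of the whole loop that uses bash's own job control instead of pgrep (the list path, destination, and the cap of 5 are placeholders from the question; `wait -n` needs bash 4.3 or newer):

```shell
#!/bin/bash
# Read paths from a list file and copy each one in the background,
# never running more than MAX copies at once.
mylist=/home/mylist.txt   # placeholder path from the question
dest=/destination         # placeholder destination
MAX=5                     # cap on simultaneous background jobs

while IFS= read -r f; do
    # jobs -rp prints the PIDs of this shell's running background jobs;
    # when MAX are already running, wait for any one of them to finish.
    while (( $(jobs -rp | wc -l) >= MAX )); do
        wait -n
    done
    cp -rp "$f" "$dest" &
done < "$mylist"
wait   # let the remaining jobs finish before exiting
```

One advantage of counting `jobs -rp` is that it only sees this shell's own background jobs, so unlike pgrep it is not confused by unrelated cp processes started elsewhere on the system.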

Edit: On a side note, you can assign a specific CPU core to a process using taskset; it may come in handy when you have a smaller number of more complex commands.
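For instance, a single heavy copy could be pinned to core 0 like this (taskset ships with util-linux on most Linux systems; the paths are placeholders):

```shell
# Pin one long-running copy to CPU core 0 (Linux only).
taskset -c 0 cp -rp /some/big/dir /destination &
wait
```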

ShellFish
3

You are probably looking for something like this using GNU Parallel:

parallel -j10 cp -rp {} /destination :::: /home/mylist.txt

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
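If you want to see what would be run before actually copying anything, GNU Parallel's --dry-run flag prints the generated commands instead of executing them (same placeholder paths as above):

```shell
# Print the cp command that would be run for each line of mylist.txt,
# without executing any of them.
parallel --dry-run -j10 cp -rp {} /destination :::: /home/mylist.txt
```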

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

Simple scheduling

GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time:

GNU Parallel scheduling

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Ole Tange