3

I have a shell script job.sh.

Its contents are below:

#!/bin/bash

table=$1

sqoop job --exec "${table}"

Now when I do ./job.sh table1

The script executes successfully.

I have the table names in a file tables.txt.

Now I want to loop over the tables.txt file and run the job.sh script once per table, with 10 instances running in parallel.

How can I do that?

Ideally, when I execute the script, I want it to run the following:

./job.sh table1
./job.sh table2
./job.sh table3
./job.sh table4
./job.sh table5
./job.sh table6
./job.sh table7
./job.sh table8
./job.sh table9
./job.sh table10

What are the options available?

User12345

3 Answers

5

Simply with GNU Parallel

parallel -a tables.txt --dry-run sqoop job --exec {}

Sample Output

sqoop job --exec table7
sqoop job --exec table8
sqoop job --exec table9
sqoop job --exec table6
sqoop job --exec table5
sqoop job --exec table4
sqoop job --exec table3
sqoop job --exec table2
sqoop job --exec table1
sqoop job --exec table10

If that looks correct, just remove the --dry-run and run again for real.

If you would like 4 jobs run at a time, use:

parallel -j 4 ....

If you would like one job per CPU core, that is the default, so you don't need to do anything.

If you would like the output to be kept in the same order as the input, add the -k option:

parallel -k ...
Mark Setchell
3

You can just do

< tables.txt xargs -I% -n1 -P10 echo sqoop job --exec %

The -P10 runs up to 10 processes in parallel, and you don't even need the helper script. The echo only prints each command; remove it to run the jobs for real.

As @CharlesDuffy commented, you don't need the -I, so this is even simpler:

< tables.txt xargs -n1 -P10 echo sqoop job --exec
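To try the xargs form without touching sqoop, here is a minimal sketch. It assumes an xargs that supports -P (GNU and BSD versions do; it is not required by POSIX), and it creates a throwaway tables.txt in a temporary directory so that it is self-contained. The echo is kept as a stand-in that prints each command instead of running it.

```shell
#!/bin/bash
# Throwaway working directory and sample tables.txt (illustration only).
workdir=$(mktemp -d)
printf 'table%s\n' 1 2 3 4 5 6 7 8 9 10 > "$workdir/tables.txt"

# -n1  : pass one table name to each invocation
# -P10 : keep up to 10 invocations running at the same time
# echo : stand-in that prints the command rather than executing sqoop
output=$(xargs -n1 -P10 echo sqoop job --exec < "$workdir/tables.txt")

# Completion order is not guaranteed with -P, so sort before displaying.
printf '%s\n' "$output" | sort

rm -rf "$workdir"
```

Once the printed commands look right, dropping the echo should run the real jobs with the same concurrency limit.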
clt60
  • @CharlesDuffy True! The `-I` isn't needed in this case. It could be helpful in a case like `printf "%s\n" {1..20} | xargs -I% -n1 -P10 echo sqoop job --exec table%` – clt60 Apr 26 '17 at 19:46
  • Sure, though one could use `table{1..20}` there as well, and avoid the hairiness that comes with `-I`. Granted, the 255-byte string limit isn't an *immediate* issue, and neither is its tendency to be abused in ways that lead to injection attacks, nor the POSIX-specified limit (of 5) on the number of substitutions per command line, but it's something that just strikes me as a smell. – Charles Duffy Apr 26 '17 at 19:47
0

Option 1

Start all scripts as background processes by appending &, e.g.

./job.sh table1 &
./job.sh table2 &
./job.sh table3 &

However, this will run all jobs at the same time!
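Rather than writing each line by hand, the same idea can be sketched as a loop over the table names followed by wait, so the script does not exit before the jobs finish. In this sketch run_job is a hypothetical stand-in for ./job.sh, and the three table names are sample data.

```shell
#!/bin/bash
# Hypothetical stand-in for ./job.sh: pretends to work, then reports.
run_job() {
    sleep 0.1
    echo "finished $1"
}

# Sample table names; in practice these would come from tables.txt.
tables=$(printf 'table%s\n' 1 2 3)

results=$(
    while IFS= read -r table; do
        run_job "$table" &   # start every job in the background at once
    done <<< "$tables"
    wait                     # block until all background jobs have exited
)

printf '%s\n' "$results"
```

There is still no limit on how many jobs run at once, so this only suits short lists or lightweight jobs.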

Option 2

For more time- or memory-consuming scripts, you can limit the number of tasks that run at the same time using xargs, as described here, for example.
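A minimal sketch of that approach, calling the helper script itself through xargs: the job.sh written here is a hypothetical stand-in that only prints the command it would run, the limit of 2 is arbitrary, and -P is assumed to be available (it is in GNU and BSD xargs).

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Hypothetical stand-in for job.sh: prints the sqoop command instead of running it.
cat > "$workdir/job.sh" <<'EOF'
#!/bin/bash
table=$1
echo "sqoop job --exec ${table}"
EOF
chmod +x "$workdir/job.sh"

# Sample table list (illustration only).
printf 'table%s\n' 1 2 3 4 5 > "$workdir/tables.txt"

# At most 2 copies of job.sh run at any moment; xargs starts the next
# one as soon as a running copy exits.
ran=$(xargs -n1 -P2 "$workdir/job.sh" < "$workdir/tables.txt" | sort)
printf '%s\n' "$ran"

rm -rf "$workdir"
```

Raising -P2 to -P10 would give the ten concurrent jobs asked for in the question.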

Community
Feodoran