
OS: CentOS

I have some 30,000 jobs (or scripts) to run. Each job takes 3-5 minutes. I have 48 CPUs (nproc = 48). I can use 40 of them to run 40 jobs in parallel. Please suggest a script or tool that can work through all 30,000 jobs, running 40 at a time.

What I had done:

  • I created 40 different folders and executed the jobs in parallel by creating a shell script for each directory.

  • I want to know a better way to handle this kind of job next time.

aravind ramesh
  • Write a single script whose job is to distribute the 30,000 jobs by forking up to 40 processes. The parent script waits for a child process to finish, then moves on to the next job by forking a new process. At any time you have up to 40 child processes. – anonymous Mar 06 '14 at 08:53
  • You now have 40 folders holding "files" for each job, right? Why not use one folder named "queue/" to hold all the files, and have the script that runs a job "move one file from queue/ to working/ and work on that job; on finish, move the file to done/"? That way the number of scripts you start = the level of concurrency, and you can see what is queued, working, and done. When you need to start again, just move all the files back into queue/. You can also add more jobs while processing is still underway, or remove jobs that have not started yet. – Ken Cheung Mar 06 '14 at 09:03
  • Note: when your job-processing script 'moves' a file, it is better to move it into working/ with its pid ($$) appended to the filename, and then check that the moved file really carries its pid (in the filename). This is a simple way to handle locking, because the filesystem moves the file atomically; no semaphores or lock files are needed (see the sketch after these comments). – Ken Cheung Mar 06 '14 at 09:06
  • @KenCheung forking is required to implement your suggestion. – aravind ramesh Mar 06 '14 at 09:51
  • Oh... yes. Indeed I have something similar: I start my scripts from /etc/inittab (CentOS 5) and they sleep when there is no file in the queue/ folder. – Ken Cheung Mar 06 '14 at 09:58
  • @KenCheung Would it be possible to share it on pastebin? – aravind ramesh Mar 06 '14 at 10:26
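A minimal sketch of the worker described in these comments, assuming the 30,000 jobs sit as script files in a queue/ directory (the directory layout and names here are illustrative, not from the question):

#!/bin/bash
# Worker: repeatedly claim one job file from queue/, run it, mark it done.
mkdir -p queue working done
while : ; do
    f=$(ls queue/ 2>/dev/null | head -n 1)
    [ -z "$f" ] && break                                # queue is empty: stop
    claimed="working/$f.$$"                             # append our pid to the filename
    mv "queue/$f" "$claimed" 2>/dev/null || continue    # mv is atomic; if another worker won the race, try again
    bash "$claimed"                                     # run the job
    mv "$claimed" "done/$f"                             # record completion
done

Start 40 copies of this worker in the background, one per CPU you want to use; the atomic rename is what stops two workers from picking up the same job.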

3 Answers


As Mark Setchell says: GNU Parallel.

find scripts/ -type f | parallel

If you insist on keeping 8 CPUs free:

find scripts/ -type f | parallel -j-8
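Since nproc is 48, -j-8 leaves 40 job slots; if you prefer, you can ask for exactly 40 explicitly:

find scripts/ -type f | parallel -j40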

But it is usually more efficient simply to use nice, as that gives you all 48 cores when no one else needs them:

find scripts/ -type f | nice -n 15 parallel

To learn more, watch the GNU Parallel intro videos and walk through the tutorial (man parallel_tutorial).

Ole Tange
  • But will it run only 48 jobs, or will it submit all 30K jobs? – aravind ramesh Mar 06 '14 at 11:08
  • It will run 1 job per CPU core until all 30K jobs are done. If you have more questions, I recommend you watch the intro video and walk through the tutorial first; it will clear up most questions. – Ole Tange Mar 06 '14 at 22:20

I have used Redis to do this sort of thing - it is very simple to install and the CLI is easy to use.

I mainly used LPUSH to push all the jobs onto a "queue" in Redis and BLPOP to do a blocking removal of a job from the queue. So you would LPUSH 30,000 jobs (or script names, or parameters) at the start, then start 40 processes in the background (1 per CPU), and each process would sit in a loop doing a BLPOP to get a job, run it, and move on to the next.

You can add layers of sophistication to log completed jobs in another "queue".

Here is a little demonstration of what to do...

First, start a Redis server on any machine in your network:

./redis-server &    # start Redis server in the background

Or you could add this to your system startup if you use it all the time.

Now push 3 jobs onto a queue called jobs:

./redis-cli         # start REDIS command line interface
redis 127.0.0.1:6379> lpush jobs "job1"
(integer) 1
redis 127.0.0.1:6379> lpush jobs "job2"
(integer) 2
redis 127.0.0.1:6379> lpush jobs "job3"
(integer) 3

See how many jobs there are in the queue:

redis 127.0.0.1:6379> llen jobs
(integer) 3

Wait with an infinite timeout for a job:

redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job1"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job2"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job3"

This last one will wait a LONG time, as there are no jobs left in the queue:

redis 127.0.0.1:6379> brpop jobs 0

Of course, this is readily scriptable:

Put 30,000 jobs in the queue:

for ((i=0;i<30000;i++)) ; do
    echo "lpush jobs job$i" | redis-cli
done
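Starting a fresh redis-cli for every push is slow for 30,000 jobs; one redis-cli can read all the commands from its stdin instead:

for ((i=0;i<30000;i++)) ; do
    echo "lpush jobs job$i"
done | redis-cli > /dev/null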

If your Redis server is on a remote host, just use:

redis-cli -h <HOSTNAME>

Here's how to check progress:

echo "llen jobs" | redis-cli
(integer) 30000

Or, perhaps more simply:

redis-cli llen jobs
(integer) 30000

And you could start the 40 worker processes like this:

#!/bin/bash
for ((i=0;i<40;i++)) ; do
    ./Keep1ProcessorBusy  $i &
done

And then Keep1ProcessorBusy would be something like this:

#!/bin/bash

# Endless loop picking up jobs and processing them
while :
do
    # BRPOP returns the queue name and then the job; keep just the job
    job=$(redis-cli brpop jobs 0 | tail -n 1)
    # Set processor affinity here too if you want to force it, using the $1 parameter we were called with
    $job
done
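If you want the extra layer mentioned above, logging completed jobs to another queue, the loop might look like this (the done queue name is just an illustration):

#!/bin/bash
while :
do
    job=$(redis-cli brpop jobs 0 | tail -n 1)
    # Run the job; on success, push its name onto a second queue for bookkeeping
    $job && redis-cli lpush done "$job" > /dev/null
done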

Of course, the actual script or job you want to run could also be stored in Redis.


As a totally different option, you could look at GNU Parallel. And also remember that you can run the output of find through xargs with the -P option to parallelise things.
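For instance, a minimal xargs version, assuming each file under scripts/ is itself a shell script:

find scripts/ -type f -print0 | xargs -0 -n 1 -P 40 bash

Here -P 40 keeps 40 jobs running at once, -n 1 passes one script per invocation, and -print0/-0 keep filenames with spaces safe.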

Mark Setchell

Just execute those scripts; Linux will internally distribute the tasks properly amongst the available CPUs. That is up to the Linux task scheduler. But if you want, you can also execute a task on a particular CPU by using taskset (see man taskset), and you can do this from a script to launch your 30K tasks. If you go this manual route, be sure you know what you are doing.
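For illustration, a hypothetical launcher that pins one worker per core, borrowing the Keep1ProcessorBusy worker name from the answer above and leaving cores 40-47 free:

#!/bin/bash
# Pin worker i to CPU core i (cores 0-39); cores 40-47 stay free
for ((i=0;i<40;i++)) ; do
    taskset -c $i ./Keep1ProcessorBusy $i &
done
wait    # block until all pinned workers exit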

rakib_
  • I assume the question implies he didn't want all 30K jobs to run concurrently. Assigning processor affinity with taskset doesn't mean the jobs will execute one by one; they will run together, just as in the days when we had only one CPU (core) on Linux - multitasking. – Ken Cheung Mar 06 '14 at 09:10
  • taskset pins a particular task to a specific CPU, and what I meant is to pin all those tasks to the available CPUs, scheduling them in round-robin fashion - and I also warned him about it. I just told him about an available option, I didn't insist. – rakib_ Mar 06 '14 at 09:16
  • I do not see how this is useful for the job. – aravind ramesh Mar 06 '14 at 09:52
  • @aravindramesh You've failed to understand. – rakib_ Mar 06 '14 at 10:32
  • Let me try: whenever you run one of your scripts, it's a new process. The Linux task scheduler decides how your tasks get CPU time, i.e. which task runs on which CPU. The scheduler also contains a load balancer, which makes sure no CPU remains idle; whenever a CPU becomes idle it pulls tasks from other, busier CPUs. This is how Linux gets its best throughput. So just run them all. Now, if you are thinking of splitting them up into batches of 40 at a time, you can do so, but it's not strictly required. And if you really want pinning, taskset can help. Hope you got it. – rakib_ Mar 06 '14 at 11:41