
I want to distribute the work from a master server to multiple worker servers using batches.

Ideally I would have a tasks.txt file with the list of tasks to execute

cmd args 1
cmd args 2
cmd args 3
cmd args 4
cmd args 5
cmd args 6
cmd args 7
...
cmd args n

and each worker server will connect over ssh, read the file, and mark each line as in progress or done:

#cmd args 1  #worker1 - done
#cmd args 2  #worker2 - in progress
#cmd args 3  #worker3 - in progress
#cmd args 4  #worker1 - in progress 
cmd args 5
cmd args 6
cmd args 7
...
cmd args n

I know how to make the ssh connection, read the file, and execute the command remotely, but I don't know how to make the read and write an atomic operation (so that two servers never start the same task), or how to update the line afterwards.

I would like each worker to go to the list of tasks and lock the next available task itself, rather than having the server actively command the workers, because I will have a flexible number of worker clones that I will start or stop depending on how fast I need the tasks to complete.

UPDATE:

My idea for the worker script would be:

#!/bin/bash

taskCmd=""
taskLine=0
masterSSH="ssh usr@masterhost"
tasksFile="/path/to/tasks.txt"

function getTask(){
    while [[ -z $taskCmd ]]
    do
        sleep 1
        # expects the master-side helper to print "<line-number> <command>" on a single line
        read -r taskLine taskCmd <<< "$($masterSSH "#read_and_lock_next_available_line $tasksFile;")"
    done
}

function updateTask(){
    message=$1
    # single-quote the message so it reaches the remote helper as one argument
    $masterSSH "#update_currentTask $tasksFile $taskLine '$message';"
}


function doTask(){
    eval "$taskCmd"   # run the task; its exit code becomes $? for the caller
}


while true
do 
    getTask
    updateTask "in progress"
    doTask 
    taskErrCode=$?
    if [[ $taskErrCode -eq 0 ]]
    then 
        updateTask "done, finished successfully"
    else
        updateTask "done, error $taskErrCode"
    fi
    taskCmd="";
    taskLine=0;

done
Stefan Rogin
  • sorry @Aaron for the annoyance, and thanks for the edits, the last time I slipped on the enter key :) – Stefan Rogin Jan 29 '16 at 10:29
  • No problem, the question is interesting, it'd be a shame it didn't get the attention because of some bad formatting – Aaron Jan 29 '16 at 10:36
  • I like to use Redis for this - it is very easy and performant. Initially you can just use the CLI, and later there are bindings for PHP, Perl etc. See example here http://stackoverflow.com/a/22220082/2836621 – Mark Setchell Jan 29 '16 at 11:03
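
A minimal sketch of that Redis approach, assuming redis-cli is available on the master and the workers (the list name "tasks" and the host name are placeholders):

# master: push every task onto a Redis list
while read -r task; do redis-cli -h masterhost LPUSH tasks "$task"; done < tasks.txt

# worker loop: BRPOP atomically pops the next task (blocking until one is available),
# so no two workers can receive the same line
while true; do
    task=$(redis-cli -h masterhost --raw BRPOP tasks 0 | tail -n 1)
    [ -n "$task" ] && bash -c "$task"
done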

4 Answers


You can use flock to coordinate concurrent access to the file:

exec 200>>/some/any/file ## open file descriptor 200 on the lock file
flock -w 30 200 ## take an exclusive lock on that descriptor, with a 30 sec. timeout

You can point the file descriptor to your task list or to any other file, but of course it must be the same file in every process for flock to work. The lock will be removed as soon as the process that created it finishes or fails. You can also release the lock yourself when you don't need it anymore:

flock -u 200

A usage sample:

ssh user@x.x.x.x '
  set -e
  exec 200>>f
  echo locking...
  flock -w 10 200
  echo working...
  sleep 5
'

set -e aborts the script if any step fails. Play with the sleep time and execute this script several times in parallel; only one sleep will run at a time.
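
Applied to the question's tasks.txt, a minimal sketch of the two master-side helpers that the worker script in the question expects could look like the following. The function names come from the question's placeholders; the "<line-number> <command>" output format, the .lock file and GNU sed -i are assumptions:

# to run on the master; every writer must go through these functions
read_and_lock_next_available_line() {
    tasksFile=$1
    (
        flock -w 30 200 || exit 1                        # serialize all access to the task list
        lineNum=$(grep -n -m 1 '^[^#]' "$tasksFile" | cut -d: -f1)
        [[ -z $lineNum ]] && exit 1                      # no unclaimed tasks left
        echo "$lineNum $(sed -n "${lineNum}p" "$tasksFile")"
        sed -i "${lineNum}s/^/#/" "$tasksFile"           # '#' marks the line as claimed
    ) 200>>"$tasksFile.lock"
}

# append a status note to a claimed line, e.g. "worker1 - done";
# the message must not contain '/' because it is spliced into a sed expression
update_currentTask() {
    tasksFile=$1; lineNum=$2; message=$3
    (
        flock -w 30 200 || exit 1
        sed -i "${lineNum}s/\$/  #${message}/" "$tasksFile"
    ) 200>>"$tasksFile.lock"
}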

Joao Morais

Check if you are reinventing GNU Parallel:

parallel -S worker1 -S worker2 command ::: arg1 arg2 arg3

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[image: simple scheduling]

GNU Parallel instead spawns a new process when one finishes - keeping the CPUs active and thus saving time:

[image: GNU Parallel scheduling]
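
For the question's tasks.txt, a hedged sketch of how that could look (workers.txt, holding one ssh login per line, and tasks.log are hypothetical file names):

# run each line of tasks.txt as a command on the next free worker;
# --joblog + --resume skip already-finished jobs after a restart,
# and --retries moves a failing job to another host
parallel --slf workers.txt --joblog tasks.log --resume --retries 3 < tasks.txt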

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Ole Tange
  • nice, so I'm guessing I would be using it as : `cat tasks.txt | parallel -S usr@worker1 -S usr@worker2 '{}'` , but this would run from the master-server and requires that all the workers be available or listed when the task starts. Also how can you handle failures or a resume after a hw failure ? – Stefan Rogin Jun 23 '16 at 00:17
  • @clickstefan look at --retries, --slf, and --filter-hosts. – Ole Tange Jun 23 '16 at 05:35

Try to implement something like this:

while read -r line; do
   echo "$line"
   # if the line starts with the # char it is already claimed/done, otherwise execute the ssh call
   checkAlreadyDone=$(echo "$line" | grep "^#")
   if [ -z "${checkAlreadyDone}" ]; then
      <insert here the command to execute the ssh call>
      <here, if everything has been executed without issue, you should
      add a command to update the file taskList.txt;
      one option could be a sed command, but it should be tested>
   else
      echo "nothing to do for $line"
   fi
done < taskList.txt
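
One possible form of that sed update, assuming GNU sed and a $line value that contains no characters special to sed (the worker name is a placeholder):

# prefix the finished task with '#' and record who completed it
sed -i "s|^${line}\$|#${line}  #worker1 - done|" taskList.txt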

Regards Claudio

ClaudioM

I think I have successfully implemented one: https://github.com/guo-yong-zhi/DistributedTaskQueue
It is mainly based on bash, ssh and flock, and python3 is required for string processing.

guoyongzhi