
I've created a bash script that connects to a number of servers and executes a program on each of them. The IPs and the number of instances per IP should be read from a config file that is structured like this:

127.0.0.1 10
127.0.0.1 1
127.0.0.1 3

etc

j=0
while IFS=' ' read -r ip quantity; do
  echo "${ip} x ${quantity}";

  for (( i = 1; i <= quantity; i++ ))
  do
    echo "ssh root@${ip} cd test/libhotstuff && ./examples/hotstuff-app --conf ./hotstuff.gen-sec${j}.conf > log${j} 2>&1"
    ssh root@"${ip}" "cd test/libhotstuff && ./examples/hotstuff-app --conf ./hotstuff.gen-sec${j}.conf > log${j} 2>&1" &
    j=$((j+1))
  done

  sleep 1

done < ips

I noticed that this while loop stops early if the loop body takes too long. If I put a `sleep 1` here, it stops after the first iteration. If I remove it but the inner loop takes too long, a subset of the lines will not be read.

What is the problem here?

raycons
  • I didn't know that was possible =D – raycons Oct 28 '20 at 13:09
  • [BashFAQ](https://mywiki.wooledge.org/BashFAQ/001) is a godsend. Also, it's worth reading the whole [bash manual](https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.html) through before writing code. You'll save time and headaches in the end. – Paul Hodges Oct 28 '20 at 13:15
  • I adjusted it, and that made the script much clearer, thanks. But the problem still occurs. – raycons Oct 28 '20 at 13:15
  • Add `set -x` before the loop and see if the debugging output offers a clue. Setting it before a command also logs that command as it is run, so you don't have to maintain both an echo and the actual command in case it changes... – Paul Hodges Oct 28 '20 at 13:16
  • From the debug output it seems that it stops working after the first ssh succeeds. – raycons Oct 28 '20 at 13:20
  • Should be "i <= $quantity" or is that a typo ? – Andre Gelinas Oct 28 '20 at 13:21
  • wait at which location of the script? at the place of the sleep? – raycons Oct 28 '20 at 13:23
  • Yes, I'm running an interactive protocol between those servers. That's why I want to run those. – raycons Oct 28 '20 at 13:28
  • With the wait it's the same result as without. – raycons Oct 28 '20 at 13:31
  • I have a long "sleep" at the end and afterwards another ssh to kill all the processes. – raycons Oct 28 '20 at 13:36
  • Without the wait, it will run them in background and exit immediately. – Paul Hodges Oct 28 '20 at 13:45
  • @TedLyngmo without placing it in the background it's stuck indefinitely after the first ssh – raycons Oct 28 '20 at 13:50
  • The application is not supposed to quit. It's supposed to run on the other side until I kill it. That's why I made it non-interactive – raycons Oct 28 '20 at 13:52
  • Might be me misunderstanding what "non interactive" means too – raycons Oct 28 '20 at 13:54
  • they are supposed to run for around 6 minutes (basically the sleep time afterwards) – raycons Oct 28 '20 at 14:07
  • Maybe you want the `&` inside the quotes? – Paul Hodges Oct 28 '20 at 14:07
  • Didn't help, even tried inside and outside the quote – raycons Oct 28 '20 at 14:10
  • The relevant BashFAQ for this is [BashFAQ #89](https://mywiki.wooledge.org/BashFAQ/089) – that other guy Oct 28 '20 at 18:57
  • @thatotherguy Thanks! I was too into my own world when trying to figure this out, so doing the correct searches never occurred to me. After having played the drums for a few hours, I finally found the option. :) I'll leave my answer here even though there is a dupe, since I tailor-made it for OP and I added at least one other option that often messes up batch `ssh` jobs if missing (the `StrictHostKeyChecking=no` one). – Ted Lyngmo Oct 28 '20 at 19:07

2 Answers


Here's a version that starts your background processes with a 1 second delay between each, waits 6 minutes, and then kills them one by one, again with a 1 second delay between each, so that they all get approximately the same running time.

You should also add some options to `ssh` to prevent it from interfering with stdin and terminating your loop prematurely.

  • -n
    Prevents reading from stdin
  • -oBatchMode=yes
    Passphrase/password querying will be disabled
  • -oStrictHostKeyChecking=no
    Add new host keys automatically and connect to the host even if the host key has changed
#!/bin/bash

sshopts=(-n -oBatchMode=yes -oStrictHostKeyChecking=no)

j=0
pids=()
while IFS=$' \t\n' read -r ip quantity; do
  echo "${ip} x ${quantity}";

  for (( i = 0; i < quantity; ++i ))
  do
    remotecmd="cd test/libhotstuff && ./examples/hotstuff-app --conf ./hotstuff.gen-sec${j}.conf > log${j} 2>&1"
    localcmd=(ssh "${sshopts[@]}" "root@${ip}" "$remotecmd")
    echo "${localcmd[@]}"
    "${localcmd[@]}" &
    # store the background pid
    pids+=($!)
    (( ++j ))
    sleep 1
  done

done < ips

seconds=360
echo "running ${pids[*]} in the background for $seconds seconds"

sleep "$seconds"

echo "telling the background processes to terminate"
for pid in "${pids[@]}"
do
    echo "killing $pid"
    kill "$pid"
    sleep 1
done

echo "waiting for all the background processes to terminate"
wait
echo Done
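
The stdin effect that `-n` guards against can be reproduced without any servers. The sketch below is only an illustration under stated assumptions and is not part of the script above: `fake_ssh` is a hypothetical stand-in that, like `ssh` without `-n`, reads whatever is on its standard input, and the config lines are written to a temporary file instead of `ips`.

```shell
#!/bin/bash
# Hypothetical stand-in for ssh: it drains stdin, as ssh does when run without -n.
fake_ssh() { cat >/dev/null; }

# Temporary config file with the same "ip quantity" layout as the ips file.
conf=$(mktemp)
printf '%s\n' '127.0.0.1 10' '127.0.0.1 1' '127.0.0.1 3' > "$conf"

# Without a redirect: the command shares the loop's stdin and eats the remaining lines.
broken=0
while read -r ip quantity; do
  broken=$((broken + 1))
  fake_ssh
done < "$conf"

# With a redirect: the command gets its own stdin, which is what ssh -n does
# by redirecting from /dev/null.
fixed=0
while read -r ip quantity; do
  fixed=$((fixed + 1))
  fake_ssh < /dev/null
done < "$conf"

rm -f "$conf"
echo "lines read without the redirect: $broken"
echo "lines read with the redirect: $fixed"
```

Run as written, the first loop should stop after one line while the second reads all three, which matches the symptom described in the question.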
Ted Lyngmo
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/223790/discussion-on-answer-by-ted-lyngmo-bash-read-line-stops-in-the-middle-of-the-fil). – Samuel Liew Oct 29 '20 at 00:30

Here is a version that offloads the loop and the parallel processes to a remote shell script. It generates the remote script from a here-document that embeds the quantity, and the remote script waits for all of its background processes to terminate before exiting.

#!/usr/bin/env bash
# bash rather than sh: the $'...' quoting used for IFS below is not supported by every sh

while IFS=$' \t\n\r' read -r ip quantity || [ -n "$quantity" ]
do
  {
# When satisfied by the output:
# uncomment the line below and delete the line after it (the one with echo and cat)
#    ssh "root@$ip" <<EOF
    echo ssh "root@$ip"; cat <<EOF
if cd test/libhotstuff
then
  i=$quantity
  until
    i=\$((i - 1))
    [ \$i -lt 0 ]
  do
    ./examples/hotstuff-app \\
      --conf "./hotstuff.gen-sec\$i.conf" >"log\$i" 2>&1 &
  done
  wait
fi
EOF
  } &
done <ips

# Wait for all child processes to terminate
wait
echo "All child ssh done!"
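
The escaping in the here-document decides which shell expands what: `$quantity` is expanded locally while the text is generated, and the `\$i` references are left for the shell that runs the generated script. The sketch below illustrates this locally under assumptions of my own: a local `sh` stands in for the remote shell, `quantity=3` is an example value, and an `echo` replaces the real `hotstuff-app` call.

```shell
#!/bin/sh
quantity=3   # example value; expanded locally while the here-document is generated

# Build the script text the same way the here-document would send it to ssh.
script=$(cat <<EOF
i=$quantity
until
  i=\$((i - 1))
  [ \$i -lt 0 ]
do
  echo "would start instance \$i"
done
EOF
)

# Inspect what the receiving shell would get: "i=3" is already filled in,
# while \$i has become a plain $i that is still unexpanded.
printf '%s\n' "$script"

# Run the generated text in a local sh as a stand-in for the remote shell.
output=$(printf '%s\n' "$script" | sh)
printf '%s\n' "$output"
```

The loop counts down from `quantity - 1` to 0, so each instance gets its own index for the config and log file names.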

Another way is to replace the dynamic here-document with an inline shell script that is called with the quantity as an argument:

#!/usr/bin/env bash
# bash rather than sh: the $'...' quoting used for IFS below is not supported by every sh

while IFS=$' \t\n\r' read -r ip quantity || [ -n "$quantity" ]; do
  echo ssh "root@$ip" sh -c '
if cd test/libhotstuff
then
  i=0
  while [ $i -lt "$1" ]; do
    ./examples/hotstuff-app --conf "./hotstuff.gen-sec$i.conf" >"log$i" 2>&1 &
    i=$((i + 1))
  done
  wait
fi
' _ "$quantity" &
done <ips

# Wait for all child processes to terminate
wait
echo "All child ssh done!"
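
How the quantity reaches the inline script can be checked locally. In the sketch below, which is an illustration and not a tested replacement for the `ssh` call, `sh -c` runs on the local machine, `quantity=3` is an example value, and an `echo` replaces the real `hotstuff-app` call.

```shell
#!/bin/sh
quantity=3   # example value; in the script above it comes from the ips file

# The single-quoted script is passed through unmodified.
# The "_" placeholder fills $0, so "$quantity" arrives as $1 inside the script.
output=$(sh -c '
i=0
while [ "$i" -lt "$1" ]; do
  echo "would start instance $i"
  i=$((i + 1))
done
' _ "$quantity")

printf '%s\n' "$output"
```

Note that this runs `sh -c` directly. When the same words are handed to `ssh`, they are joined with spaces and parsed again by the remote shell, so it is worth verifying on a test host that the script still arrives as a single argument.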
Léa Gris