Bash script hangs when calling function

Question

I am currently testing a bash script to perform database migration.

The script basically accepts some parameters, such as:

the name of the database to migrate
the server to migrate it from
the server to migrate it to

In the script there is a function which "builds" the mysql and mysqldump commands to be executed, depending on whether the server from/to is local/remote.

The function build_mysql_command is then used like this:

_query="$(build_mysql_command mysql from)"
_dump="$(build_mysql_command mysqldump from)"
_restore="$(build_mysql_command mysql to)"

However, when the function build_mysql_command has to call open_ssh_tunnel, it hangs on the last instruction, as I verified by running the script with the -x switch.

If instead I put the SSH tunnel opening outside build_mysql_command, and remove the call from there, it works.

However, I do not think I made any mistake in these functions, so I do not understand why the script would hang.

Here is a heavily stripped-down example which shows the problem, where I replaced the actual IP address of the remote server with 1.2.3.4:

#!/bin/bash
set -x
set -o pipefail

# $1 = 'from' or 'to'
get_local_port() {
    case "$1" in
        from)
            echo 30303
        ;;

        to)
            echo 31313
        ;;

        *)
            echo 0
        ;;
    esac
}

# $1 = 'from' or 'to'
build_ssh_command() {
    local _ssh="ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no"

    if [ ! -z "${params[password-$1]}" ] ; then
        _ssh="sshpass -f ${params[password-$1]} $_ssh"
    fi

    echo "$_ssh"
}

# $1 = 'from' or 'to'
open_ssh_tunnel() {
    # if not already open
    if [ -z "${cleanup[ssh-tunnel-$1]}" ] ; then
        local _port="$(get_local_port "$1")"
        local _ssh="$(build_ssh_command "$1")"
        local _pregp="fnNTL $_port:localhost:3306 ${params[migrate-$1]}"
        local _command="$_ssh -$_pregp"

        # try to open the SSH tunnel
        if ! $_command ; then
            return 1
        else
            # save the PID of the tunnel just opened
            local _pid="$(pgrep -f "$_pregp" 2> /dev/null)"

            if [ -z "$_pid" ] ; then
                return 1
            fi

            cleanup["ssh-tunnel-$1"]="$_pid"
        fi
    fi

    return 0
}

# checks whether an address refers to the local machine
# $1 = address to check
is_local_address() {
    local _host="$(hostname)"

    case "$1" in
        localhost|"127.0.0.1"|"$_host")
            return 0
        ;;

        *)
            return 1
        ;;
    esac
}

# builds a MySQL dump or restore command
# $1 = base command
# $2 = server type ('from' or 'to')
build_mysql_command() {
    local _command="$1 --user=root --password=xxx"

    if is_local_address "${params[migrate-$2]}" ; then
        # connect through the socket
        _command="$_command --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock"
    elif open_ssh_tunnel "$2" ; then
        # otherwise connect through the SSH tunnel
        _command="$_command --protocol=tcp --host=localhost --port=$(get_local_port "$2")"
    else
        _command=""
    fi

    echo "$_command"
}

# script execution parameters
declare -A params=(
    ["migrate-from"]="localhost"
    ["migrate-to"]="1.2.3.4"
)

_query="$(build_mysql_command "mysql" "from")"
echo "_query = $_query"

_dump="$(build_mysql_command "mysqldump" "to")"
echo "_dump = $_dump"

# end of test

And here is the output when run:

+ set -o pipefail
+ params=(["migrate-from"]="localhost" ["migrate-to"]="1.2.3.4")
+ declare -A params
++ build_mysql_command mysql from
++ local '_command=mysql --user=root --password=xxx'
++ is_local_address localhost
+++ hostname
++ local _host=my.host.name
++ case "$1" in
++ return 0
++ _command='mysql --user=root --password=xxx --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock'
++ echo 'mysql --user=root --password=xxx --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock'
+ _query='mysql --user=root --password=xxx --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock'
+ echo '_query = mysql --user=root --password=xxx --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock'
_query = mysql --user=root --password=xxx --protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock
++ build_mysql_command mysqldump to
++ local '_command=mysqldump --user=root --password=xxx'
++ is_local_address 1.2.3.4
+++ hostname
++ local _host=asp10.626suite-online.it
++ case "$1" in
++ return 1
++ open_ssh_tunnel to
++ '[' -z '' ']'
+++ get_local_port to
+++ case "$1" in
+++ echo 31313
++ local _port=31313
+++ build_ssh_command to
+++ local '_ssh=ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no'
+++ '[' '!' -z '' ']'
+++ echo 'ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no'
++ local '_ssh=ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no'
++ local '_pregp=fnNTL 31313:localhost:3306 1.2.3.4'
++ local '_command=ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no -fnNTL 31313:localhost:3306 1.2.3.4'
++ ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no -fnNTL 31313:localhost:3306 1.2.3.4
Warning: Permanently added '1.2.3.4' (ECDSA) to the list of known hosts.
+++ pgrep -f 'fnNTL 31313:localhost:3306 1.2.3.4'
++ local _pid=8919
++ '[' -z 8919 ']'
++ cleanup["ssh-tunnel-$1"]=8919
++ return 0
+++ get_local_port to
+++ case "$1" in
+++ echo 31313
++ _command='mysqldump --user=root --password=xxx --protocol=tcp --host=localhost --port=31313'
++ echo 'mysqldump --user=root --password=xxx --protocol=tcp --host=localhost --port=31313'

As you can see, the script hangs at the very last line of build_mysql_command when it has opened the SSH tunnel to the remote server, but shows no problem when it builds the local command.
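
One likely explanation, not confirmed anywhere in this thread, is that `$(...)` only returns once every process holding the write end of its pipe has closed it. A background process started inside the function inherits that pipe as its stdout, so the substitution keeps waiting even after `echo` has run. The sketch below uses `sleep` as a stand-in for the backgrounded `ssh`; the function names are hypothetical, and whether a given `ssh` build keeps stdout open after `-f` is an assumption.

```bash
#!/bin/bash

# Stand-in for the backgrounded ssh: the child inherits the function's stdout,
# which is the command-substitution pipe when called inside $(...).
build_holding_stdout() {
    sleep 2 &
    echo "command"
}

# Same stand-in, but the child's stdout and stderr are detached from the pipe.
build_detached() {
    sleep 2 > /dev/null 2>&1 &
    echo "command"
}

start=$SECONDS
slow_out="$(build_holding_stdout)"   # waits until sleep exits and closes the pipe
slow_elapsed=$(( SECONDS - start ))

start=$SECONDS
fast_out="$(build_detached)"         # returns as soon as echo has finished
fast_elapsed=$(( SECONDS - start ))

echo "inherited stdout: ${slow_elapsed}s, detached stdout: ${fast_elapsed}s"
```

If this is what happens here, appending `> /dev/null 2>&1` to the tunnel command inside open_ssh_tunnel, or opening the tunnel outside any `$(...)`, would avoid the wait. The second option matches the workaround described in the question.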

Unrelated to your problem, but don't build commands as strings; it's innately prone to failure, and you'll hit the limitations eventually. See [BashFAQ #50](http://mywiki.wooledge.org/BashFAQ/050). — Charles Duffy, May 29 '19 at 15:22
...getting towards the immediate problem itself, it shouldn't be a surprise that running a ssh command blocks until that command exits (and that's if it isn't blocking to read stdin, before even getting to that point). To the extent that it *is* surprising, we'd want to see what that command *actually is*, with full list of arguments; providing the tail of your `set -x` trace would be a place to start. — Charles Duffy, May 29 '19 at 15:26
The last instruction in the function is `return 0`, is that where it appears to hang? I agree, seeing the trace would be helpful. — Barmar, May 29 '19 at 15:31
Did pasting your whole script into https://shellcheck.net reveal any problems? Be sure to include a proper "she-bang" line at the top, usually `#!/bin/bash`. AND are you saying that when you run the `ssh` command from the cmd-line (separate from the script), that it completes immediately? Good luck. — shellter, May 29 '19 at 19:26
I didn't know that site, I'll try it tomorrow. About the ssh command, yes, I used the switch that causes it to go in background. — Matteo Tassinari, May 29 '19 at 19:39
To reiterate above, best to include the `set -x` output for the affected stuff. Also good idea to make this the fewest lines of code that still demonstrate the problem. Good luck. — shellter, May 29 '19 at 20:27
@shellter I have added a stripped down version of the script which still shows the problem, and its output with `set -x` — Matteo Tassinari, Jun 01 '19 at 14:46
I'm sorry but I am not sure what you are trying to tell me, build_mysql_command is a different function from build_ssh_command, and the first calls the second, the output looks correct. — Matteo Tassinari, Jun 01 '19 at 17:21
OK, sorry, rushed into it ;-/ . I'll try to look at your code before Monday, but can't promise it. Good luck. — shellter, Jun 01 '19 at 18:01
Don't worry, I really appreciate your help. I have also found a workaround which works, that is, to open the SSH tunnels in the main script outside the functions, but I'd really like to understand why the original version doesn't work. — Matteo Tassinari, Jun 01 '19 at 18:05
Don't have time right now to really analyze this, but I thought of this answer that might be your problem: https://stackoverflow.com/questions/5185717/spawn-subshell-for-ssh-and-continue-with-program-flow/5199505 . Good luck — shellter, Jun 01 '19 at 19:00
Hi, I ran this in my environment. I had to convert it to `ksh`, changing `local` to `typeset` (which makes a variable local inside functions) and `declare` to `typeset` as well (that is just how declaring arrays works in `ksh`). Of course I don't have access to the same network environment as you, but my run (after fixing the syntax stuff) resulted in `ssh: connect to host 1.2.3.4 port 22: Connection timed out`, and after waiting ~5 secs **the script finished** (didn't hang). I can't remember: as a test, did you copy/paste that command into a terminal and execute it? Does it still hang? Good luck. — shellter, Jun 02 '19 at 18:07
I'm sorry if I didn't specify it explicitly, 1.2.3.4 is just a placeholder which should be replaced with an existing IP address to which you have SSH access. — Matteo Tassinari, Jun 02 '19 at 18:16
yes, well if it hangs when I use a valid IP, it is something about the command being executed remotely. Can you replace the remote command with something simple like `date` and does it still hang? — shellter, Jun 02 '19 at 19:34
It does *not* execute any command, it just sets up port forwarding to the remote server and then should drop to background, it is not supposed to execute a command. — Matteo Tassinari, Jun 02 '19 at 19:37
How do you make it drop into the background? I'd expect to see `&` at the end of the remote execution string. Does one of the arguments to `ssh` specify background? Will stay a few minutes, maybe we can wrap this up ;-) — shellter, Jun 02 '19 at 19:40
The SSH command is called with the flags -fnNTL, and one of these is the one which sets it to drop to background, I do not remember which one exactly right now. — Matteo Tassinari, Jun 02 '19 at 19:41
yes, just checked. `-f` AND, I'll ask again, what happens when you paste the correct command into a terminal, does it return, or hang? — shellter, Jun 02 '19 at 19:44
And from `man ssh`: *`-n` Redirects stdin from /dev/null (actually, prevents reading from stdin). This must be used when ssh is run in the background. A common trick is to use this to run X11 programs on a remote machine. For example, **ssh -n shadows.cs.hut.fi emacs &** will start an emacs on shadows.cs.hut.fi, and the X11 connection will be automatically forwarded over an encrypted channel. The ssh program will be put in the background. (This does not work if ssh needs to ask for a password or passphrase; see also the -f option.)* Are the `-f` and `-N` without a `&` conflicting? — shellter, Jun 02 '19 at 19:48
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194344/discussion-between-shellter-and-matteo-tassinari). — shellter, Jun 02 '19 at 20:57
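
As a rough illustration of the BashFAQ #50 advice in the first comment, the command can be kept in an array instead of a string. This is only a sketch: `build_command_array` and its arguments are hypothetical names, the password is a placeholder, and `local -n` requires bash 4.3 or later.

```bash
#!/bin/bash

# Hypothetical helper: fill the array named by $1 with a command line.
# $1 = name of the destination array
# $2 = base command (e.g. mysqldump)
# $3 = 'socket' or 'tcp'
# $4 = local port, used only for tcp
build_command_array() {
    local -n _dest="$1"    # nameref to the caller's array (bash 4.3+)

    # placeholder password containing a space, to show that quoting survives
    _dest=("$2" --user=root --password="example password")

    if [ "$3" = "socket" ] ; then
        _dest+=(--protocol=socket --socket=/opt/agews64/data/mysql/mysql.sock)
    else
        _dest+=(--protocol=tcp --host=localhost --port="$4")
    fi
}

build_command_array query_cmd mysql socket
build_command_array dump_cmd mysqldump tcp 31313

# Print one argument per line instead of running anything.
printf '<%s>\n' "${dump_cmd[@]}"
```

The command would then be run as `"${dump_cmd[@]}"`. Because the helper is called directly rather than inside `$(...)`, it runs in the current shell, so anything it stores in a global such as `cleanup` stays visible to the caller. Assignments made inside a command substitution are lost when its subshell exits.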
