Why does running "pkill -f " over ssh fail only when branching on its result?

Question

Found an interesting interaction between pkill and ssh. Documenting it here for posterity:

$ ssh user@remote 'false'; echo $?
1

$ ssh user@remote 'false || echo "failed"'; echo $?
failed
0

$ ssh user@remote 'pkill -f "fake_process"'; echo $?
1

$ ssh user@remote 'pkill -f "fake_process" || echo "failed"'; echo $?
255

It seems like example #4 should have the same output as #2; both false and pkill -f "fake_process" exit with code 1 and have no output. However, #4 will always exit with code 255, even if the remote command explicitly calls exit 0. The docs for ssh state that code 255 just means "an error occurred" (super helpful).

Replacing the pkill command with (exit 1), ls fake_file, kill <non-existent PID>, etc. works as expected in every case. Additionally, when run locally (not through ssh), all of these commands behave as expected.

score 1 · Accepted Answer · answered Apr 08 '20 at 19:54

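The problem appears to be that pkill is killing itself. Or rather, it is killing its parent shell: with -f, pkill matches the pattern against each process's full command line, and the parent shell's command line (bash -c pkill -f "fake_process" || echo "failed") contains the pattern too. Because that shell dies from a signal instead of exiting with a status, ssh has no exit status to pass back and reports 255.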

First of all, ssh hands the command to the remote user's shell, and for certain "complicated" commands that shell stays alive as the command's parent:

$ ssh user@remote 'ps -F --pid $$'
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
user      9531  9526  0 11862  1616   6 14:36 ?        00:00:00 ps -F --pid 9531

$ ssh user@remote 'ps -F --pid $$ && echo hi'
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
user      9581  9577  0 28316  1588   5 14:36 ?        00:00:00 bash -c ps -F --pid $$ && echo hi
hi
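To see why the surviving shell matters, the sketch below runs a "complicated" command locally through bash -c, standing in for what sshd does with the remote login shell. It assumes bash as the login shell, the procps pgrep found on Linux, and that no real fake_process is running. pgrep -f uses the same full-command-line matching as pkill -f, so it should list the wrapper shell's own PID.

```shell
#!/usr/bin/env bash
# Stand-in for sshd running a "complicated" command through the login shell.
# The wrapper shell still has work to do after pgrep (the trailing true),
# so it stays alive as pgrep's parent instead of being replaced via exec.
# pgrep -f matches against full command lines, and the wrapper's command
# line contains the pattern, so the wrapper's own PID ($$) is listed.
# pgrep never lists itself, just as pkill never signals itself.
bash -c 'echo "wrapper shell PID: $$"; pgrep -f "fake_process"; true'
```

The PID printed by pgrep should match the wrapper shell's PID; pkill -f with the same pattern would send SIGTERM to that process.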

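Second, pkill -f knows not to kill itself (otherwise every pkill -f command would match its own command line and commit suicide). But it does not exclude its parent, so when it is run from a child shell such as sh -c, it kills that shell, whose command line also contains the pattern: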

$ pkill -f fake_process; echo $?
1

$ sh -c 'pkill -f fake_process'; echo $?
[1]    14031 terminated  sh -c 'pkill -f fake_process'
143
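The question's examples #2 and #4 can be reproduced without ssh by calling bash -c directly, again assuming bash as the stand-in for the remote login shell and that no process named fake_process is running. The status 143 is 128 + 15, the shell's encoding for "killed by signal 15" (SIGTERM, pkill's default signal).

```shell
#!/usr/bin/env bash
# Example #2 without ssh: the wrapper shell survives and runs the || branch.
bash -c 'false || echo "failed"'
echo "status: $?"

# Example #4 without ssh: pkill -f also matches the wrapper shell (its command
# line contains the pattern) and sends it SIGTERM, so the wrapper dies before
# it can run the || branch.
bash -c 'pkill -f "fake_process" || echo "failed"'
status=$?
echo "status: $status"

# A status above 128 means "killed by signal (status - 128)";
# kill -l translates such a status back into a signal name.
kill -l "$status"
```

This should print failed and status: 0 for the first command, then status: 143 and TERM for the second.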

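In my case, to fix this I just reworked some of the code around my ssh/pkill calls so that I could avoid having a "complicated" remote command. Theoretically I think you could also do something like pgrep -f <cmd> | grep -vx "$$" | xargs kill (the -x makes grep drop only the line that is exactly the shell's PID, rather than every PID that happens to contain those digits).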
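Another workaround, not from the answer above but a common idiom for the same self-matching problem with ps | grep, keeps the branching on the remote side by writing the pattern so that it no longer matches its own text. pkill treats the pattern as an extended regular expression, so [f]ake_process matches the text fake_process but not the literal text [f]ake_process sitting in the wrapper shell's command line. A minimal local sketch, with the same bash -c stand-in and no-such-process assumptions:

```shell
#!/usr/bin/env bash
# The bracket expression [f] matches a single "f", so the regex still matches
# the target process's command line. The wrapper shell's command line holds
# the literal characters "[f]ake_process", where the "f" is followed by "]",
# so the wrapper no longer matches and survives to run the || branch.
bash -c 'pkill -f "[f]ake_process" || echo "failed"'
status=$?
echo "status: $status"
```

Over ssh this would read ssh user@remote 'pkill -f "[f]ake_process" || echo "failed"', which should behave like example #2 (print failed, exit 0) when nothing matches.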

ssh uses the remote shell to execute **all** commands. There exists no case when the remote shell is not used (though there are some cases where the shell may just `exec` another command and not remain in memory; you've spotted one of those, but that `ps` command is *first* a `$SHELL -c 'ps ...'` command, and just gets replaced with a `ps` process by bash after it determines that there's nothing it needs to do after `ps` finishes executing, and thus can `exec` with no prior `fork`). — Charles Duffy, Apr 08 '20 at 19:56
...whether a given shell implements that optimization is implementation-defined, so one doesn't want to depend on whether it exists. — Charles Duffy, Apr 08 '20 at 19:58
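To keep the remote command simple without depending on that optimization, one option is to exec the command explicitly and do the branching locally. A local sketch, with the same bash -c stand-in and no-such-process assumptions; pkill exits 0 when something matched and 1 when nothing matched:

```shell
#!/usr/bin/env bash
# The explicit exec replaces the wrapper shell with pkill in the same process,
# so no shell holding the pattern is left alive, whether or not the shell
# would have applied its own exec optimization. pkill never matches itself.
bash -c 'exec pkill -f "fake_process"'
status=$?

# Branch here, outside the wrapper shell.
if [ "$status" -eq 1 ]; then
  echo "failed"
fi
```

Over ssh the first command becomes ssh user@remote 'exec pkill -f "fake_process"'. Since ssh itself exits 255 on connection errors, checking for 1 specifically (rather than any non-zero status) avoids mistaking an ssh failure for "no process matched".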
