Python subprocess.run exit codes not working with bash script

Question

I use Python to call a Bash script, via the subprocess.run() function that was introduced in Python 3.5.

I want to use the returncode for something, so I use this:

import subprocess

result = subprocess.run(["./app/first_deployment.sh", arg], stdout=subprocess.PIPE)

if result.returncode == 0:
    # do something
    ...
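Here is a slightly more defensive version of the call above, offered as a sketch. It captures stderr alongside stdout, so any error messages the script writes are kept with the result, and it prints the exit status explicitly. The helper name `run_and_report` is hypothetical. The demo uses `sys.executable` in place of the real deployment script so that it runs anywhere. Note that `capture_output` and `text` need Python 3.7 or newer; on 3.5/3.6, use `stdout=subprocess.PIPE, stderr=subprocess.PIPE` and `universal_newlines=True` instead.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd without a shell and return its CompletedProcess.

    capture_output=True collects both stdout and stderr; text=True decodes
    them to str. check is left at its default (False), so a nonzero exit
    status is reported in returncode instead of raising an exception.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"exit status: {result.returncode}")
    return result


if __name__ == "__main__":
    # Stand-in for ./app/first_deployment.sh: a child process that exits with 1.
    result = run_and_report([sys.executable, "-c", "import sys; sys.exit(1)"])
    if result.returncode == 0:
        print("success")
    else:
        print("failed, stderr was:", result.stderr.strip())
```

Passing `check=True` instead makes `run()` raise `subprocess.CalledProcessError` for any nonzero status, which is harder to overlook than an unchecked `returncode`.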

My Bash file:

# First condition
if grep -q 'string' file.txt
then
    # Second condition
    if grep -q 'anotherstring' file.txt
    then
        echo "Success"
        exit 0
    else
        echo "Fail message 2"
        exit 1
    fi
else
    echo "Fail message 1"
    exit 1
fi

So the script seems to work: I do see the correct messages in the logs. However, result.returncode is ALWAYS 0, which means successful. Why is that, and how can I make it report the real exit code?
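The first listing has no shebang line, as one of the comments below points out. Here is a minimal sketch of why that matters when the script is started from Python rather than from an interactive shell. Without `#!`, the kernel's `execve` rejects the file, and Python reports this as an `OSError` with `errno.ENOEXEC`; it does not fall back to running the file with a shell the way bash does. The helpers `write_script` and `try_run` and the temporary files are illustrative only, and the sketch assumes a Linux/POSIX system where the temporary directory allows execution.

```python
import errno
import os
import stat
import subprocess
import tempfile


def write_script(directory, name, text):
    """Write text to a file, mark it executable for the owner, return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(text)
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def try_run(path):
    """Return ("exit", status), or ("oserror", errno) if the file could not be executed."""
    try:
        return ("exit", subprocess.run([path]).returncode)
    except OSError as exc:
        return ("oserror", exc.errno)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        body = "echo 'Fail message 1'\nexit 1\n"
        no_shebang = write_script(d, "no_shebang", body)
        with_shebang = write_script(d, "with_shebang", "#!/usr/bin/env bash\n" + body)
        print(try_run(no_shebang), "ENOEXEC is", errno.ENOEXEC)
        print(try_run(with_shebang))
```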

Update (full script):

#!/bin/bash
basedir="/home/dpa/clients"
user=$1
archive_url=$2
repo_name=$3
port=$4
deployment_tag=$5

mkdir $basedir/$user
mkdir $basedir/$user/$repo_name
curl -o $basedir/$user/$repo_name/$deployment_tag.tar.gz $archive_url
mkdir $basedir/$user/$repo_name/$deployment_tag
tar -xvf $basedir/$user/$repo_name/$deployment_tag.tar.gz -C $basedir/$user/$repo_name/$deployment_tag --strip-components 1
rm -rf $basedir/$user/$repo_name/$deployment_tag.tar.gz

# Check if a production.yml file exists in the new directory
if [ -f "$basedir/$user/$repo_name/$deployment_tag/production.yml" ]
then
    # Check for the websecure endpoint
    if grep -q 'traefik.http.routers.$$$UNIQUE_DEPLOYMENT_TAG-secure.entrypoints=websecure' $basedir/$user/$repo_name/$deployment_tag/production.yml
    then
        # Check for the host rule
        if grep -q 'traefik.http.routers.$$$UNIQUE_DEPLOYMENT_TAG-secure.rule=Host' $basedir/$user/$repo_name/$deployment_tag/production.yml
        then
            # Check if the proxy network exists
            if grep -q 'network=proxy' $basedir/$user/$repo_name/$deployment_tag/production.yml
            then
                sed -i "s/\$\$\$PORT/${port}/g" $basedir/$user/$repo_name/$deployment_tag/production.yml
                sed -i "s/\$\$\$UNIQUE_DEPLOYMENT_TAG/${deployment_tag}/g" $basedir/$user/$repo_name/$deployment_tag/production.yml
                # docker-compose -f $basedir/$user/$repo_name/$deployment_tag/production.yml build
                # docker-compose -f $basedir/$user/$repo_name/$deployment_tag/production.yml up -d
                
                echo "Deployment successful! Your app is online :)"
                exit 0
            else
                echo "Proxy network rule not found in yml config."
                exit 1
            fi
        else
            echo "Traefik host rule not found in yml config."
            exit 1
        fi
    else
        echo "Traefik websecure endpoint not found in yml config."
        set -x
        exit 1
    fi
else
    echo "No production.yml could be found. Please follow the docs and include the correct YAML file."
    exit 1
fi
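The script above turns on `set -x` only just before one `exit`. To trace the whole script without editing it, you can start it under `bash -x` from Python, which has roughly the same effect as moving `set -x` to the top, as a later comment suggests. This sketch assumes `bash` is on `PATH`. `traced_run` is a hypothetical helper, and the one-line demo script is a stand-in for the real deployment script. The trace goes to stderr, which is captured separately from the script's normal stdout.

```python
import os
import subprocess
import tempfile


def traced_run(script_path, *args):
    """Run a bash script with xtrace enabled; return (returncode, stdout, trace)."""
    result = subprocess.run(
        ["bash", "-x", script_path, *args],
        capture_output=True,
        text=True,
    )
    # With -x, bash writes each command (prefixed with PS4, "+ " by default)
    # to stderr before executing it.
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        script = os.path.join(d, "demo")
        with open(script, "w") as f:
            f.write('if [ -f "$1/production.yml" ]; then exit 0; '
                    'else echo "No production.yml"; exit 1; fi\n')
        code, out, trace = traced_run(script, d)
        print("exit status:", code)
        print("stdout:", out.strip())
        print("trace:\n" + trace)
```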

Update 2 (output):

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1845  100  1845    0     0   7561      0 --:--:-- --:--:-- --:--:--  7530
+ mkdir /home/dpa/clients/foo/testapi1/testapi1-305983855
+ tar -xvf /home/dpa/clients/foo/testapi1/testapi1-305983855.tar.gz -C /home/dpa/clients/foo/testapi1/testapi1-305983855 --strip-components 1
+ rm -rf /home/dpa/clients/foo/testapi1/testapi1-305983855.tar.gz
+ '[' -f /home/dpa/clients/foo/testapi1/testapi1-305983855/production.yml ']'
+ grep -q 'traefik.http.routers.$$$UNIQUE_DEPLOYMENT_TAG-secure.entrypoints=websecure' /home/dpa/clients/foo/testapi1/testapi1-305983855/production.yml
+ echo 'Traefik websecure endpoint not found in yml config.'
+ exit 1
CompletedProcess(args=['./app/first_deployment.sh', 'foo', 'https://codeload.github.com/foo/testapi1/legacy.tar.gz/master?token=changedthis', 'testapi1', '7039', 'testapi1-305983855'], returncode=0, stdout=b'foo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.github/\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.github/dependabot.yml\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.github/workflows/\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.github/workflows/ci.yml\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.gitignore\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.vscode/\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/.vscode/settings.json\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/docker-compose.yml\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/local.yml\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/production.yml\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/.dockerignore\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/Dockerfile\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/app/\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/app/__init__.py\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/app/main.py\nfoo-testapi1-b1c9fd5be165b850d2b94cde30affa622b5c3621/testapi1/requirements.txt\nTraefik websecure endpoint not found in yml config.\n')

Update 3:

So, following the advice to simplify the script, I tried the Bash script with just the code below, and it gave me exit code 1, as expected. Great, that seems to work.

#!/bin/bash
basedir="/home/dpa/clients"
user=$1
archive_url=$2
repo_name=$3
port=$4
deployment_tag=$5

# Check if a production.yml file exists in the new directory
if [ -f "$basedir/$user/$repo_name/$deployment_tag/production.yml" ]
then
    echo "Test complete"
    exit 0

else
    echo "No production.yml could be found. Please follow the docs and include the correct YAML file."
    exit 1
fi

Now, when I add just a mkdir line before the check, it ends up with exit code 0. That is weird, because it should still give me exit code 1, since the directory does not exist. This is the code:

#!/bin/bash
basedir="/home/dpa/clients"
user=$1
archive_url=$2
repo_name=$3
port=$4
deployment_tag=$5

mkdir "$basedir/$user"

# Check if a production.yml file exists in the new directory
if [ -f "$basedir/$user/$repo_name/$deployment_tag/production.yml" ]
then
    echo "Test complete"
    exit 0

else
    echo "No production.yml could be found. Please follow the docs and include the correct YAML file."
    exit 1
fi

I tried this with other commands, such as cd or ls, and they all ended up with the same result: exit code 0. So for some reason, whenever a shell command runs successfully before the check, the Python function reports exit code 0, whereas the version with only the file check does work. So it must be a Python-related problem...
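The hypothesis in the last paragraph can be tested outside the deployment setup with a small harness. It runs the same check with and without one ordinary command before it, and compares the exit statuses Python sees. This is a sketch: the variant prefixes, the helper `exit_status`, and the use of `$1` as the directory to check are illustrative stand-ins for the question's real paths. It assumes `bash` is available via `/usr/bin/env`. With a normal bash, every variant should report 1 against a directory that has no `production.yml`, because the script's status is whatever its explicit `exit` sets.

```python
import os
import subprocess
import tempfile

# Check body adapted from the Update 3 script; $1 stands in for the
# directory that should contain production.yml.
CHECK = '''
if [ -f "$1/production.yml" ]
then
    echo "Test complete"
    exit 0
else
    echo "No production.yml could be found."
    exit 1
fi
'''

# Hypothetical prefixes: each runs one ordinary command before the check.
VARIANTS = {
    "check only": "",
    "mkdir first": 'mkdir "$1/newdir"\n',
    "ls first": 'ls "$1" > /dev/null\n',
    "cd first": 'cd "$1"\n',
}


def exit_status(prefix, target_dir):
    """Write a bash script (shebang + prefix + CHECK), run it on target_dir, return its status."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "variant")
        with open(path, "w") as f:
            f.write("#!/usr/bin/env bash\n" + prefix + CHECK)
        os.chmod(path, 0o755)
        return subprocess.run([path, target_dir], stdout=subprocess.PIPE).returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as target:  # contains no production.yml
        for name, prefix in VARIANTS.items():
            print(f"{name}: {exit_status(prefix, target)}")
```

If this harness prints 1 for every variant on the affected machine while the real call still reports 0, the difference is more likely in how the real script is invoked or wrapped than in the extra command itself.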

...to be clear, that shouldn't happen -- the least-surprise interpretation is to doubt that your file is sufficiently close to your real program logic, and suspect that the actual implementation has an exit trap or other logic modifying the exit status. `set -x` logs should include any exit trap in use. — Charles Duffy, Oct 23 '20 at 15:15
BTW, don't use `.sh` extensions on script executables; executables don't have extension on UNIX systems. You don't run `ls.elf`, or `pip.py`, you just run `ls` or `pip`; and a `.sh` extension is misleading for a _bash_ script, as bash is a different language from POSIX sh (in much the same way that C++ is a different language from C; running `sh mybashscript.sh` is a quick way to get bugs, even when `sh` is symlinked to bash, as it turns off some bash-only features when called under that name). — Charles Duffy, Oct 23 '20 at 15:16
...alternately, if you run `./app/first_deployment.sh "$arg"; echo "Exit status: $?"` in a shell, what exit status does that show? Always good to try to isolate which component is at fault. — Charles Duffy, Oct 23 '20 at 15:18
Hmmm, so following your last comment: if I run it directly from the shell, it does give me exit code 1 as expected. So that's weird. Python seems to always give me exit code 0. — Raf Rasenberg, Oct 23 '20 at 15:38
And just to sanity-check, `subprocess.run(['false']).returncode` gives you `1`, right? Assuming that's the case (I can't imagine that it wouldn't be)... have you reproduced the bug with the _exact_ code given in your question as the script? (I do notice that that script doesn't include a shebang; programs can't be reliably executed without one -- if you're running something _from a shell_ it'll default to starting another copy of itself as the baseline interpreter, but the kernel's `execve` call, which `subprocess.Popen` eventually backends into, doesn't do that). — Charles Duffy, Oct 23 '20 at 15:51
`subprocess.run(['false']).returncode` does give 1. I did simplify the code, though, just to keep this question short. I can post the full code if that helps; I'll update my question. — Raf Rasenberg, Oct 23 '20 at 16:07
Make sure that the problem still happens with your simplified code, and if it doesn't, then add more complexity until it _does_ start happening. We still want a minimal reproducible example, not your real code; but to be "reproducible", it needs to actually cause the problem, just as to be "minimal", it needs to be the _shortest thing_ that actually causes the problem. — Charles Duffy, Oct 23 '20 at 16:10
BTW, I see you added `set -x` immediately before an `exit`; put it up at the very top of the script to trace that whole script's execution (and maybe set `PS4=':$LINENO+'` to make that trace include line numbers). — Charles Duffy, Oct 23 '20 at 16:11
Not related to your immediate question, but there are also rather a lot of quoting bugs in that script; http://shellcheck.net/ will point them out for you automatically. Going to have a bad time with filenames containing spaces without those fixed -- and even if you don't ever expect to have such filenames, quoted expansions behave more predictably, so using them makes code easier to analyze. — Charles Duffy, Oct 23 '20 at 16:12
As an aside, don't run `sed -i` twice on the same file. You can eminently well pass in multiple lines of script in one `sed` invocation. See https://stackoverflow.com/questions/7657647/combining-two-sed-commands — tripleee, Oct 23 '20 at 16:20
I've added the output now too. As you can see, the script itself exits with code 1, but the `CompletedProcess` has `returncode` 0. — Raf Rasenberg, Oct 23 '20 at 16:24
Interesting. None of the usual mistakes someone can make to mask the exit status they want to return are present here, and since you aren't using `shell=True`, there's none of the added complexity of having a shell in the way. I'd want to reproduce this on hardware I can control, and where I can use tools like sysdig, strace, &c. to look at what's happening under-the-hood. — Charles Duffy, Oct 23 '20 at 16:36
(...but to do that, I'll want a reproducer I can run myself; something that's referring to a bunch of paths I don't have doesn't fit). — Charles Duffy, Oct 23 '20 at 16:43
...so, I'd suggest simplifying the script as much as you can while still getting the problem. Does it still reproduce when the shell fragment being run contains nothing but `#!/bin/bash` and `exit 1`? — Charles Duffy, Oct 23 '20 at 18:00
@CharlesDuffy Thank you so much for the help! I just tried debugging it again; see the new update in the question. — Raf Rasenberg, Oct 24 '20 at 03:18
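The comment about not running `sed -i` twice on the same file can also be applied when the edit is driven from Python. Both substitutions from the full script go into a single `sed` call with two `-e` expressions. This is a sketch that assumes GNU sed (BSD/macOS `sed -i` expects a backup-suffix argument). `substitute_placeholders` is a hypothetical helper, and the `$$$PORT` / `$$$UNIQUE_DEPLOYMENT_TAG` placeholders come from the question's script.

```python
import subprocess


def substitute_placeholders(yml_path, port, deployment_tag):
    """Replace $$$PORT and $$$UNIQUE_DEPLOYMENT_TAG in a single in-place sed run."""
    subprocess.run(
        [
            "sed", "-i",
            # No shell is involved, so only sed's own escaping applies:
            # \$ matches a literal dollar sign in a basic regular expression.
            "-e", r"s/\$\$\$PORT/" + str(port) + "/g",
            "-e", r"s/\$\$\$UNIQUE_DEPLOYMENT_TAG/" + deployment_tag + "/g",
            yml_path,
        ],
        check=True,  # raise CalledProcessError if sed fails
    )
```

A deployment tag containing `/` would break these expressions; it would need escaping, or a different `s` delimiter.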
