What is a good Linux exit error code strategy?

Question

I have several independent executables (Perl and PHP CLI scripts, and C++ programs) for which I need to develop an exit error code strategy. These programs are called by other programs through a wrapper class I wrote around exec() in PHP, so I will be able to get an error code back. Based on that error code, the calling script will need to do something.

I have done a little bit of research, and it seems that anything in the 1-254 (or maybe just 1-127) range could be fair game for user-defined error codes.

I was just wondering how other people have approached error handling in this situation.

score 8 · Accepted Answer · answered Dec 26 '08 at 14:48

The only convention is that you return 0 for success and something other than zero for an error. Most well-known Unix programs document the return codes they can produce, and so should you. It doesn't make much sense to try to build a common list of every error code that any arbitrary program could return: you end up with tens of thousands of them, as some other operating systems have, and even then the list doesn't always cover the specific error you want to report.

So just be consistent, and be sure to document whatever scheme you decide to use.
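One way to stay consistent and keep the documentation honest is to define each program's codes once, in a single table, and generate the help text from that table. The sketch below is in C (the language of the other code in this thread); the names `APP_EXIT_*`, `app_exit_meaning` and `print_exit_codes`, and the particular values, are hypothetical and only illustrate the idea.

```c
#include <stdio.h>

/* Hypothetical exit codes for one program; values are illustrative only. */
enum app_exit_code {
    APP_EXIT_OK       = 0,  /* success */
    APP_EXIT_FAILURE  = 1,  /* unspecified error */
    APP_EXIT_USAGE    = 2,  /* bad command-line arguments */
    APP_EXIT_NOINPUT  = 3,  /* input file missing or unreadable */
    APP_EXIT_TEMPFAIL = 4   /* temporary failure; caller may retry */
};

/* Single source of truth: the same table feeds --help output and man pages. */
static const struct {
    enum app_exit_code code;
    const char *meaning;
} app_exit_codes[] = {
    { APP_EXIT_OK,       "success" },
    { APP_EXIT_FAILURE,  "unspecified error" },
    { APP_EXIT_USAGE,    "invalid command-line arguments" },
    { APP_EXIT_NOINPUT,  "input file missing or unreadable" },
    { APP_EXIT_TEMPFAIL, "temporary failure, safe to retry" },
};

#define APP_EXIT_COUNT (sizeof app_exit_codes / sizeof app_exit_codes[0])

/* Return the documented meaning of an exit code, or NULL if it is undocumented. */
static const char *app_exit_meaning(int code)
{
    for (size_t i = 0; i < APP_EXIT_COUNT; i++)
        if ((int)app_exit_codes[i].code == code)
            return app_exit_codes[i].meaning;
    return NULL;
}

/* Print the "Exit status" section of a usage message. */
static void print_exit_codes(FILE *fp)
{
    fprintf(fp, "Exit status:\n");
    for (size_t i = 0; i < APP_EXIT_COUNT; i++)
        fprintf(fp, "  %d  %s\n", (int)app_exit_codes[i].code, app_exit_codes[i].meaning);
}
```

A calling wrapper can then treat any code for which `app_exit_meaning()` returns NULL as a bug in the child program rather than guessing at its meaning.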

score 6 · Answer 2 · answered Dec 26 '08 at 14:49

1-127 is the available range. Anything over 127 is supposed to mean an "abnormal" exit, i.e. termination by a signal.

While you're at it, consider using stdout rather than the exit code. By tradition, the exit code indicates success, failure, and maybe one other state. Instead, try using stdout the way expr and wc use it; the caller can then use backticks or something similar to capture the result.
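A minimal C sketch of the stdout approach, assuming a POSIX system: `popen()` runs the command through the shell much as backticks do, the caller parses the value from the output, and `pclose()` still returns the exit status (in the same encoding as `waitpid()`), so success or failure is checked separately from the result. The helper name `run_for_number` and its return convention are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run a shell command, parse the first line of its output as a number,
 * and check the exit status separately. Returns 0 and stores the value
 * in *result on success; returns -1 if the command could not be run,
 * exited unsuccessfully, or printed nothing parseable.
 */
static int run_for_number(const char *command, long *result)
{
    FILE *fp = popen(command, "r");
    if (fp == NULL)
        return -1;

    char line[256];
    int have_line = fgets(line, sizeof line, fp) != NULL;

    /* Drain remaining output so the child cannot block on a full pipe. */
    char discard[256];
    while (fgets(discard, sizeof discard, fp) != NULL)
        ;

    int status = pclose(fp);  /* raw status, decoded with the wait macros */
    if (status == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    if (!have_line)
        return -1;

    char *end;
    long value = strtol(line, &end, 10);
    if (end == line)
        return -1;  /* output did not start with a number */
    *result = value;
    return 0;
}
```

For example, `run_for_number("wc -l < /etc/passwd", &n)` would store the line count in `n`, the C equivalent of `n=$(wc -l < /etc/passwd)` in a shell script.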

To be clear, it's always possible for a parent process to tell whether it was a signal or a normal exit... it's the _shell_ that maps signals to 129-254 for `$?`. – Random832 Apr 17 '11 at 19:15
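To illustrate the comment's point, here is a POSIX C sketch in which a parent runs two children: one exits normally with status 130, the other is killed by SIGINT. A bash-like shell shows 130 in `$?` for both, but the raw status from `waitpid()` keeps them apart via `WIFEXITED()` and `WIFSIGNALED()`. The helpers `child_status` and `describe` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Fork a child that kills itself with signum (if non-zero) or else
 * exits with code; return the raw waitpid() status, or -1 on failure.
 */
static int child_status(int code, int signum)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (signum != 0) {
            signal(signum, SIG_DFL);  /* ensure the default (terminate) action */
            raise(signum);
        }
        _exit(code);                  /* normal exit, or fallback if not killed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Report the status as the parent sees it, without the shell's 128+n folding. */
static void describe(int status)
{
    if (WIFEXITED(status))
        printf("exited normally with status %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("terminated by signal %d\n", WTERMSIG(status));
    else
        printf("other status 0x%04X\n", (unsigned)status);
}
```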

score 4 · Answer 3 · answered Dec 26 '08 at 15:42

The Unix philosophy says something like:

Exit as soon and as loudly as possible on error.

score 3 · Answer 4 · answered Dec 26 '08 at 16:57

Don't try to encode too much meaning into the exit value: detailed statuses and error reports should go to stdout / stderr as Arkadiy suggests.

However, I have found it very useful to represent just a handful of states in the exit values, using binary digits to encode them. For example, suppose you have the following contrived meanings:

0000 : 0 (no error)
0001 : 1 (error)
0010 : 2 (I/O error)
0100 : 4 (user input error)
1000 : 8 (permission error)

Then, a user input error would have a return value of 5 (4 + 1), while a log file not having write permission might have a return value of 11 (8 + 2 + 1). As the different meanings are independently encoded in the return value, you can easily see what's happened by checking which bits are set.

As a special case, to see if there was an error you can AND the return code with 1.

By doing this, you can encode a couple of different things in the return code, in a clear and simple way. I use this only to make simple decisions such as "should the process be restarted", "do the return value and relevant logs need to be sent to an admin", that sort of thing. Any detailed diagnostic information should go to logs or to stdout / stderr.
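A sketch of the bit-flag scheme in C, assuming the contrived bit assignments from the table above; the `XBIT_*` names and both helper functions are hypothetical. Any condition also sets the "error" bit, so a caller can test bit 0 alone to decide between success and failure.

```c
#include <stdbool.h>

/* Hypothetical bit assignments matching the contrived table above. */
enum {
    XBIT_ERROR      = 1 << 0,  /* 1: some error occurred */
    XBIT_IO         = 1 << 1,  /* 2: I/O error */
    XBIT_USER_INPUT = 1 << 2,  /* 4: user input error */
    XBIT_PERMISSION = 1 << 3   /* 8: permission error */
};

/* Build an exit status from condition bits; any condition implies XBIT_ERROR. */
static int make_exit_status(int conditions)
{
    if (conditions == 0)
        return 0;
    /* The mask is a safeguard that keeps the value inside 0..127. */
    return (conditions | XBIT_ERROR) & 0x7F;
}

/* Test one bit of an exit status received from a child process. */
static bool status_has(int exit_status, int bit)
{
    return (exit_status & bit) != 0;
}
```

With these definitions, a user input error gives 5 and an unwritable log file gives 11, as in the text; a supervisor deciding whether to alert an admin might simply check `status_has(code, XBIT_PERMISSION)`.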

score 1 · Answer 5 · edited May 23 '17 at 11:53

The normal exit statuses run from 0 to 255 (see the question "Exit codes bigger than 255 possible?" for a discussion of why). Normally, status 0 indicates success; anything else is an implementation-defined error. I do know of a program that reports the state of a DBMS server via its exit status; that is a special case of implementation-defined exit statuses. Note that you get to define the implementation of the statuses of your own programs.

I couldn't fit this into 300 characters; otherwise it would have been a comment to @Arkadiy's answer.
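The 0 to 255 limit comes from the status word itself: POSIX makes only the low 8 bits of the value passed to `exit()` or `_exit()` available to the parent through `WEXITSTATUS()`. A small POSIX C sketch (the helper name `observed_exit_status` is hypothetical) shows larger values wrapping around, so that `exit(256)` looks like success to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Fork a child that calls _exit(value) and return what the parent sees
 * via WEXITSTATUS(): only value & 0xFF survives. Returns -1 if fork()
 * or waitpid() fails or the child did not exit normally.
 */
static int observed_exit_status(int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(value);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```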

Arkadiy is right that in one part of the exit status word, non-zero values indicate the signal that terminated the process, and the 8th bit normally indicates a core dump; but that section of the exit status is separate from the main 0..255 status. However, the shell (whichever shell it is) has a problem when a process dies as a result of a signal: there are 16 bits of data to be presented as an 8-bit value, which is always tricky. What the shells seem to do is take the signal number and add 128 to it. So, if a process dies as a result of an interrupt (signal number 2, SIGINT), the shell reports the exit status as 130. However, the kernel reported the status as 0x0002; the shell has modified what the kernel reports.

The following C code demonstrates this. There are two programs:

suicide which kills itself using a signal of your choosing (interrupt by default).
exitstatus which runs a command (such as suicide) and reports the kernel exit status.

Here's suicide.c:

/*
@(#)File:           $RCSfile: suicide.c,v $
@(#)Version:        $Revision: 1.2 $
@(#)Last changed:   $Date: 2008/12/28 03:45:18 $
@(#)Purpose:        Commit suicide using kill()
@(#)Author:         J Leffler
@(#)Copyright:      (C) JLSS 2008
@(#)Product:        :PRODUCT:
*/

/*TABSTOP=4*/

#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "stderr.h"

static const char usestr[] = "[-V][-s signal]";

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_suicide_c[];
const char jlss_id_suicide_c[] = "@(#)$Id: suicide.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */

int main(int argc, char **argv)
{
    int signum = SIGINT;
    int opt;
    char *end;

    err_setarg0(argv[0]);

    while ((opt = getopt(argc, argv, "Vs:")) != -1)
    {
        switch (opt)
        {
        case 's':
            signum = strtol(optarg, &end, 0);
            if (*end != '\0' || signum <= 0)
                err_error("invalid signal number %s\n", optarg);
            break;
        case 'V':
            err_version("SUICIDE", &"@(#)$Revision: 1.2 $ ($Date: 2008/12/28 03:45:18 $)"[4]);
            break;
        default:
            err_usage(usestr);
            break;
        }
    }
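    if (optind != argc)
        err_usage(usestr);
    /* Send the chosen signal to this process; a fatal signal ends it here */
    kill(getpid(), signum);
    /* Only reached if the signal did not terminate the process (e.g. it is ignored or blocked) or kill() failed */
    return(0);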
}

And here's exitstatus.c:

/*
@(#)File:           $RCSfile: exitstatus.c,v $
@(#)Version:        $Revision: 1.2 $
@(#)Last changed:   $Date: 2008/12/28 03:45:18 $
@(#)Purpose:        Run command and report 16-bit exit status
@(#)Author:         J Leffler
@(#)Copyright:      (C) JLSS 2008
@(#)Product:        :PRODUCT:
*/

/*TABSTOP=4*/

#if __STDC_VERSION__ >= 199901L
#define _XOPEN_SOURCE 600
#else
#define _XOPEN_SOURCE 500
#endif /* __STDC_VERSION__ */

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include "stderr.h"

#ifndef lint
/* Prevent over-aggressive optimizers from eliminating ID string */
extern const char jlss_id_exitstatus_c[];
const char jlss_id_exitstatus_c[] = "@(#)$Id: exitstatus.c,v 1.2 2008/12/28 03:45:18 jleffler Exp $";
#endif /* lint */

int main(int argc, char **argv)
{
    pid_t pid;

    err_setarg0(argv[0]);

    if (argc < 2)
        err_usage("cmd [args...]");

    if ((pid = fork()) < 0)
        err_syserr("fork() failed: ");
    else if (pid == 0)
    {
        /* Child: replace this process image with the command */
        execvp(argv[1], &argv[1]);
        /* Only reached if execvp() failed */
        return(1);
    }
    else
    {
        pid_t corpse;
        int status;
        corpse = waitpid(pid, &status, 0);
        if (corpse != pid)
            err_syserr("waitpid() failed: ");
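        /*
        ** Print the raw status word. On typical systems the exit status
        ** is in the high byte; the terminating signal number (plus the
        ** 0x80 core-dump flag) is in the low byte.
        */
        printf("0x%04X\n", status);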
    }
    return(0);
}

The missing code, stderr.c and stderr.h, can easily be found in essentially any of my published programs. If you need it urgently, get it from the program SQLCMD at the IIUG Software Archive; alternatively, contact me by email (see my profile).
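exitstatus prints the raw status word; the shell's folding of that word into `$?` takes only a few lines. The sketch below is self-contained (it does not need stderr.h) and assumes a POSIX system and bash-like behaviour, where a signal death is reported as 128 plus the signal number; other shells may differ. `shell_style_status` and `status_after_signal` are hypothetical names, the latter standing in for running suicide.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Fold a raw waitpid() status into the single number a bash-like shell
 * reports in $?: the exit status for a normal exit, or 128 plus the
 * signal number for a signal death. Returns -1 for anything else.
 */
static int shell_style_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Run a child that kills itself with signum (like suicide); return the raw status. */
static int status_after_signal(int signum)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(signum, SIG_DFL);  /* ensure the default (terminate) action */
        kill(getpid(), signum);
        _exit(0);                 /* not reached if the signal terminates the child */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

With SIGINT (signal 2), `shell_style_status(status_after_signal(SIGINT))` gives 130, matching the example above, while the raw status still reports the signal itself.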
