Is it possible to exclude a redundant, invisible else branch with gcov?

Question

Imagine the following function:

int myFunction(my_enum_t x)
{
  int result = 0;

  if ((x != ENUM_VAL_A) &&
      (x != ENUM_VAL_B) &&
      (x != ENUM_VAL_C))
  {
    result = -1;
  }

  if (0 == result)
  {
     result = mutateSomething(x);
  }

  if (0 == result)
  {
    if (x == ENUM_VAL_A)
    {
       result = doOneThing();
    }

    else if (x == ENUM_VAL_B)
    {
       result = doThing2();
    }

    else if (x == ENUM_VAL_C)
    {
       result = doThing3();
    }
    /* Invisible else */
  }

  return result;
}

Whether in the interests of clarity or to avoid complicating mutateSomething(), I would like to keep the checks on x before that call, however, gcov will see the invisible else where I've commented. That branch cannot be covered. I am, for this reason, using the --exclude-unreachable-branches switch of gcovr, but this branch is still regarded as uncovered.

Is there any way to avoid this without removing the checks on x at the start of the function?

I sometimes wrap the whole final else if

else if (x == ENUM_VAL_C)
{...}

in exclusion tags, on the basis that I am allowed 2% uncovered code, but this is not possible in sources with one or two functions.

Before you think leaving off that `else` is a good idea, try adding this code: `x = 91231;`. `enum` values in C are *not* restricted to the values found in the `enum` definition. `enum` variables are just integers. — Andrew Henle, May 24 '23 at 19:42
@Andrew I think part of the point is that they check that `x` is valid at the start of the function. (They'd probably return if it wasn't but it looks like they are following a "one return per function" rule, hence the long chain of only doing something more if `result` is still zero.) — pmacfarlane, May 24 '23 at 19:56
Are `ENUM_VAL_A` etc 0, 1, 2 or are they something else? It matters for the answer, please post the enum typedef. — Lundin, May 25 '23 at 08:05
Short remark on gcovr's `--exclude-unreachable-branches` option: despite its name, it does not perform reachability analysis. It just ignores branch coverage data attributed to source code lines that look like they don't contain code. The [gcovr docs have some discussion on branch coverage](https://gcovr.com/en/stable/faq.html#why-does-c-code-have-so-many-uncovered-branches): “100% branch coverage will be impossible for most programs.” Sometimes, surgical use of `GCOV_EXCL_BR_LINE` markers can help. — amon, May 26 '23 at 08:33

chux - Reinstate Monica · Answer 1 · 2023-05-25T03:14:19.693

Candidate alternative:

else if (x == ENUM_VAL_C) --> else, as at this point x == ENUM_VAL_C is already known. This eliminates the unneeded test and branch.
Positive tests (== vs !=) tend to be easier to understand, especially as test complexity increases.
Initialization to 0 is never used. Consider int result = -1; to initialize result to an error value.

int myFunction(my_enum_t x) {
  int result = -1;
  // Is x valid?
  if ((x == ENUM_VAL_A) || (x == ENUM_VAL_B) || (x == ENUM_VAL_C)) {
    result = mutateSomething(x);
    if (result == 0) {
      if (x == ENUM_VAL_A) {
        result = doOneThing();
      } else if (x == ENUM_VAL_B) {
        result = doThing2();
      } else {
        result = doThing3();
      }
    }
  }
  return result;
}

pmacfarlane · Answer 2 · 2023-05-24T22:36:35.300

Since you have already verified that x is one of the three permissible values at the start of the function, you can change your final else if into an else, which means your logic ends with an explicit else. i.e.

    if (x == ENUM_VAL_A)
    {
       result = doOneThing();
    }
    else if (x == ENUM_VAL_B)
    {
       result = doThing2();
    }
    else // it must be ENUM_VAL_C
    {
       result = doThing3();
    }
    /* No invisible else */
  }

I find it a bit horrible, but when you're doing code coverage and unit-testing, sometimes you've got to do things like that. Specifically, any "belts and braces" and "this can't happen" code paths can't be tested. Hopefully your tests will catch the case if, in the future, you add ENUM_VAL_D.

Aside: some standards (such as MISRA) demand that an if...else if... chain ends with a final else. (MISRA-C-2012 rule 15.7). I can see the justification for it.

Lundin · Answer 3 · 2023-05-25T10:47:05.550

As someone who's used at working with safety-related code, MISRA C and similar, I see some flaws in the reasoning here:

"An enum can only have one of the values defined in the enumeration list" is false. It is just a glorified integer type.
"That branch cannot be covered" Except it can, in case the enum gets an unexpected value.

To find unexpected execution branches like this is one of the reasons to perform static analysis/code coverage to begin with.

Now why would this enum ever get any other value than ENUM_VAL_A, ENUM_VAL_B, ENUM_VAL_C? In embedded systems we might start to argue about somewhat unlikely things like electromagnetic interference, RAM memories being unreliable, cosmic rays etc. But far more likely, RAM variables will change values unexpectedly because of bugs. Pointer bugs, out of bounds access, runaway code, stack overflows and so on.

No matter what level of code quality and integrity you have, it is never wrong to include defensive programming. Meaning you add error checks to catch things that shouldn't be possible in theory. In this case by simply adding an else:

else if (x == ENUM_VAL_C)
{
   result = doThing3();
}
else
{
  // this is actually not unreachable code.
}

In your specific case it most likely makes most sense to write a return -1 inside this else. In other cases, it might be sufficient to just write else {}. The difference between an empty else after else if and no else at all is that the former means: "I have actually considered this unlikely scenario, but it will not affect the program". Whereas skipping the else just means "I have probably not considered this scenario at all". Adding such defensive programming checks/self-documenting code cost very little in terms of execution overhead.

As for how to write this whole thing in general... as it turns out, multiple returns will actually turn the code far more readable. As would using a dedicated enum for return codes instead of "magic numbers".

I would also drop 1980s "Yoda conditions" coding style (0 == something) and instead configure the compiler to always warn for assignment inside conditional expressions. With gcc that's -Wall or -Wparenthesis.

For an enum with adjacent, increasing number 0, 1, 2 etc we can also create a correspondence between an enumeration constant and a function. That is, ENUM_VAL_A should always correspond with doOneThing. One way to do this is to replace the long if-else chain with a function pointer jump table.

So given some enums

typedef enum
{
  ENUM_VAL_A,
  ENUM_VAL_B,
  ENUM_VAL_C,
  ENUM_N          // number of enumeration constants
} my_enum_t;

typedef enum      // explicit error codes over "int with magic numbers"
{
  ERR_NONE,
  ERR_WRONG_INPUT,
  ERR_UNEXPECTED,
} err_t;

Then we can completely overhaul the function like this:

err_t myFunction (my_enum_t x)
{
  if ((x != ENUM_VAL_A) &&
      (x != ENUM_VAL_B) &&
      (x != ENUM_VAL_C))
  {
    return ERR_WRONG_INPUT;
  }

  my_enum_t result = mutateSomething(x);
  if ((result != ENUM_VAL_A) &&
      (result != ENUM_VAL_B) &&
      (result != ENUM_VAL_C))
  {
    return ERR_UNEXPECTED;
  }
  /* now we have truly verified that the enum isn't invalid or corrupt */

  err_t (*const jumptable[ENUM_N]) (void) = 
  { 
    [ENUM_VAL_A] = doOneThing, 
    [ENUM_VAL_B] = doThing2, 
    [ENUM_VAL_C] = doThing3, 
  };
  
  return jumptable[result];
}

As for code coverage, it might get a little lost when we introduce function pointers. But in terms of "cyclomatic complexity" (number of possible execution paths), then this code is much simpler since there are no (nested) if-else chains or switch.

@pmacfarlane Oh right, that's a copy/paste typo it should say `result`. Fixed, thanks. — Lundin, May 25 '23 at 10:46
I think in the OP, `mutateSomething()` returns an error status, rather than a `my_enum_t`. Maybe should be `if (result != ERR_NONE) return result;` or similar? — pmacfarlane, May 25 '23 at 10:53
`if ((result != ENUM_VAL_A) && (result != ENUM_VAL_B) && (result != ENUM_VAL_C)) {` is unclear as `result` and `ENUM_VAL_...` are not certainly in the same domain. — chux - Reinstate Monica, May 25 '23 at 14:34
@chux-ReinstateMonica Not sure what you mean that they aren't "in the same domain"? I am assuming that after the rewrite, `mutateSomething` returns `my_enum_t` and not some magic number. — Lundin, May 25 '23 at 14:46
OP's `mutateSomething()` returns a _result_ like `myFunction(), doOneThing(), doThing2(), doThing3()`. To match OP's intent, `err_t result = mutateSomething(x);` is more consistent than `my_enum_t result = mutateSomething(x);`. — chux - Reinstate Monica, May 25 '23 at 18:45

Is it possible to exclude a redundant, invisible else branch with gcov?

3 Answers3