41

When I use gcov to measure test coverage of C++ code it reports branches in destructors.

struct Foo
{
    virtual ~Foo()
    {
    }
};

int main (int argc, char* argv[])
{
    Foo f;
}

When I run gcov with branch probabilities enabled (-b) I get the following output.

$ gcov /home/epronk/src/lcov-1.9/example/example.gcda -o /home/epronk/src/lcov-1.9/example -b
File 'example.cpp'
Lines executed:100.00% of 6
Branches executed:100.00% of 2
Taken at least once:50.00% of 2
Calls executed:40.00% of 5
example.cpp:creating 'example.cpp.gcov'

The part that bothers me is the "Taken at least once:50.00% of 2".

The generated .gcov file gives more detail.

$ cat example.cpp.gcov | c++filt
        -:    0:Source:example.cpp
        -:    0:Graph:/home/epronk/src/lcov-1.9/example/example.gcno
        -:    0:Data:/home/epronk/src/lcov-1.9/example/example.gcda
        -:    0:Runs:1
        -:    0:Programs:1
        -:    1:struct Foo
function Foo::Foo() called 1 returned 100% blocks executed 100%
        1:    2:{
function Foo::~Foo() called 1 returned 100% blocks executed 75%
function Foo::~Foo() called 0 returned 0% blocks executed 0%
        1:    3:    virtual ~Foo()
        1:    4:    {
        1:    5:    }
branch  0 taken 0% (fallthrough)
branch  1 taken 100%
call    2 never executed
call    3 never executed
call    4 never executed
        -:    6:};
        -:    7:
function main called 1 returned 100% blocks executed 100%
        1:    8:int main (int argc, char* argv[])
        -:    9:{
        1:   10:    Foo f;
call    0 returned 100%
call    1 returned 100%
        -:   11:}

Notice the line "branch 0 taken 0% (fallthrough)".

What causes this branch and what do I need to do in the code to get a 100% here?

  • g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
  • gcov (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
Eddy Pronk
  • How do I get a 100% here is still not answered. – Eddy Pronk Aug 28 '11 at 23:05
  • See my updated answer for an exhaustive explanation of what is happening here. – AnT stands with Russia Aug 31 '11 at 04:08
  • This is a consequence of instrumenting the low level code (with branches inserted according to the language semantics) rather than instrumenting the source code directly. GCov does it this way because it is convenient for GCov, not because it is helpful to you; there is no value in your knowing about test-coverage of compiler-generated branches supporting a presumably well-tested compiler. If you get a test coverage tool that instruments source, you won't get this kind of bogus coverage data. (Check my bio for one option). – Ira Baxter Aug 04 '14 at 18:57

3 Answers

62

In a typical implementation the destructor usually has two branches: one for non-dynamic object destruction, another for dynamic object destruction. The selection of a specific branch is performed through a hidden boolean parameter passed to the destructor by the caller. It is usually passed through a register as either 0 or 1.

I would guess that, since in your case the destruction is for a non-dynamic object, the dynamic branch is not taken. Try adding a new-ed and then delete-ed object of class Foo, and the second branch should be taken as well.
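Something along these lines (just a sketch of the suggested experiment, not code I have run through gcov myself) should invoke both the non-dynamic and the dynamic destructor:

// Variant of the original example: destroy one automatic Foo and one
// dynamically allocated Foo, so both destructor flavors get exercised.
struct Foo
{
    virtual ~Foo()
    {
    }
};

int main()
{
    Foo f;              // non-dynamic destruction: operator delete is skipped
    Foo* p = new Foo;   // dynamic destruction: the "deleting" destructor runs
    delete p;           //   and it is the one that calls operator delete
}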

The reason this branching is necessary is rooted in the specification of C++ language. When some class defines its own operator delete, the selection of a specific operator delete to call is done as if it was looked up from inside the class destructor. The end result of that is that for classes with virtual destructor operator delete behaves as if it were a virtual function (despite formally being a static member of the class).

Many compilers implement this behavior literally: the proper operator delete is called directly from inside the destructor implementation. Of course, operator delete should only be called when destroying dynamically allocated objects (not for local or static objects). To achieve this, the call to operator delete is placed into a branch controlled by the hidden parameter mentioned above.
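In rough pseudo-C++ (my own illustration with made-up names, not actual compiler output), the lowering looks like this:

#include <new>

struct Foo2 { virtual ~Foo2() {} };   // hypothetical stand-in for the real class

// Conceptual equivalent of what the compiler emits for the destructor:
// a hidden boolean parameter decides whether operator delete is called
// after the destructor body has run.
void destroy_foo2(Foo2* p, bool deleting /* hidden parameter */)
{
    // ... user-written destructor body and member/base destruction go here ...
    if (deleting)                 // true only when the object was delete-d
        ::operator delete(p);     // skipped for automatic and static objects
}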

In your example things look pretty trivial. I'd expect the optimizer to remove all unnecessary branching. However, it appears that somehow it managed to survive optimization.


Here's a little bit of additional research. Consider this code

#include <stdio.h>

struct A {
  void operator delete(void *) { scanf("11"); }
  virtual ~A() { printf("22"); }
};

struct B : A {
  void operator delete(void *) { scanf("33"); }
  virtual ~B() { printf("44"); }
};

int main() {
  A *a = new B;
  delete a;       // selects B::operator delete based on the dynamic type of *a
} 

This is what the code for the destructor of A looks like when compiled with GCC 4.3.4 under default optimization settings:

__ZN1AD2Ev:                      ; destructor A::~A  
LFB8:
        pushl   %ebp
LCFI8:
        movl    %esp, %ebp
LCFI9:
        subl    $8, %esp
LCFI10:
        movl    8(%ebp), %eax
        movl    $__ZTV1A+8, (%eax)
        movl    $LC1, (%esp)     ; LC1 is "22"
        call    _printf
        movl    $0, %eax         ; <------ Note this
        testb   %al, %al         ; <------ 
        je      L10              ; <------ 
        movl    8(%ebp), %eax    ; <------ 
        movl    %eax, (%esp)     ; <------ 
        call    __ZN1AdlEPv      ; <------ calling `A::operator delete`
L10:
        leave
        ret

(The destructor of B is a bit more complicated, which is why I use A here as an example. But as far as the branching in question is concerned, the destructor of B does it in the same way.)

However, right after this destructor the generated code contains another version of the destructor for the very same class A, which looks exactly the same, except that the movl $0, %eax instruction is replaced with a movl $1, %eax instruction.

__ZN1AD0Ev:                      ; another destructor A::~A       
LFB10:
        pushl   %ebp
LCFI13:
        movl    %esp, %ebp
LCFI14:
        subl    $8, %esp
LCFI15:
        movl    8(%ebp), %eax
        movl    $__ZTV1A+8, (%eax)
        movl    $LC1, (%esp)     ; LC1 is "22"
        call    _printf
        movl    $1, %eax         ; <------ See the difference?
        testb   %al, %al         ; <------
        je      L14              ; <------
        movl    8(%ebp), %eax    ; <------
        movl    %eax, (%esp)     ; <------
        call    __ZN1AdlEPv      ; <------ calling `A::operator delete`
L14:
        leave
        ret

Note the code blocks I labeled with arrows. This is exactly what I was talking about. Register al serves as that hidden parameter. This "pseudo-branch" is supposed to either invoke or skip the call to operator delete in accordance with the value of al. However, in the first version of the destructor this parameter is hardcoded into the body as always 0, while in the second one it is hardcoded as always 1.

Class B also has two versions of the destructor generated for it. So we end up with 4 distinct destructors in the compiled program: two destructors for each class.

I can guess that at the beginning the compiler internally thought in terms of a single "parameterized" destructor (which works exactly as I described above). And then it decided to split the parameterized destructor into two independent non-parameterized versions: one for the hardcoded parameter value of 0 (non-dynamic destructor) and another for the hardcoded parameter value of 1 (dynamic destructor). In non-optimized mode it does that literally, by assigning the actual parameter value inside the body of the function and leaving all the branching totally intact. This is acceptable in non-optimized code, I guess. And that's exactly what you are dealing with.
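To make the split concrete, here is the same idea rendered as C++ (again my own sketch with invented names, mirroring the assembly above rather than anything GCC literally produces):

struct Foo2 { virtual ~Foo2() {} };   // hypothetical stand-in

// "Complete object" destructor: hidden parameter hardcoded to 0,
// corresponding to `movl $0, %eax` in the first listing.
void destroy_foo2_complete(Foo2* p)
{
    bool deleting = false;
    // ... destructor body runs here ...
    if (deleting)                 // dead branch: the fallthrough into the call
        ::operator delete(p);     //   is what gcov reports as "taken 0%"
}

// "Deleting" destructor: hidden parameter hardcoded to 1,
// corresponding to `movl $1, %eax` in the second listing.
void destroy_foo2_deleting(Foo2* p)
{
    bool deleting = true;
    // ... destructor body runs here ...
    if (deleting)                 // always taken (when this destructor runs at all)
        ::operator delete(p);
}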

In other words, the answer to your question is: it is impossible to make the compiler take all the branches in this case. There's no way to achieve 100% coverage. Some of these branches are "dead". It is just that the approach to generating non-optimized code is rather "lazy" and "loose" in this version of GCC.

There might be a way to prevent the split in non-optimized mode, I think. I just haven't found it yet. Or, quite possibly, it can't be done. Older versions of GCC used true parameterized destructors. Maybe in this version of GCC they decided to switch to the two-destructor approach, and while doing it they "reused" the existing code generator in such a quick-and-dirty way, expecting the optimizer to clean out the useless branches.

When you are compiling with optimization enabled GCC will not allow itself such luxuries as useless branching in the final code. You should probably try to analyze optimized code. Non-optimized GCC-generated code has lots of meaningless inaccessible branches like this one.

AnT stands with Russia
  • I tried the different optimization levels, but for this case it doesn't have any impact. – Eddy Pronk Aug 26 '11 at 04:45
  • adding a `new`-ed and then `delete`-ed object of class `Foo` makes it touch both dtor symbols, but doesn't affect the branch. – Eddy Pronk Aug 26 '11 at 04:56
  • 2
    @Eddy: do not forget that if the `new` occurs in the same scope as the `delete`, then the compiler may be smart enough to deduce the true dynamic type of the object and devirtualize the call to the destructor. – Matthieu M. Aug 26 '11 at 07:06
  • @Eddy Pronk: OK, that probably means that instead of "branched destructor" approach GCC uses "two non-branched destructors" approach. In that case I don't know what that branching is doing there. Could it be just some kind of placeholder for inserting something in the future? Or maybe just something added to improve pipelining/alignment/branch prediction? – AnT stands with Russia Aug 26 '11 at 15:28
  • I don't see much coming from this line of reasoning since operator new / operator delete are resolved at compile time, not runtime. I'm willing to be proven wrong (or just ignore this and keep upvoting...), but I don't see an example of operator new actually impacting this branch. – Adam Mitz Aug 30 '11 at 01:25
  • @Adam Mitz: In-class `operator delete` calls are resolved at *run time* in general case. The language specification requires the `operator delete` to be selected based on the *dynamic* type of the object being deleted. Obviously, this can't be done at compile time. In other words, member `operator delete` behaves as *virtual* function, even though it is declared as a static member. This weird property of `operator delete` is expressed in the standard by that specific wording that says that `operator delete` should be looked up from "the definition of the dynamic type’s virtual destructor". – AnT stands with Russia Aug 30 '11 at 05:37
  • But at this point the destructor of the dynamic type has already been selected, and we're in it. – Adam Mitz Aug 30 '11 at 12:13
  • @Adam Mitz: Er... yes. As I said, the popular approach to implementing this behavior is to call `operator delete` from the destructor. That way "virtuality" of `operator delete` simply piggy-backs on the virtuality of the destructor. But you need to know when to call `operator delete` and when not to call `operator delete`. Obviously, you are not supposed to call it for automatic objects. This is what is often implemented through a hidden destructor parameter and branching, as I described above. – AnT stands with Russia Aug 30 '11 at 14:58
  • OK, I'm going to give up trying to argue any facet of what you're describing that an imaginary implementation might do; I'm just noting the lack of any evidence connecting that to what GCC is actually doing. – Adam Mitz Aug 31 '11 at 01:17
  • @Adam Mitz: There's nothing "imaginary" about it. This is exactly how the code generated by MSVC++ compiler works. And, I believe, I saw some version of GCC do the same thing. Which is why I made this guess. – AnT stands with Russia Aug 31 '11 at 02:40
  • @Adam Mitz: ... and we've got confirmation. What I described above and in the comments is *exactly* what happens in GCC 4. The same original technique gets "pseudo-optimized" into two almost identical versions of the destructor with [now unnecessary] branching remaining in the non-optimized code. – AnT stands with Russia Aug 31 '11 at 03:52
  • So we end up where we started: it's a conditional branch instruction that's functionally unconditional. – Adam Mitz Aug 31 '11 at 04:06
  • @Adam Mitz: Well, one of the questions was "What causes this branch?" – AnT stands with Russia Aug 31 '11 at 04:57
  • 1
    @AndreyT Impressive answer and discussion. Thanks! – Eddy Pronk Sep 04 '11 at 02:43
  • It is interesting to see that you mentioned optimizations many times. The one thing you are expected to do to run a good coverage test is to use -O0. I know the compiler and assembler still do optimizations, but not as much as otherwise possible. – Alexis Wilke Apr 14 '13 at 10:35
7

In the destructor, GCC generated a conditional jump for a condition which can never be true (%al is not zero, since it was just assigned a 1):

[...]
  29:   b8 01 00 00 00          mov    $0x1,%eax
  2e:   84 c0                   test   %al,%al
  30:   74 30                   je     62 <_ZN3FooD0Ev+0x62>
[...]
Adam Mitz
  • any idea why it's not optimized away? It seems an unconditional jump would be better. – Matthieu M. Aug 26 '11 at 07:04
  • In that particular case I didn't give GCC any -O options, but even with -O there are some "interesting" control flow patterns (like calling an address which is in the middle of the calling function). I guess you could also make a case that without -O it shouldn't generate such code -- but maybe they have their reasons? – Adam Mitz Aug 26 '11 at 12:16
  • I don't know, assembler is still a mythologic beast to me as I never really dug in :) – Matthieu M. Aug 26 '11 at 12:22
  • 1
    Scratch that last bit about calling an address in the middle of the calling function: I was looking at the wrong disassembly output (before linking and resolving relocations). – Adam Mitz Aug 30 '11 at 01:27
  • 2
    Gosh! What the hell is being done here `je 62 <_ZN3FooD0Ev+0x62>`? Adding an offset to the function base address? :-/ – Nawaz Sep 01 '11 at 06:20
  • Yes, relative jump targets are denoted by their offset from the start of the function. – Adam Mitz Sep 03 '11 at 22:26
0

The destructor problem is still there with GCC 5.4.0, but it seems not to exist for Clang.

Tested with:

clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Then use "llvm-cov gcov ..." to generate coverage as described here.

Andy