Is there a way, in tests, to defend against "correct" results coming out of undefined behavior?

Question

Foreword

I know what UB is, so I'm not asking how to avoid it, but whether there's a way to make unit testing more resistant to it, even if it's only a probabilistic approach that makes UB more likely to become apparent rather than letting tests pass silently.

The question

Let's say I want to write a test for a function and that I do it wrong, like this:

#include <gtest/gtest.h>
#include <vector>

int main()
{
    std::vector<int> v{0};
    for (auto i = 0; i != 100; ++i) {
        v.push_back(3);     // push a 3
        v.pop_back();       // oops, popping the value I just pushed
        EXPECT_EQ(v[1], 3); // UB
    }
}

On my machine, it consistently passes; maybe the program is so simple that nothing has any reason to overwrite the 3 in the memory where it lived before pop_back.

Therefore the test clearly isn't reliable.

Is there any way to protect against such accidentally successful tests, even on a statistical basis ("by calling shuffleFreedMemory() before the EXPECT_EQ you increase the chances that the UB will sting you")?

The code above is just an example (I'm not trying to test the STL); I know std::vector<T>::at is a bounds-checked alternative to std::vector<T>::operator[], but that's a way to prevent undefined behavior in the first place, whereas I'm wondering how to defend against it.

For instance, leveraging UB itself by adding *(&v[0] + 1) = 10; right after v.pop_back();, will make the incorrectness of the test apparent, at least on my machine.

So I'm thinking of a tool/library/whatever which would, let's say, set the memory not held by v to random values after every executable line.

No, undefined behavior may exactly match your (unfounded) expectation :) — 500 - Internal Server Error, Aug 10 '21 at 14:51
Yep. Use `at` if you want to range check your access. If you don't want to throw an exception if out of range, then you need to do that range check yourself. — NathanOliver, Aug 10 '21 at 14:52
You can't really fully unit test against UB. They might help you catch some instances of it, but it can't prove correctness. But this is true of testing in general. The goal is to reduce the chances of a defect making it through undetected. — François Andrieux, Aug 10 '21 at 14:53
The nature of **undefined behavior** is such that—if you are unlucky—it may appear to work as you (or your unit tests) expect. If you are lucky, it'll crash. But it could also email your browser history to your grandmother then format your hard drive. — Eljay, Aug 10 '21 at 14:55
We could spend all day looking at specific cases, but in general, no. This is why proper testing is so important (and so difficult). It's also why it's important to test individual sections and components of your code, and not just as a whole. — Jacob FW, Aug 10 '21 at 15:03
You could add an extra `EXPECT_GE(v.size(), 2);` before the other check. It will fail here, whereas it would pass if the following check were valid (no UB). Using `at` in tests or a checked library might also help reduce incorrect tests. — Phil1970, Aug 10 '21 at 15:10
As I've specified in the question, the example is just an example, so let's not focus on `std::vector`'s API. — Enlico, Aug 10 '21 at 15:16
Use _UB Sanitizer_ as well as compiling and running the unit test programs. — JDługosz, Aug 10 '21 at 16:19

score 11 · Answer 1 · answered Aug 10 '21 at 14:56

Clang with AddressSanitizer (https://clang.llvm.org/docs/AddressSanitizer.html) catches this error:

$ clang++ -Wall -std=c++11 -o test test.cpp
$ ./test # program runs without errors

$ clang++ -fsanitize=address -Wall -std=c++11 -o test test.cpp
$ ./test
=================================================================
==94146==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000f4 at pc 0x00010ebcbf54 bp 0x7ffee10362d0 sp 0x7ffee10362c8
READ of size 4 at 0x6020000000f4 thread T0
    #0 0x10ebcbf53 in main+0x393 (test:x86_64+0x100002f53)
    #1 0x7fff204c3f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

0x6020000000f4 is located 4 bytes inside of 8-byte region [0x6020000000f0,0x6020000000f8)
allocated by thread T0 here:
    #0 0x10ec38c9d in wrap__Znwm+0x7d (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x54c9d)
    #1 0x10ebcdb38 in std::__1::__libcpp_allocate(unsigned long, unsigned long)+0x18 (test:x86_64+0x100004b38)
    #2 0x10ebcdaa9 in std::__1::allocator<int>::allocate(unsigned long)+0x49 (test:x86_64+0x100004aa9)
    #3 0x10ebcd4cc in std::__1::allocator_traits<std::__1::allocator<int> >::allocate(std::__1::allocator<int>&, unsigned long)+0x1c (test:x86_64+0x1000044cc)
    #4 0x10ebcfbc0 in std::__1::__split_buffer<int, std::__1::allocator<int>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<int>&)+0x180 (test:x86_64+0x100006bc0)
    #5 0x10ebcf68c in std::__1::__split_buffer<int, std::__1::allocator<int>&>::__split_buffer(unsigned long, unsigned long, std::__1::allocator<int>&)+0x2c (test:x86_64+0x10000668c)
    #6 0x10ebceec4 in void std::__1::vector<int, std::__1::allocator<int> >::__push_back_slow_path<int>(int&&)+0x154 (test:x86_64+0x100005ec4)
    #7 0x10ebcc480 in std::__1::vector<int, std::__1::allocator<int> >::push_back(int&&)+0xd0 (test:x86_64+0x100003480)
    #8 0x10ebcbedd in main+0x31d (test:x86_64+0x100002edd)
    #9 0x7fff204c3f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

SUMMARY: AddressSanitizer: heap-buffer-overflow (test:x86_64+0x100002f53) in main+0x393
Shadow bytes around the buggy address:
  0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 06 fa fa 00 fa
=>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa fd fa fa fa[04]fa
  0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==94146==ABORTING
[1]    94146 abort      ./test

Considering the scope of the question, it's worth pointing out that clang also has a more general-purpose undefined behavior sanitizer: https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html — , Aug 10 '21 at 15:47
Also, address sanitizer does not actually catch this error on my system unless I bump up the index. This is presumably because the dereferenced memory is still part of the vector's capacity, since my stdlib does not perform an implicit `shrink_to_fit()` after a pop_back(). — , Aug 10 '21 at 15:52
Oh I didn't even know about this one. Thanks @Frank! On my system however, `-fsanitize=undefined` does not catch this particular error. I'm on macOS with clang 12.0.5. — Martin Fink, Aug 10 '21 at 15:54

score 1 · Answer 2 · answered Aug 11 '21 at 05:08

Checking for invalid memory accesses is unfortunately not good enough as pop_back() is not required to relinquish the memory.

v[1] is always undefined behavior by virtue of reading from a destroyed object, but this is a subtlety that exists only at the level of the C++ abstract machine the compiler reasons about. Once the code has been compiled to binary, as long as the memory is still allocated and properly aligned, there is no "problem" at the machine level. Because of this, system-level runtime checks will not necessarily catch such UB.
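
To make the point about retained memory concrete, here is a minimal sketch using only well-defined operations. The names PopBackObservation and observe_pop_back are made up for illustration. It records that the vector's capacity is the same before and after pop_back(), which is what common implementations do, so the slot at index 1 is still backed by allocated memory even though it is no longer an element; the bounds-checked at() is used only to confirm that index 1 is out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper type, not from any library: what we observed
// around a single pop_back() call.
struct PopBackObservation {
    std::size_t size_after;
    std::size_t capacity_before;
    std::size_t capacity_after;
    bool at_threw;
};

PopBackObservation observe_pop_back() {
    std::vector<int> v{0};
    v.push_back(3);
    assert(v.size() == 2);

    const std::size_t capacity_before = v.capacity();
    v.pop_back();  // destroys the last element, keeps the storage

    bool at_threw = false;
    try {
        (void)v.at(1);  // index 1 is now past size(), so at() throws
    } catch (const std::out_of_range&) {
        at_threw = true;
    }
    return {v.size(), capacity_before, v.capacity(), at_threw};
}
```

Because the storage is still owned by the vector, a checker that only knows about allocations has nothing to complain about unless the library also tells it which part of the buffer is in use.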

While this is not a silver bullet for UB in general, there are some preprocessor macros you can define to enable additional validation within the standard library.

stdlib      macro
libstdc++   _GLIBCXX_DEBUG
libc++      _LIBCPP_DEBUG
MSVC        automatic for Debug builds, but partial :(

So adding -D_GLIBCXX_DEBUG -D_LIBCPP_DEBUG to the compiler flags will reliably catch OP's error, at least when using gcc/clang.
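
A checked build only helps if the flags actually reach the compiler, so a test binary can verify that for itself. The following is a small sketch under that assumption; the function names are hypothetical, and only the two macros named above are inspected. Newer libc++ releases are, as far as I know, configured through a different macro (_LIBCPP_HARDENING_MODE), so treat the macro names as version-dependent.

```cpp
#include <vector>  // any standard header; the macros below come from -D flags

// Hypothetical helper: true when this translation unit was compiled with
// one of the standard-library debug macros from the table above.
constexpr bool stdlib_debug_checks_enabled() {
#if defined(_GLIBCXX_DEBUG) || defined(_LIBCPP_DEBUG)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: a short label suitable for a test log.
constexpr const char* stdlib_debug_mode_name() {
#if defined(_GLIBCXX_DEBUG)
    return "libstdc++ debug mode";
#elif defined(_LIBCPP_DEBUG)
    return "libc++ debug mode";
#else
    return "none";
#endif
}
```

In the checked configuration of a test suite, a line such as static_assert(stdlib_debug_checks_enabled(), "checked build expected"); would turn a forgotten flag into a compile error instead of a silently unchecked run.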

AFAIK, MSVC has debug access guards for the STL in debug builds by default. — Jan Hošek, Aug 11 '21 at 05:01
@JanHošek It has them for iterators, but not for stuff like `std::vector<>::operator[]`, which is specifically the one OP cares about. (I still added that detail to the answer, thanks for pointing it out) — , Aug 11 '21 at 05:05

score 0 · Answer 3 · answered Aug 11 '21 at 19:42

You can get more out of your test suite simply by compiling and running the test code with different compilation options. For the specific example you have shown, there are the address sanitizers supported by clang and gcc. There are also several other sanitizers that detect other kinds of issues at runtime. (The valgrind tool suite may also be useful.)

Not all sanitizers can be combined, so you will likely have to compile and run your code several times with different settings. That is advisable anyway, because there are further ways to compile your code that can expose more bugs:

With different optimization levels: at higher optimization levels the compiler analyzes the code more deeply and may eliminate or transform code with undefined behaviour in ways that make the problem observable in tests.
With and without assertions enabled; both scenarios are relevant. With assertions enabled you may find additional issues; with assertions disabled you may find issues caused by side effects in assertion expressions.
With the special debugging flags of the libraries you use (for example the C++ standard library, which can then detect that an iterator is used after it was invalidated).
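
The point about side effects in assertion expressions can be illustrated with a short sketch. Both function names are invented for this example. In the first function the increment is written inside assert(), so it is compiled out when NDEBUG is defined and the function returns a different value; the second keeps the side effect outside the assertion and behaves the same in both builds.

```cpp
#include <cassert>

// Buggy on purpose: with -DNDEBUG the whole assert expression vanishes,
// including the increment, so the counter is returned unchanged.
int next_id_buggy(int& counter) {
    assert(++counter > 0);
    return counter;
}

// Side effect first, check afterwards: same result with or without NDEBUG.
int next_id_fixed(int& counter) {
    ++counter;
    assert(counter > 0);
    return counter;
}
```

A test that expects next_id_buggy to return 1 for a counter starting at 0 passes in a debug build and fails in a release build, which is exactly the kind of difference that running the suite in both configurations is meant to expose.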

All of the above benefit from a well-designed test suite with good coverage of the code and of interesting scenarios (such as boundary cases), because each of these approaches only detects a problem when the problematic code is actually executed, and often only with particular data.
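
When the same tests are built several ways, it can help if each binary states which configuration it was built with, so that a passing run is not mistaken for a checked one. Below is a rough sketch of that idea. TEST_BUILD_HAS_ASAN and describe_test_build are names made up here; the detection relies on __has_feature(address_sanitizer) for clang and __SANITIZE_ADDRESS__ for gcc, and __OPTIMIZE__ is a gcc/clang macro, so other compilers will simply report "optimized=no".

```cpp
#include <cassert>
#include <string>

// Detect AddressSanitizer. The nested #if avoids evaluating __has_feature
// on compilers that do not provide it.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define TEST_BUILD_HAS_ASAN 1
#  endif
#endif
#ifndef TEST_BUILD_HAS_ASAN
#  ifdef __SANITIZE_ADDRESS__
#    define TEST_BUILD_HAS_ASAN 1
#  else
#    define TEST_BUILD_HAS_ASAN 0
#  endif
#endif

// Hypothetical helper: one-line summary of how this binary was compiled.
std::string describe_test_build() {
    std::string s = "asan=";
    s += TEST_BUILD_HAS_ASAN ? "on" : "off";
#ifdef NDEBUG
    s += " assertions=off";
#else
    s += " assertions=on";
#endif
#ifdef __OPTIMIZE__
    s += " optimized=yes";
#else
    s += " optimized=no";
#endif
    return s;
}
```

Printing this string at the start of the test run is usually enough to tell from a CI log which of the configurations a given result belongs to.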

For completeness: these dynamic approaches should be combined with other quality assurance techniques such as code reviews and static analysis tools.
