6

I am working with a large and rather mature C++ project (10+ years of development, 150k+ SLOC, 3k+ test cases, www.sumo-sim.org). We recently discovered that the program behavior changes in an unexpected manner when putting in one seemingly innocent print statement (std::cout << "foo\n";) in a specific location. The objdump output also shows large changes in the generated code depending on the presence of that print statement.

Currently our best guess is that this is related to undefined behavior and compiler optimization (as discussed in an post by John Regehr). This assumption is supported by our observation that the effect of the print statement is subject to optimization level. Since the application runs in single thread concurrency should not be an issue.

To debug undefined behavior we have used clang with flags -fsanitize=undefined,unsigned-interger-overflow,address,integer and gotten rid of all the indicated problems. We have also fixed all problems indicated by the clang static analyzer but the problem remains (curiously with gcc, clang and msvc but with slightly different results).

Now we are at a loss of ideas on how to best continue with our debugging efforts. Due to the non-locality of the effects of the print statement we don't even know where to start with a code review.

Question 1: What tools would you recommend for doing static and runtime analysis of potential problem spots (similar to the clang tools described above)?

Question 2: What mechanisms other than the combination of undefined behavior + compiler optimization are likely candidates for the observed effect (non-functional statements changing program behavior)?

Jakob Erdmann
  • 190
  • 1
  • 7
  • 2
    What happens when you don't print? Do all 3 versions behave the same? – cup Jun 04 '14 at 14:32
  • 2
    In our company one technique we have used is to use bit flags to inidicate what code path has been executed rather than printing statements but this is a very laborious process that requires decoding afterwards, so you set the flags and then print once. This was required due to the low latency of iocompletion ports and monitoring network packet behaviour, I would be interested to learn of alternative techniques – EdChum Jun 04 '14 at 14:35
  • What is the actual print statement? Is it really just `cout << "foo"`? What is the problem that it causes? Work backwards from the symptoms. What could cause the observed behavior? what's getting corrupted? Use a [data breakpoint](http://stackoverflow.com/q/160045/10396) to monitor it. The thing about undefined behavior is that it is not truly random: any given compiler will handle the same undefined construct the same way every time - you can learn to recognize the patterns. – AShelly Jun 04 '14 at 15:02
  • You can put a conditional breakpoint (or abort) at the point where you can detect the damage, and trace it back as AShelly suggests. Keep pushing your breakpoints/watchpoints/asserts earlier as you narrow it down. Embrace the disassembly. – Useless Jun 04 '14 at 15:24
  • Do you run several threads in your program ? If yes the optimization level and the additional print/cout can affect the thread execution order and cause that kind of issue if you have one/some data-race. As threading issues are often the most complex to to find/solve, if anything else worked, try a thread checker like the "Intel Thread Checker" or any other! – alexbuisson Jun 04 '14 at 15:36
  • If your target platform supports a file system, you should consider opening a "log" file and writing to it. On my embedded system, we store the log file in persistent memory (magnetic memory). The logs can be retrieved on the next power-up. – Thomas Matthews Jun 04 '14 at 16:14
  • I would also guess that there is a threading problem involved because printing changes the timing of thread execution. Have you tried valgrind to check for memory overwrites and hellgrind to check for thread problems? – Jens Jun 04 '14 at 18:47
  • @Useless: The application runs in a single thread. I have updated the question. – Jakob Erdmann Jun 05 '14 at 06:22
  • @cup. Even without printing the 3 versions behave differently. But they all change their original behavior when printing. – Jakob Erdmann Jun 05 '14 at 06:30
  • @Jens Thank you for suggesting Valgrind. Using Valgrind I received a valuable hint about 'uninitialized memory allocated on the stack' that helped me to track down the problem. Hence I would like to mark my question as 'answered'. – Jakob Erdmann Jun 05 '14 at 07:50
  • The problem was linked to undefined behavior caused by initialized variables. I recommend http://blog.regehr.org/archives/213 as an excellent eye-opener regarding the dangers of undefined behavior. Using clang and undefined-behavior warning options also helped – Jakob Erdmann May 04 '15 at 13:07

0 Answers0