Irreproducible runtime errors - general approach?

Question

I'm facing a problem so mysterious that I don't even know how to formulate this question... I cannot even post a relevant piece of code.

I am developing a big project on my own, started from scratch. It's nearly release time, but I can't get rid of one annoying error. My program writes an output file from time to time, and while doing so I get one of the following:

std::string out_of_range error
std::string length_error
just lots of nonsense on output

It is worth noting that these errors appear very rarely and can never be reproduced, even with the same input. Memcheck shows no memory violations, even on runs where errors had previously been observed. Cppcheck has no complaints either. I use the STL and pthreads intensively, but the errors also happen without pthreads.

I tried the newest versions of both g++ and icpc. I am running on some version of Ubuntu, but I don't believe that's the reason.

I would appreciate any help from you, guys, on how to tackle such problems. Thanks in advance.

I sometimes come across string exceptions, and most often it is because I've passed the parameters to a string method the wrong way round. For example, with `string::append`, to append a single character the order is count and then char, and I've often typed it the other way round. This is really hard to debug until some weirdness happens... — Nim, Mar 01 '11 at 11:08
How rare is "very rarely"? If you run it for two or three hours, do you think the bug will come up? — Pedro d'Aquino, Mar 01 '11 at 12:16
@Pedro: I test the program on multiple computers, on various data, nearly all working days (and nights). It crashes roughly once every few days. If you mean hiding the bug - it would certainly be a way out for somebody with lower moral standards ;) — Julian, Mar 01 '11 at 12:32
No, I was just thinking how feasible it would be to test it under different configurations and wait for the bug to pop up. Not very much, it seems. — Pedro d'Aquino, Mar 01 '11 at 13:30

score 2 · Answer 1 · answered Mar 01 '11 at 11:06

Enable core dumps (ulimit -c or setrlimit()), get a core, and start debugging it with gdb. Or, if you can, set things up so that you always run under gdb, so that when the error eventually happens you have some information available.

answered Mar 01 '11 at 11:06

Erik

+1. if I may add: catch any exception (other than those you expect) in main and generate a core dump yourself (see http://stackoverflow.com/questions/979141/how-to-programatically-cause-a-core-dump-in-c-c) – davka Mar 01 '11 at 11:55
@Matthieu: correct, my mistake. Interestingly, when I don't catch the exception and the core is dumped due to an "unhandled exception", the backtrace shows the stack. So perhaps I should change the recommendation to **"catch only the exceptions you expect"**, i.e. comment out the `catch(...)` if you have it – davka Mar 01 '11 at 13:29
@davka: sounds good yes, though it might require extensive rework... at least `catch` is greppable ^^ – Matthieu M. Mar 01 '11 at 13:36

score 2 · Answer 2 · answered Mar 01 '11 at 11:30

The symptoms hint at a memory corruption.

If I had to guess, I'd say that something is corrupting the internal state of the std::string object that you're writing out. Does the string object live on the stack? Have you eliminated stack smashing as a possible cause (that wouldn't be detectable by valgrind)?

I would also suggest running your executable under a debugger, set up in such a way that it would trigger a breakpoint whenever the problem happens. This would allow you to examine the state of your process at that point, which might be helpful in figuring out what's going on.

score 1 · Answer 3 · answered Mar 01 '11 at 11:07

gdb and valgrind are very useful tools for debugging errors like this. valgrind is especially powerful for identifying memory access problems and memory leaks.

answered Mar 01 '11 at 11:07

Delan Azabani

score 1 · Answer 4 · answered Mar 01 '11 at 11:14

I encountered strange optimization bugs in gcc (like a ++i being assembled to i++ in rare circumstances). You could try declaring some critical variables volatile but if valgrind doesn't find anything, chances are low. And of course it's like shooting in the dark...

If you can at least detect that something is wrong in a certain run from inside the program, like detecting nonsensical output, you could then call an empty "gotNonsense()" function that you can break into with gdb.

answered Mar 01 '11 at 11:14

Tilman Vogel

Yeah, sometimes gcc optimization behaves in the weirdest and most irrational way possible. You may try to disable optimization (-O0) and see if the problem persists. – Septagram Mar 01 '11 at 11:18

score 1 · Answer 5 · answered Mar 01 '11 at 11:18

If you cannot determine where exactly in the code your program crashes, one way to find that place is to use debug output. Debug output is a good way of tracking down bugs that cannot be reproduced, because you will get more information about the bug the next time it happens, without the need to actively reproduce it. I recommend using a logging library for that; Boost provides one, for example.

answered Mar 01 '11 at 11:18

Septagram

+1, extensive logging will help pinpoint the point of failure. It may slow down the program, but I see it as a necessary evil when you're stuck. – Matthieu M. Mar 01 '11 at 13:08
If done right, you can disable logging in your release builds and keep it for debug only. For example, if you define QT_NO_DEBUG_OUTPUT for your Qt application, all logging done with qDebug() is simply compiled out. So, no performance penalties for the end users. Of course, do take note that unnecessary debug output clutters the code, as well as the logs (when you're already looking for something else), so you still need to clean up from time to time. – Septagram Mar 03 '11 at 08:31

score 1 · Answer 6 · answered Mar 01 '11 at 14:33

You are using the STL intensively, so you can try to run your program with libstdc++ in debug mode. It will do extra checks on iterators, containers and algorithms. To use the libstdc++ debug mode, compile your application with the compiler flag -D_GLIBCXX_DEBUG.
