
I'm in a very weird situation where my code works on my desktop but crashes on a remote cluster. I've spent countless hours checking my source code for errors, running it in a debugger to catch what breaks the code, and looking for memory leaks under valgrind (which came back clean, at least under gcc).

What I have found out so far is that the same source code produces identical results on both machines as long as I'm using the same compiler (gcc 4.4.5). The problem is that I want to use the Intel compiler on the remote cluster, both for better performance and because some prebuilt libraries there were built with it. Besides, I'm still worried that gcc may be masking memory issues that only surface with the Intel compiler.

What does this mean for my code?

mmirzadeh
  • Can you show us some code? Maybe the bug is obvious to some of us. – fredoverflow Sep 25 '12 at 18:53
  • I would, but unfortunately it's a large project with multiple files and I have no clue where this could be coming from. Are there any code analyzers that could flag possible undefined behavior? – mmirzadeh Sep 25 '12 at 18:59
  • This recently happened to me: when compiling in Debug my uninitialized bool was true, but it was false when compiling in Release (vs2012). – jaybny Sep 26 '12 at 05:18

2 Answers


It probably means you are relying on undefined, unspecified or implementation-defined behavior.

Maybe you forgot to initialize a variable, or you access an array beyond its valid bounds, or you have expressions like a[i] = b[i++] in your code... the possibilities are practically infinite.

fredoverflow
  • How can I find them? Any tool you would suggest? – mmirzadeh Sep 25 '12 at 18:51
  • For additional reference, [this question](http://stackoverflow.com/questions/367633/what-are-all-the-common-undefined-behaviour-that-a-c-programmer-should-know-ab) lists a number of common sources of undefined behaviour. – chris Sep 25 '12 at 18:51
  • @GradGuy You can use Valgrind to detect the use of uninitialized values or access to invalid memory addresses. – petersohn Sep 25 '12 at 19:44
  • @petersohn: I have used valgrind for that purpose and it does not report any error whatsoever. Also, it says the code is leak-free. – mmirzadeh Sep 25 '12 at 21:03

Does the crash produce a core file? If the backtraces (the output of gdb's 'bt' command) from multiple core dumps are consistent, you can start adding printf statements selectively and work backwards up the list of functions in the stack trace.

If no memory leaks are detected, then the heap is probably okay. That leaves the stack as a potential problem area. You may have an uninitialized variable (used as an index or pointer, say) that is smashing the stack.

Try compiling your app with '-fstack-protector' included in your gcc/g++ compile command arguments.

Arun Taylor