
I've got a large Mac app that runs for a couple of days at a time operating on a large data set. It's a mix of Objective-C++ and C++. It runs great on Mountain Lion, but on Mavericks, after running for about 10 to 20 minutes (in which a couple of million objects are allocated and destroyed), it crashes. It behaves as if it's crashing with an invalid pointer (i.e. calling a function on a deleted C++ object), but the object it's pointing to is in a state that makes absolutely no sense.

All my C++ classes inherit from a common base class where the constructor looks something like this:

MyClass::MyClass()
{
  mCreated = 12345; //int member variable set here and NEVER TOUCHED AGAIN.
  //other initialization stuff
}

When it crashes, the debugger shows that in the bad object, the value for mCreated is 0. It's behaving as if the object never ran its constructor!

I don't think it's memory stomping, because this value is never anything other than 0 or its expected value, and none of the other fields in the object have values that look like the garbage you'd expect from memory stomping.

I've also tried running with scribble turned on, and the 0x555 and 0xaaa values don't show up anywhere. I've also tried Guard Edges.

In-depth investigation has not revealed anything. The bad object isn't even always the same class. All I can think of is that something with the new memory stuff in Mavericks (compressing unused memory) is causing some new behavior (maybe a bug or maybe some previously unknown, mostly-unenforced rule that now really matters).

Has anyone seen anything similar? Or does anyone know of any mostly-unknown memory rules that would apply more strongly under Mavericks?

Tom Hamming
  • recently learned **very useful** debug technique (so I feel better sharing it): GDB (and surely LLDB) can watch memory addresses being read from/written to. I don't remember the exact command, but you'll find it here for sure, search for "gdb watch memory access". –  Nov 20 '13 at 22:42
  • aw wait, no, [here it is](http://stackoverflow.com/questions/58851/can-i-set-a-breakpoint-on-memory-access-in-gdb/59146#59146). –  Nov 20 '13 at 22:43
  • @H2CO3 - Good to know, but there's no way to know what address it's going to choke on. – Tom Hamming Nov 20 '13 at 22:46
  • Why aren't you using an initialization list? – Eric Jablow Nov 20 '13 at 23:59
  • @Mr.Jefferson, I guess the object may have been destroyed. You can set breakpoints with [command lists](http://www.ofb.net/gnu/gdb/gdb_35.html) on the constructor and destructor: when the constructor breakpoint hits, set a watchpoint and continue; when the destructor breakpoint hits, delete that watchpoint and continue. If that doesn't work, I suggest logging every constructor and destructor call with `this` dumped, then checking the log to see whether the accessed object had already been destroyed when the crash happened. – ZijingWu Nov 21 '13 at 09:35
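
The logging idea in the last comment could be sketched roughly as follows. This is only an illustration, not code from the question: `LiveObjects` and `LoggedBase` are hypothetical names, and the sketch assumes every class of interest can inherit from one base. Each constructor records `this`, each destructor removes it, and `assertLive` can be called at the top of a suspect member function to catch a call on an address that is not currently registered.

```cpp
#include <cassert>
#include <cstdio>
#include <mutex>
#include <unordered_set>

// Hypothetical registry of live object addresses (not part of the original code).
class LiveObjects {
public:
    static LiveObjects& instance() {
        static LiveObjects registry;
        return registry;
    }

    void add(const void* p) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLive.insert(p);
        std::fprintf(stderr, "ctor %p\n", p);  // log line to correlate with a crash address
    }

    void remove(const void* p) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLive.erase(p);
        std::fprintf(stderr, "dtor %p\n", p);
    }

    bool contains(const void* p) {
        std::lock_guard<std::mutex> lock(mMutex);
        return mLive.count(p) != 0;
    }

    // Debug-only check: aborts if p is not a registered live object.
    void assertLive(const void* p) { assert(contains(p)); }

private:
    std::mutex mMutex;
    std::unordered_set<const void*> mLive;
};

// Hypothetical common base class that registers itself on construction.
class LoggedBase {
public:
    LoggedBase() { LiveObjects::instance().add(this); }
    LoggedBase(const LoggedBase&) { LiveObjects::instance().add(this); }
    virtual ~LoggedBase() { LiveObjects::instance().remove(this); }
};
```

With millions of allocations the per-object log lines will be slow and large, so it may be more practical to keep only the registry and the `assertLive` check, or to log to a file.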

1 Answer


I think you're right about the invalid pointer suspicion. It might be a pointer to a deleted object or it might be a garbage pointer. Either one would be consistent with the mCreated member being different than you expect. In the case of a deleted object, the memory could be used for something else and therefore set to some other value. In the case of a garbage pointer, you're not pointing to anything that ever was an instance of your class.
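
One way to tell these cases apart is to have the destructor overwrite the marker with a second, distinct value. The sketch below is an assumption-laden illustration, not the asker's actual class: `TrackedBase`, `classifyMarker`, and the `0xDEADBEEF` value are made up for the example, and only `12345` comes from the question. The member is `volatile` so the compiler does not drop the destructor's store as a dead write.

```cpp
#include <cassert>
#include <cstdint>

// What a raw marker value most likely means when inspected.
enum class MarkerState { Alive, Destroyed, Zeroed, Garbage };

constexpr std::uint32_t kAliveMarker = 12345;           // value used in the question
constexpr std::uint32_t kDestroyedMarker = 0xDEADBEEF;  // assumed; any distinct value works

// Interpret a marker value read in the debugger or in a runtime check.
inline MarkerState classifyMarker(std::uint32_t raw) {
    if (raw == kAliveMarker) return MarkerState::Alive;
    if (raw == kDestroyedMarker) return MarkerState::Destroyed;
    if (raw == 0) return MarkerState::Zeroed;
    return MarkerState::Garbage;
}

// Hypothetical stand-in for the common base class described in the question.
class TrackedBase {
public:
    TrackedBase() : mCreated(kAliveMarker) {}
    TrackedBase(const TrackedBase&) : mCreated(kAliveMarker) {}
    TrackedBase& operator=(const TrackedBase&) { return *this; }
    virtual ~TrackedBase() { mCreated = kDestroyedMarker; }

    MarkerState state() const { return classifyMarker(mCreated); }

    // Debug-only check to call at the top of suspect member functions.
    void assertAlive() const { assert(state() == MarkerState::Alive); }

private:
    // volatile so the store in the destructor is not optimized away.
    volatile std::uint32_t mCreated;
};
```

If the bad object still shows 0 rather than the destroyed marker, that would point toward memory that was zero-filled or never held a constructed instance, rather than a deleted object whose memory was left untouched. It cannot rule out a deleted object whose block was later reused and zeroed.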

I don't know how well the Allocations instrument works for C++ objects, but you could try reproducing the crash under that. When it stops in the debugger, get the `this` pointer and then get the history of that address from Instruments.

If Instruments doesn't work, you can set the `MallocStackLoggingNoCompact` environment variable. Then, when it stops in the debugger, examine the `this` pointer and use the following commands to view the history of that address:

(lldb) script import lldb.macosx.heap
(lldb) malloc_info --stack-history 0x10010d680

(Use the `this` address instead of `0x10010d680`, of course.)

Alternatively, you can use the `malloc_history` command from a shell to investigate the history, if doing it within LLDB is cumbersome.

Ken Thomases
  • I tried this option and invoked the malloc_info on the address it stops on. It gave back nothing at all. I verified I'd given the right address. I tried it on the address of another object in the stack that was in a valid state and it printed a line about malloc, then one about the full name of the class, and then "error: expression failed" followed by a bunch of stuff beginning with `typedef int kern_return_t; typedef unsigned task_t; #define MAX_FRAMES 128` etc. Looks like part of a header file or something. – Tom Hamming Nov 21 '13 at 17:10
  • Try using the "malloc_history" tool. I'm aware of but not experienced with the LLDB command. It may have limitations. The results you got, though, suggest that the `this` pointer is junk. Try going up the stack to figure out where the caller got the instance pointer it's calling member functions on. Perhaps it got it from a deallocated instance's member variable, etc. Somewhere, there's a start to the chain of junk. – Ken Thomases Nov 21 '13 at 17:13