I need some suggestions for debugging a crash in a Delphi XE2 application. I've never seen the crash myself - indeed it occurs very rarely and is not reproducible on demand.
We do, however, have a set of 10 crash reports from MadExcept. These show that, at the time of the crash, the main thread was processing a WM_PAINT message for the list view on the main form. The call stack in each case shows no references to my own code, just VCL code and functions in comctl32.dll, ntdll.dll and USER32.dll.
The list view in question is TColorListView, which derives from TCustomListView, and handles the OnCustomDrawItem and OnDeletion events. But as I said, none of my TColorListView code is on the call stack when the crash occurs.
The actual location of the crash in each case varies, but the sequence of calls (earlier to later) leading up to it is always:
KiUserCallbackDispatcher
RtlAnsiStringToUnicodeString
StdWndProc
TWinControl.MainWndProc
TCustomListView.WndProc
TWinControl.WndProc
TControl.WndProc
TCustomListView.WMPaint
TWinControl.WMPaint
TWinControl.WMPaint
TWinControl.DefaultHandler
CallWindowProcA
TControl.WndProc
After that it goes into one of StdWndProc/SendMessageW/TControl.Perform, and from there the path is different each time. Eventually it ends up in one of comctl32.dll, USER32.dll, GDI32.dll or just TControl.WndProc and raises an EAccessViolation. Sadly I have no information about what the user was trying to do at the time because the user didn't fill in that part of the bug report.
Can you suggest any 'psychic debugging' techniques that I can use to try to pin down the cause of this crash (and thus fix it)?
Update to answer the questions in the comments below. The OnCustomDrawItem handler is:
    procedure TColorListView.HandleCustomDrawItem(aSender: TCustomListView; aItem: TListItem;
      aState: TCustomDrawState; var aDefaultDraw: Boolean);
    begin
      Canvas.Font.Color := ItemColors[aItem.Index];
    end;
In (just) one of the crash reports, it appears to go off into TListItem.GetIndex and crashes several stack frames further on. That's probably a red herring though.
What's the 'Perform'ed message? Sorry, I don't know. MadExcept doesn't give me method argument values; just the method names.
31 May
Although I'd prefer to find the fault just from the information I have, I'd also welcome suggestions for any new diagnostics I could add to the program so that if this crash occurs again after the next release I will have more to go on. I'm at a loss though because at the point of the crash none of the code I can modify is even on the call stack.
13 June
I've added a line to the MadExcept report that tells me what state the application was in when the exception occurred - Starting/Active/Idle/ModalDlg/Terminated. (Thanks to Chris Thornton for his comment suggesting this.) I think there is a reasonable chance that the exception is happening during shutdown. Unfortunately it will be 2014 before we release the new version and have the possibility of getting back bug reports with the new diagnostics.
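For reference, the state tracking behind that report line can be as small as the sketch below. It is not my exact code: the unit name AppStateTracker and the identifiers TAppState, SetAppState and AppStateName are hypothetical, and the call that attaches the resulting string to the MadExcept report is deliberately left out.

```delphi
unit AppStateTracker; // hypothetical unit name, for illustration only

interface

type
  // The five states listed above.
  TAppState = (asStarting, asActive, asIdle, asModalDlg, asTerminated);

// Record the current application state (call from the main thread).
procedure SetAppState(aState: TAppState);

// Return the current state as text, suitable for a bug report line.
function AppStateName: string;

implementation

const
  cStateNames: array[TAppState] of string =
    ('Starting', 'Active', 'Idle', 'ModalDlg', 'Terminated');

var
  // Defaults to Starting until the application says otherwise.
  gAppState: TAppState = asStarting;

procedure SetAppState(aState: TAppState);
begin
  gAppState := aState;
end;

function AppStateName: string;
begin
  Result := cStateNames[gAppState];
end;

end.
```

The idea is to call SetAppState(asTerminated) as early as possible in shutdown (for example from the main form's OnClose), so that a paint-time crash during teardown shows up as Terminated in the report rather than as Idle.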