5

I need some suggestions for debugging a crash in a Delphi XE2 application. I've never seen the crash myself - indeed it occurs very rarely and is not reproducible on demand.

We do, however, have a set of 10 crash reports from MadExcept. These show that the main thread was processing a WM_PAINT message for the list view on the main form at the time. The call stack in each case shows no references to my own code, just VCL code and functions in comctl32.dll, ntdll.dll and USER32.dll.

The list view in question is TColorListView, which derives from TCustomListView, and handles the OnCustomDrawItem and OnDeletion events. But as I said, none of my TColorListView code is on the call stack when the crash occurs.

The actual location of the crash in each case varies, but the sequence of calls (earlier to later) leading up to it is always:

KiUserCallbackDispatcher
RtlAnsiStringToUnicodeString
StdWndProc
TWinControl.MainWndProc
TCustomListView.WndProc
TWinControl.WndProc
TControl.WndProc
TCustomListView.WMPaint
TWinControl.WMPaint
TWinControl.WMPaint
TWinControl.DefaultHandler
CallWindowProcA
TControl.WndProc

After that it goes into one of StdWndProc/SendMessageW/TControl.Perform, and from there the path is different each time. Eventually it ends up in one of comctl32.dll, USER32.dll, GDI32.dll or just TControl.WndProc and raises an EAccessViolation. Sadly I have no information about what the user was trying to do at the time because the user didn't fill in that part of the bug report.

Can you suggest any 'psychic debugging' techniques that I can use to try to pin down the cause of this crash (and thus fix it)?


Update to answer the questions in the comments below:

procedure TColorListView.HandleCustomDrawItem(aSender: TCustomListView; aItem: TListItem;
                                              aState: TCustomDrawState; var aDefaultDraw: Boolean);
begin
  Canvas.Font.Color := ItemColors[aItem.Index];
end;

In (just) one of the crash reports, it appears to go off into TListItem.GetIndex and crashes several stack frames further on. That's probably a red herring though.

What's the 'Perform'ed message? Sorry, I don't know. MadExcept doesn't give me method argument values; just the method names.


31 May

Although I'd prefer to find the fault just from the information I have, I'd also welcome suggestions for any new diagnostics I could add to the program so that if this crash occurs again after the next release I will have more to go on. I'm at a loss though because at the point of the crash none of the code I can modify is even on the call stack.


13 June

I've added to the MadExcept report a line that tells me what state the application was in when the exception occurred - Starting/Active/Idle/ModalDlg/Terminated. (Thanks to Chris Thornton for his comment suggesting this.) I think there is a reasonable chance that the exception is happening during shutdown. Unfortunately it will be 2014 before we release the new version and have the possibility of getting back bug reports with the new diagnostics.

Ian Goldby
  • What happens in `OnCustomDrawItem`? – J... May 30 '13 at 11:46
  • What's the 'Perform'ed message? – Sertac Akyuz May 30 '13 at 11:58
  • See answers added above. – Ian Goldby May 30 '13 at 12:18
  • Is this happening when your app is up and running, waiting for input? Or is it happening during the "edge cases" like shutdown or resume from standby? – Chris Thornton May 30 '13 at 18:18
  • @Chris I wish I knew. The only information I have is the MadExcept report. It had crossed my mind that it might be when the app is shut down, but that's no more than a guess. – Ian Goldby May 31 '13 at 07:22
  • Are you using multithreading? It might be due to a shared object updated in a worker thread and the mainthread trying to access the same property at the same time (when redrawing a canvas, component...). – Greg M. Jun 03 '13 at 20:23
  • @Greg: Yes, there are multiple threads. I can't rule this out as a factor (because, obviously, there is a mistake _somewhere_ in the code), but the design is that the only communication between threads is by Windows messages, and there are (supposed to be) no objects shared between threads. Can you suggest a generic way I might add some diagnostics to find out if any object is being accessed by more than one thread? – Ian Goldby Jun 04 '13 at 07:45
  • @Ian: Yes, using Windows messages is a good way to synchronize. Sorry, by "object" I meant any value: even a simple String with concurrent read/write access might lead to this kind of problem. You should first make sure that you don't share such data between threads, and then, if necessary, implement synchronization when reading/writing (using TCriticalSection, for example). – Greg M. Jun 04 '13 at 10:16
  • It's worth ruling out memory access errors by checking aItem.Index is in bounds, Canvas is not nil, etc. If you write to an invalid address it may work but cause random problems later. – David Jun 04 '13 at 11:20
  • @David Don't get hung up on HandleCustomDrawItem. Most of the crashes don't involve this method at all. Yes, any write to an invalid address could cause random problems later, but that's not confined to HandleCustomDrawItem - it could just as easily be anywhere else in my code. – Ian Goldby Jun 04 '13 at 13:05
  • @Greg What I meant was that there are no objects shared between threads at all (unless there's a mistake in the code), not even strings. LPARAM and WPARAM in the messages don't count because they are passed by value, not reference. – Ian Goldby Jun 04 '13 at 13:08
  • @IanGoldby: I had almost the same problem a few weeks ago: a status panel was crashing while painting because of a synchronization problem (and I was also using messages to update the GUI, but that was not enough). The app could run for more than 12 hours, or for less than a minute, and then suddenly crash (with a completely different call stack each time)... I'm sorry, I have no more ideas ^^. – Greg M. Jun 04 '13 at 13:33
  • @ChrisThornton You made an implicit suggestion to find out whether this is happening during an 'edge case'. I've added some diagnostics so that when/if it happens again this will be included in the MadExcept report. If you submit this as an answer then I'll award the bounty, assuming a better answer doesn't come before the bounty expires. – Ian Goldby Jun 06 '13 at 07:23
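
On the question of a generic diagnostic for cross-thread access raised in the comments above, one possible approach is a guard at the top of every method that touches VCL objects. The sketch below is only an illustration: AssertMainThread is a hypothetical helper (not part of the VCL or MadExcept), and it assumes that all UI objects are meant to be used from the main thread only.

```delphi
uses
  Winapi.Windows, System.SysUtils;

// Hypothetical helper: raises if the caller is not running on the main
// thread. MainThreadID is the RTL global set at program startup.
procedure AssertMainThread(const aWhere: string);
begin
  if GetCurrentThreadId <> MainThreadID then
    raise Exception.CreateFmt(
      '%s called from thread %d, expected main thread %d',
      [aWhere, GetCurrentThreadId, MainThreadID]);
end;
```

Calling it as AssertMainThread('TColorListView.SetItemColor') (the method name here is made up) at the start of each such method means that a stray call from a worker thread raises at the point of misuse, so the resulting exception report should contain application code on the call stack rather than only VCL and system frames.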

3 Answers

2

This is just a guess, but you may be facing the same problem I had (it looks similar).
My problem was that WinAPI windows were being destroyed in a different thread from the one that created them.
Windows will not destroy the window in this case and returns an error, but some Delphi components simply ignore that error, so you end up with a dangling window whose WndProc points to junk memory (Delphi frees that memory when the component is destroyed, but the window stays behind).
When this window later tries to process any message, it calls the WndProc (which is now undefined), resulting in an AV with a random call stack.

So make sure you create and destroy windows in the same thread (pay special attention to TTimer, which also creates a window).
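
A minimal sketch of how such a mismatch might be detected for windowed controls is shown below. CallerOwnsWindow and CheckedFree are hypothetical names introduced for illustration only; the check relies on the WinAPI function GetWindowThreadProcessId, which returns the ID of the thread that created a window. It does not cover TTimer, which is not a TWinControl and keeps its window handle private.

```delphi
uses
  Winapi.Windows, System.SysUtils, Vcl.Controls;

// Hypothetical helper: True when the calling thread created the control's
// window, i.e. it is the only thread that can successfully destroy it.
function CallerOwnsWindow(aControl: TWinControl): Boolean;
begin
  // A control without an allocated handle has no window to leave behind.
  Result := (not aControl.HandleAllocated) or
    (GetWindowThreadProcessId(aControl.Handle, nil) = GetCurrentThreadId);
end;

// Hypothetical wrapper: log a thread mismatch, then free the control.
procedure CheckedFree(aControl: TWinControl);
begin
  if not CallerOwnsWindow(aControl) then
    OutputDebugString(PChar(Format(
      '%s freed on thread %d, but its window belongs to another thread',
      [aControl.ClassName, GetCurrentThreadId])));
  aControl.Free;
end;
```

Logging rather than raising keeps the behaviour of the application unchanged; the message could equally be routed to whatever diagnostic log the application already has.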

VitaliyG
1

Interesting read here. Maybe something similar is happening? Wouldn't hurt to check for canvas <> nil

Access violation while the program was idle - not trace information to track down the bug

Community
Chris Thornton
  • HandleCustomDrawItem is the only place I make any use of Canvas, and that isn't where it is crashing. Only one of the crash stack dumps mentions TColorListView at all. – Ian Goldby May 31 '13 at 07:28
1

First of all, to debug access violation errors you have to find the variables (pointers) that reference memory not owned by your process.

Uninitialized variables are the most common cause.

So my suggestion would be to change the code as follows:

procedure TColorListView.HandleCustomDrawItem(aSender: TCustomListView; aItem: TListItem;
                                              aState: TCustomDrawState; var aDefaultDraw: Boolean);
begin
  if Canvas = nil then
    .... ; // a breakpoint here
  if ItemColors = nil then
    .... ; // a breakpoint here
  if aItem = nil then
    .... ; // a breakpoint here
  Canvas.Font.Color := ItemColors[aItem.Index];
end;

I hope that this will show you which variable is not passed as expected.

My guess is for aItem.

Ali Avcı
  • See the comment I gave to Chris Thornton's answer. It's not this method where the access violation is occurring, and in any case the bug isn't reproducible so setting breakpoints won't help. – Ian Goldby Jun 06 '13 at 15:15