
I've recently converted a highly threaded, unmanaged Win32 C++ console application (MediaServer.exe) to an unmanaged Win32 DLL (MediaServer.dll). I'm hosting and debugging this DLL in a separate unmanaged Win32 console app, and everything compiles and runs, but after a minute or so I get a random crash, in a place that makes no sense, with an apparently corrupt call stack. These crashes happen in a variety of different places and at somewhat random times, but the common factor is that the (apparently corrupt) call stack always has various libxml2.dll functions somewhere on it. For example, the crash might be on a line that looks like this:

xmlDoc * document = xmlReadMemory(message.c_str(), message.length(), "noname.xml", NULL, 0);

Or like this:

xmlBufferPtr buffer = xmlBufferCreate();

And the call stack might look like this:

feeefeee()  
libxml2.dll!000eeec9()  
[Frames below may be incorrect and/or missing, no symbols loaded for libxml2.dll]   
libxml2.dll!00131714()  
libxml2.dll!001466b6()  
libxml2.dll!00146bf9()  
libxml2.dll!00146c3c()  
libxml2.dll!0018419e()  

Or if you're lucky, like this:

ntdll.dll!_RtlpWaitOnCriticalSection@8()  + 0x99 bytes  
ntdll.dll!_RtlEnterCriticalSection@4()  - 0x15658 bytes 
libxml2.dll!1004dc6d()  
[Frames below may be incorrect and/or missing, no symbols loaded for libxml2.dll]   
libxml2.dll!10012034()  
libxml2.dll!1004b7f7()  
libxml2.dll!1003904c()  
libxml2.dll!100393a9()  
libxml2.dll!10024621()  
libxml2.dll!10036e8f()  
MediaServer.dll!Controller::parse(std::basic_string<char,std::char_traits<char>,std::allocator<char> > message)  Line 145 + 0x20 bytes  C++
MediaServer.dll!Controller::receiveCommands()  Line 90 + 0x25 bytes C++
MediaServer.dll!MediaServer::processCommands()  Line 88 + 0xb bytes C++
MediaServer.dll!MediaServer::processCommandsFunction(void * mediaServerInstance)  Line 450 + 0x8 bytes  C++
MediaServer.dll!CustomThread::callThreadFunction()  Line 79 + 0x11 bytes    C++
MediaServer.dll!threadFunctionCallback(void * threadInstance)  Line 10 + 0x8 bytes  C++
kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes    
ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes   
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes    

The crash itself will typically say something like "Unhandled exception at 0x77cd2239 (ntdll.dll) in MediaServerConsole.exe: 0xC0000005: Access violation writing location 0x00000014."

Needless to say, this didn't happen when I was compiling the module as a console application.

Is there anything that I may have overlooked when converting the project over to a DLL? It's not something I've done before, so I wouldn't be at all surprised if there's something obvious I've neglected. Any help is appreciated.

Ken Smith
  • Have you tried running any of the suggested programs in [this topic](http://stackoverflow.com/questions/413477/is-there-a-good-valgrind-substitute-for-windows)? Purify, Insure++, etc. can help you track down subtle bugs in your program. – Adam Rosenfield Jul 22 '11 at 05:48
  • Haven't, but I'm downloading Visual Leak Detector right now to see if it can pinpoint anything. (Presumably it's not at the same level as Purify or Insure++, but it's free...) – Ken Smith Jul 22 '11 at 06:08
  • What causes the crash? An access violation? I'm assuming yes, because released heap is marked with 0xFEEEFEEE. It sounds like your stack is overflowing or your heap is corrupt. Also, can you check which CRT the DLL links with (static/dynamic, debug/release)? – Collin Dauphinee Jul 22 '11 at 06:16
  • My usual checklist: Select all projects in your solution and verify: C++->Optimization: Same settings for Optimization, Favor Size or Speed, and Whole Program Optimization. C++->Code Generation: Same settings for Runtime Library (Multi-threaded DLL in your case), and Struct Member Alignment. – Gnawme Jul 22 '11 at 06:19
  • @dauphic - I edited the question to add additional details. – Ken Smith Jul 22 '11 at 06:31
  • @Gnawme - Checked all those things, they all match. I've gone through all the settings I can think of, and they all seem to match, and they all *look* right - doesn't mean they are, of course :-). – Ken Smith Jul 22 '11 at 06:32
  • Do you have any global objects / static constructors in the application exe? Have you checked for the static initialization order problem? – Gangadhar Jul 22 '11 at 06:36
  • @Gangadhar - Nope, no global objects in the exe - it's as bare a host as I could make it. I don't *think* there are any static initialization order problems, or at least, there shouldn't be anything new just as the result of this change from EXE to DLL. – Ken Smith Jul 22 '11 at 07:20
  • I'd grab Application Verifier and enable full heap checking. Problems like this are always hard to track down. – Collin Dauphinee Jul 22 '11 at 08:05
  • I got it working (see below), but for grins, I tried Application Verifier. Unfortunately, the only message I got was an inscrutable one about "Invalid TLS index used for current stack trace". Googling it just turns up folks who are as confused as me. – Ken Smith Jul 22 '11 at 08:23

2 Answers


I would say you are initializing memory in DLL_THREAD_ATTACH instead of DLL_PROCESS_ATTACH. That would cause you to use a pointer or memory that was allocated on a thread other than the executing one.

The other thing to check is how the DLL's dependencies are loaded.

Let me explain. The CRT performs global memory initialization when your DLL is loaded with LoadLibrary. This covers every global variable, starting with the C primitive types, which default to zero. It then sets up the memory for global structs/classes and, if necessary, calls their constructors.

The CRT then calls your DllMain function with DLL_PROCESS_ATTACH to tell the DLL that it has been loaded by your process. For each thread subsequently created in that process, DllMain is called again with DLL_THREAD_ATTACH.

You've said these are left empty, and that you then call your exported C function. However, I can see that your DLL is getting caught in a critical section. That tells me you have a deadlock between your globally allocated variables and your thread allocating memory within Start().

I recommend moving your initialization code into DLL_PROCESS_ATTACH; this ensures that all your memory is allocated on the main process thread, similar to how the application worked as a single executable.
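As a rough illustration of that recommendation, here is a minimal sketch of a DllMain that does one-time setup on DLL_PROCESS_ATTACH and ignores the per-thread notifications. Everything except DllMain and the DLL_* constants is an assumption made for the example: initializeServer(), shutdownServer(), isInitialized(), handleLoaderEvent() and the LoaderEvent enum are hypothetical names, not part of the asker's code. The dispatch logic is kept in a portable function so it can be exercised without loading a DLL.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#endif

// Portable mirror of the four notifications DllMain receives (hypothetical type).
enum class LoaderEvent { ProcessAttach, ThreadAttach, ThreadDetach, ProcessDetach };

namespace {
bool gInitialized = false;  // hypothetical stand-in for real server state

// Hypothetical one-time setup and teardown routines.
void initializeServer() { assert(!gInitialized); gInitialized = true; }
void shutdownServer()   { gInitialized = false; }
}  // namespace

bool isInitialized() { return gInitialized; }

// One-time work happens only on the process-level events;
// per-thread events are deliberately ignored.
void handleLoaderEvent(LoaderEvent event)
{
    switch (event) {
    case LoaderEvent::ProcessAttach:
        initializeServer();
        break;
    case LoaderEvent::ProcessDetach:
        shutdownServer();
        break;
    case LoaderEvent::ThreadAttach:
    case LoaderEvent::ThreadDetach:
        break;
    }
}

#ifdef _WIN32
// The real entry point just forwards the loader's reason code.
BOOL APIENTRY DllMain(HMODULE, DWORD reason, LPVOID)
{
    switch (reason) {
    case DLL_PROCESS_ATTACH: handleLoaderEvent(LoaderEvent::ProcessAttach); break;
    case DLL_THREAD_ATTACH:  handleLoaderEvent(LoaderEvent::ThreadAttach);  break;
    case DLL_THREAD_DETACH:  handleLoaderEvent(LoaderEvent::ThreadDetach);  break;
    case DLL_PROCESS_DETACH: handleLoaderEvent(LoaderEvent::ProcessDetach); break;
    }
    return TRUE;
}
#endif
```

Code in DLL_PROCESS_ATTACH runs while the loader lock is held, so in a sketch like this the setup routine should stay small and should not start or wait on threads.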

Chad
  • That was a good thing to check, but my DllMain is (as MS apparently recommends) an empty stub. I'm currently only allocating memory in a single Start() function, declared as: extern "C" __declspec (dllexport) int Start(){/* Stuff */ } – Ken Smith Jul 22 '11 at 07:12
  • Yes, but the CRT allocates global memory on a different thread. That is what I'm getting at. It looks like you're getting a deadlock when two threads are trying to access a CriticalSection. – Chad Jul 22 '11 at 07:20
  • Quite possibly I'm missing something here, then :-). If I'm not initializing anything in DllMain, how do I make sure the stuff I'm initializing in Start() gets initialized correctly/on the right thread? – Ken Smith Jul 22 '11 at 07:24
  • Well, that seems to have done it, though I'll confess I don't completely understand why. I split up the initialize() method of my main class into initialize() and start(), and moved the initialize() stuff into DllMain/DLL_PROCESS_ATTACH and the start() stuff into my exported Start() function, and that seems to work. Wish I understood why better, but I suppose that'll come with time and more brick walls :-). Thanks! – Ken Smith Jul 22 '11 at 08:06

I'll leave the other answer as the "accepted" answer, but it might be helpful for people to know that a key part of the problem was that I was initializing libxml2 on the wrong thread. Specifically, you need to call xmlInitParser() on your main thread, before making any other libxml2 calls. For me, this meant:

MediaServer::MediaServer() : mProvidePolicyThread  (0),
                         mProcessCommandsThread(0),
                         mAcceptMemberThread   (0)
{
   xmlInitParser();
}

And similarly, you need to call xmlCleanupParser() when you exit:

MediaServer::~MediaServer()
{
   xmlCleanupParser();
}

This is all documented here: http://xmlsoft.org/threads.html
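If the init/cleanup pairing needs to live outside the MediaServer class, one possible variation is a small RAII guard created on the main thread before any worker threads start. This is only a sketch: LibraryGuard, runWithLibrary() and the two fake hooks are hypothetical names, and the fake hooks stand in for xmlInitParser() and xmlCleanupParser() so the example builds without libxml2. In real code the guard would be constructed with those two libxml2 functions instead.

```cpp
#include <cassert>

// RAII guard: runs the init hook on construction and the cleanup hook on
// destruction, so the two calls cannot get out of step.
class LibraryGuard {
public:
    using Hook = void (*)();

    LibraryGuard(Hook init, Hook cleanup) : cleanup_(cleanup)
    {
        assert(init && cleanup);
        init();
    }
    ~LibraryGuard() { cleanup_(); }

    LibraryGuard(const LibraryGuard&) = delete;
    LibraryGuard& operator=(const LibraryGuard&) = delete;

private:
    Hook cleanup_;
};

// Hypothetical stand-ins for xmlInitParser() / xmlCleanupParser().
int gInitCalls = 0;
int gCleanupCalls = 0;
void fakeInitParser()    { ++gInitCalls; }
void fakeCleanupParser() { ++gCleanupCalls; }

// Returns true if the library was initialized (and not yet cleaned up)
// while the work inside the guard's scope ran.
bool runWithLibrary()
{
    LibraryGuard guard(fakeInitParser, fakeCleanupParser);
    // Worker threads would be started and joined here, inside the guard's
    // scope, so cleanup only runs after they have all finished.
    return gInitCalls == gCleanupCalls + 1;
}
```

The guard's destructor runs when runWithLibrary() returns, which keeps cleanup after all work that depends on the library.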

Ken Smith