
The introduction - the long and boring part

(The question is at the end)

I am getting severe headaches over a third party COM component that keeps changing the FPU control word.

My development environment is Windows and Visual C++ 2008. The default FPU control word masks floating point exceptions, so none are raised for conditions such as division by zero or overflow. I have verified this both by looking at the _CW_DEFAULT macro in float.h and by inspecting the control word in the debugger at startup.

Every time I make a call into the COM object, the control word is modified upon return. This is easy to defend against: I simply reset the control word, and all is good. The problem is when the COM component starts calling my event sink. I can protect my code by resetting the control word as soon as I receive the event call, but I can't do anything once I have returned from the event call.

I don't have the source for this COM component, but I am in contact with the author. His responses so far have amounted to "Huh?". I don't think he has the slightest clue what I'm talking about, so I fear I have to do something about this myself. I believe it is his runtime (I think it's either Delphi or Borland C++, because the DLL is full of symbol names all starting with a capital T), or some other third party code he's using, that's causing the problem. I don't think his code explicitly modifies the FPU control word.

So, what can I do? From a business point of view, it is imperative to use this third party component. From a technical point of view, I could ditch it and implement the communication protocol myself. However, that would be really expensive, as this protocol involves handling credit card transactions. We don't want to take on the liability.

I desperately need a hack-around, or some useful information about FPU settings in Borland products that I can pass along to the author of the component.

The questions

Is there anything I can do? I don't think the component author has what it takes to fix it (by judging from his rather clueless responses).

I have been toying with the idea of installing my own exception handler, in which I just reset the control word in the handler, and tell Windows to continue executing. I tried installing the handler with SetUnhandledExceptionFilter(), but for some reason, the exceptions are not caught.

  1. Why aren't I catching the exceptions?
  2. If I succeed in catching FPU exceptions, resetting the FPU control word, and letting execution continue as if nothing had happened, are all bets off then?

Update

I would like to thank everyone for their suggestions. I have sent the author instructions on what he can do to make life easier for not just me, but many other clients of his code. I suggested to him that he should sample the FPU control word at DllMain(DLL_PROCESS_ATTACH), and save the control word for later, so that he can reset FPU CW before calling my event handlers, and returning from my calls.

For now, I have a hack-around if anyone is interested. The hack-around is potentially a bad one, because I don't know what it'll do to his code. I have received confirmation earlier that he does not use any floating point numbers in his code, so this should be safe, barring some third party code he uses, that relies on FPU exceptions.

The two modifications I have made to my app:

  1. Wrap my message pump
  2. Install a window hook (WH_CALLWNDPROC) to catch the corner cases where the message pump is bypassed

In both instances, I check if the FPU CW has changed. If it has, I reset it to _CW_DEFAULT.
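The check-and-reset step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the app: `restore_control_bits`, `SimulatedFpu`, and `MsvcFpu` are hypothetical names, and `SimulatedFpu` exists purely so the logic can be exercised without touching real FPU state. The Visual C++ accessor relies on `_controlfp_s` from float.h.

```cpp
#include <cassert>
#if defined(_MSC_VER)
#include <float.h>   // _controlfp_s, _CW_DEFAULT, _MCW_EM
#endif

// Hypothetical stand-in for the FPU control word, so the logic can be
// exercised without touching real hardware state (and off Windows).
struct SimulatedFpu {
    static unsigned int word;
    static unsigned int read() { return word; }
    static void write(unsigned int value, unsigned int mask) {
        word = (word & ~mask) | (value & mask);
    }
};
unsigned int SimulatedFpu::word = 0;

#if defined(_MSC_VER)
// Real accessor for Visual C++, built on _controlfp_s.
struct MsvcFpu {
    static unsigned int read() {
        unsigned int current = 0;
        _controlfp_s(&current, 0, 0);        // mask 0: query only
        return current;
    }
    static void write(unsigned int value, unsigned int mask) {
        unsigned int current = 0;
        _controlfp_s(&current, value, mask); // change only the masked bits
    }
};
#endif

// Returns true if the bits selected by `mask` differed from `wanted`
// and were rewritten; false if they were already as expected.
template <class Fpu>
bool restore_control_bits(unsigned int wanted, unsigned int mask) {
    assert(mask != 0 && "an empty mask would make this a no-op");
    const unsigned int current = Fpu::read();
    if ((current & mask) == (wanted & mask)) {
        return false;                        // nothing changed behind our back
    }
    Fpu::write(wanted, mask);
    return true;
}
```

In a Visual C++ build this would be called as `restore_control_bits<MsvcFpu>(_CW_DEFAULT, _MCW_EM)` after each `GetMessage()` returns and again from the `WH_CALLWNDPROC` hook procedure. Restricting the mask to `_MCW_EM` touches only the exception-mask bits; whether rounding and precision bits also need resetting depends on what the component actually changes.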

Jörgen Sigvardsson
  • http://www.virtualdub.org/blog/pivot/entry.php?id=53; long story short, they too added code to restore the FPU control word after any "dangerous" code path, but the post mentions that "It is possible to disable this behavior of the Borland run-time library and avoid this problem" – Matteo Italia Aug 03 '11 at 21:55
  • @matteo, thank you for the link. An interesting read! I will forward it to the author. – Jörgen Sigvardsson Aug 03 '11 at 22:14
  • By the way, I think that the way to disable that behavior mentioned in that post is actually using the Set8087CW function, as explained by @David in his answer. – Matteo Italia Aug 03 '11 at 22:23

2 Answers


I think your diagnosis that the component is written in an Embarcadero product is very likely to be true. Delphi's runtime library does indeed enable floating point exceptions, same for C++ Builder.

One of the nice things about Embarcadero's tools is that floating point errors get converted into language exceptions, which makes numerical coding a lot easier. That is going to be of little consolation to you!

This entire area is a colossal PITA. There are no rules whatsoever regarding the FP control word. It's a total free-for-all.

I don't believe that catching unhandled exceptions is going to get the job done, because the MS C++ runtime will presumably already be catching these exceptions, but I'm no expert in that area and I may be wrong.

I believe that your only realistic solution is to set the FPU to what you want it to be whenever execution arrives in your code, and restore it when execution leaves your code. I don't know enough about COM event sinks to understand why they present an obstacle to doing this.

My product includes a DLL implemented in Delphi and I suffer from the reverse problem. Mostly the clients that call in have an FPU control word that disables exceptions. The strategy we adopt is to remember the 8087CW on entry, set it to the standard Delphi CW before executing code, and then restore it at the exit point. We take care to deal with callbacks too by restoring the caller's 8087CW before making the callback. This is a plain DLL rather than a COM object so it's probably a bit simpler.
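A minimal C++ sketch of that save/set/restore pattern, for illustration only. The DLL described above is Delphi, where the corresponding calls would be `Get8087CW` and `Set8087CW`. Here `FakeControlWord`, `ControlWordScope`, and `entry_point` are hypothetical names, and `FakeControlWord` is a stand-in for the real accessor (on Visual C++ that could be built on `_controlfp_s`). The copy operations are declared but not defined so the class is non-copyable under pre-C++11 compilers such as Visual C++ 2008.

```cpp
#include <cassert>

// Hypothetical stand-in for the real control word accessor.
struct FakeControlWord {
    static unsigned int word;
    static unsigned int get() { return word; }
    static void set(unsigned int value) { word = value; }
};
unsigned int FakeControlWord::word = 0;

// Saves the current control word on construction, installs `preferred`,
// and puts the saved value back on destruction.
template <class Cw>
class ControlWordScope {
public:
    explicit ControlWordScope(unsigned int preferred) : saved_(Cw::get()) {
        Cw::set(preferred);
    }
    ~ControlWordScope() { Cw::set(saved_); }
    unsigned int saved() const { return saved_; }
private:
    ControlWordScope(const ControlWordScope&);            // non-copyable
    ControlWordScope& operator=(const ControlWordScope&); // non-assignable
    unsigned int saved_;
};

// Hypothetical exported entry point that also fires a callback.
template <class Cw>
void entry_point(unsigned int library_cw, void (*callback)()) {
    assert(callback != 0);
    ControlWordScope<Cw> inside(library_cw);          // library's word while inside
    // ... library work runs under library_cw ...
    {
        ControlWordScope<Cw> outside(inside.saved()); // caller's word during callback
        callback();
    }                                                 // library_cw reinstated here
    // ... more library work runs under library_cw ...
}                                                     // caller's word restored here
```

Nesting a second scope around the callback means that even if the callback changes the control word, the library's preferred value is reinstated when the callback returns, and the caller's value is reinstated when the entry point returns.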

If you decide to attempt to get the COM supplier to modify their code then they need to call the Set8087CW() function.

However, since there are no rules to the game, I believe that the COM object vendor would be justified in refusing to change their code and to put the onus back on you.

Sorry if this is not a 100% conclusive answer, but I couldn't get all these thoughts into a comment!

David Heffernan
  • Thanks for the information! I hope the other guy is willing to help me out, because I'm generating revenue for him. I guess that if he implements a new interface, say `IFPUControlWordPolicy`, through which I can register my preferred CW, then he can make sure that the CW is set to what I want it to be, and restore it to his preferred CW upon incoming calls, etc. If it's that simple. There could be runtime code that is run behind "his scenes" that modifies the FPU CW. He claims that he isn't doing anything to the FPU CW, and yet somehow the FPU CW is modified... – Jörgen Sigvardsson Aug 03 '11 at 22:26
  • ...it seems that the CW is modified when I return from an event sink. I suspect he's generating the events asynchronously (i.e., they are not generated as a result of my calls into his component), and the CW is modified then. I cannot defend myself against this, because in this scenario, my code is not on the call stack at that point. :/ – Jörgen Sigvardsson Aug 03 '11 at 22:29
  • The Embarcadero runtime most definitely sets the CW. He won't do it explicitly. His code may rely on FP exceptions though. – David Heffernan Aug 03 '11 at 22:37
  • Is the event sink run out of a message loop? – David Heffernan Aug 03 '11 at 22:43
  • My event sink sits in the same STA as the 3rd party object. I do know though that the component spawns a comm thread, that I *think* is also sending events. Are you thinking what I'm thinking? Check the CW after each GetMessage() in my message loop, and reset the exception bits if they have changed? Wouldn't that incur a big performance penalty? Sorry for my spurious ramblings... I'm getting desperate. :) – Jörgen Sigvardsson Aug 03 '11 at 23:06
  • +1 for "no rules, it's a total free-for-all." This is unfortunate, but a pretty accurate description. – Mason Wheeler Aug 03 '11 at 23:58
  • checking control word is quick, much quicker than a call to GetMessage – David Heffernan Aug 04 '11 at 06:27
  • @David: thanks mate. I ended up wrapping my message pump and installing a window hook (WH_CALLWNDPROC), and reset the FPU CW if need be. I did some profiling of reading the FPU CW, and it was "just" 28 times slower than copying one 32 bit value between two registers. That's fast enough for me. Setting the FPU CW was 2.5 times slower, but it doesn't occur that frequently, so I am good with this. This is a hack that seems to work for now, but I have contacted the author with complete information on how to maintain the FPU CW state between calls. That way both my code and his will run as expected. – Jörgen Sigvardsson Aug 04 '11 at 08:14

Although the FP control word is per-thread, DllMain functions are called whenever new threads are created, so I don't think you can avoid this short of moving to a new process.

I suggest you spin off a new process to run the COM component and talk to that process with your favorite inter-process communication method (e.g. Windows messages, out-of-proc COM, named pipes, sockets, etc). This way the COM server is free to do all kinds of damage (including crashing itself) without bringing your host process down.

Another idea is to write a DLL whose sole purpose is to reset the FPU in its DllMain, and load it immediately after the offending DLL. Windows probably uses the loading order to call DllMain when new threads are created, including threads created by the COM server. Note that this depends on Windows's internal behavior. Also, the COM server may actually depend on FP exceptions, since it enables them. Disabling FP exceptions may cause the COM server to behave unexpectedly.

Sheng Jiang 蒋晟