
I am working on a piece of software that needs to call a family of optimisation solvers. Each solver is an auto-generated piece of C code, with thousands of lines of code. I am using 200 of these solvers, differing only in the size of the optimisation problem to be solved.

All in all, these auto-generated solvers come to about 180 MB of C code, which I compile as C++ using the extern "C"{ /*200 solvers' headers*/ } syntax, in Visual Studio 2008. Compiling all of this is very slow (with the "maximum speed /O2" optimisation flag, it takes about 8 hours). For this reason I thought it would be a good idea to compile the solvers into a single DLL, which I can then call from a separate piece of software (which would have a reasonable compile time, and would let me abstract away all the extern "C" business from higher-level code). The compiled DLL is then about 37 MB.

The problem is that when executing one of these solvers through the DLL, a single call takes about 30 ms. If I compile just that one solver into its own DLL and call that from the same program, execution is about 100x faster (<1 ms). Why is this? Can I get around it?

The DLL looks as below. Each solver uses the same structures (i.e. they have the same member variables), but they have different names, hence all the type casting.

extern "C"{
#include "../Generated/include/optim_001.h"
#include "../Generated/include/optim_002.h"
/*etc.*/
#include "../Generated/include/optim_200.h"
}

namespace InterceptionTrajectorySolver
{

__declspec(dllexport) InterceptionTrajectoryExitFlag SolveIntercept(unsigned numSteps, InputParams params, double* optimSoln, OutputInfo* infoOut)
{
  int exitFlag;

  switch(numSteps)
  {
  case   1:
    exitFlag = optim_001_solve((optim_001_params*) &params, (optim_001_output*) optimSoln, (optim_001_info*) infoOut);
    break;
  case   2:
    exitFlag = optim_002_solve((optim_002_params*) &params, (optim_002_output*) optimSoln, (optim_002_info*) infoOut);
    break;
  /*
    ...
    etc.
    ...
  */
  case 200:
    exitFlag = optim_200_solve((optim_200_params*) &params, (optim_200_output*) optimSoln, (optim_200_info*) infoOut);
    break;
  default:
    exitFlag = -1; /* unsupported problem size */
    break;
  }

  return exitFlag;
}

}
mwmwm
  • On which platform do you observe that? On Linux with 32 bits architecture, `.so` files need `-fPIC` which eats one register, so code might run 5% slower (because the compiler spills more). – Basile Starynkevitch Sep 07 '12 at 07:35
    Post mentions Visual Studio and DLLs, which says Windows. – themel Sep 07 '12 at 07:36
  • @Basile, themel: Yes, it's all on Windows, compiled with VS2008. – mwmwm Sep 07 '12 at 07:45
  • But it might matter whether it is 32-bit Windows (which probably also reserves an additional register when compiling DLL code) or 64-bit Windows. It might be related to the ABI conventions on your Windows system (which are different for 32- and 64-bit systems). – Basile Starynkevitch Sep 07 '12 at 07:52
  • @Basile: It's all 32bit. Edit: sorry, to be clear: I'm compiling Win32, but I'm running on 64bit Windows 7. – mwmwm Sep 07 '12 at 07:56
  • @BasileStarynkevitch: 32-bit Windows does no such thing. – jalf Sep 07 '12 at 08:03
  • What are you measuring? The total execution time of the entire program, from it is loaded until it's terminated? Or are you timing the specific call to your solver function? – jalf Sep 07 '12 at 08:05
  • I'm measuring the execution time of the `SolveIntercept(..)` function, i.e. just the call to the DLL's exported function. – mwmwm Sep 07 '12 at 11:53

3 Answers


I do not know whether the solver code is being inlined into each case of your switch. If the generated functions are inlined and everything ends up inside one giant function, it will be much slower: the code occupies a huge span of virtual memory, and the CPU has to jump around far more as it executes. If it is not all inlined, then perhaps these suggestions will help.

Your solution might be improved by...

A) Divide the project into 200 separate DLLs, built with a .bat file or similar. Give the exported function in each DLL the same name, e.g. "MyEntryPoint", and use dynamic linking to load each library only as it is needed, taking a function pointer to the entry point with GetProcAddress. This is the equivalent of a busy music program loading lots of small plugin DLLs.
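A sketch of option A, under stated assumptions: the "MyEntryPoint" export, the optim_NNN.dll naming scheme, and the common solver signature are all illustrative, not taken from the question.

```cpp
#include <cstdio>
#include <cstring>

// Build the DLL name for a given problem size, e.g. 7 -> "optim_007.dll".
// The naming scheme is an assumption based on the question's header names.
inline void FormatSolverDllName(unsigned numSteps, char* out, std::size_t outSize)
{
    std::snprintf(out, outSize, "optim_%03u.dll", numSteps);
}

#ifdef _WIN32
#include <windows.h>

// Assumed common signature for every solver's exported "MyEntryPoint".
typedef int (*SolverFn)(void* params, double* soln, void* info);

inline int CallSolver(unsigned numSteps, void* params, double* soln, void* info)
{
    char dllName[32];
    FormatSolverDllName(numSteps, dllName, sizeof(dllName));

    HMODULE lib = LoadLibraryA(dllName);   // load only the solver we need
    if (!lib) return -1;

    SolverFn solve = (SolverFn)GetProcAddress(lib, "MyEntryPoint");
    int exitFlag = solve ? solve(params, soln, info) : -1;

    FreeLibrary(lib);                      // or keep it loaded and cache the pointer
    return exitFlag;
}
#endif
```

Caching the HMODULE and function pointer after the first call avoids paying the LoadLibrary cost on every solve.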

Or...

B) Build each solver as a separate .lib file. Each one will then compile quickly, and you can link them all together. Build an array of function pointers to all the solve functions and call through a table lookup instead of the switch:

result = solverTable[whichStep](/* args */);

Combining all the libs into one big lib should not take eight hours. If it does, you are doing something very wrong.
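The lookup described above might look like the following sketch; the two stub solvers stand in for the generated optim_NNN_solve functions, and the shared signature is an assumption.

```cpp
// Replace the 200-case switch with a table of function pointers indexed by
// problem size. The stub solvers below stand in for the generated code.
typedef int (*SolverFn)(const double* params, double* soln);

static int optim_001_solve(const double* params, double* soln)
{ soln[0] = params[0]; return 1; }         // stub: identity "solver"

static int optim_002_solve(const double* params, double* soln)
{ soln[0] = 2.0 * params[0]; return 2; }   // stub: doubling "solver"

// One entry per solver; entry i handles numSteps == i + 1.
static const SolverFn solverTable[] = {
    optim_001_solve,
    optim_002_solve,
    /* ..., optim_200_solve */
};

inline int SolveIntercept(unsigned numSteps, const double* params, double* soln)
{
    const unsigned count = sizeof(solverTable) / sizeof(solverTable[0]);
    if (numSteps < 1 || numSteps > count)
        return -1;                          // unsupported problem size
    return solverTable[numSteps - 1](params, soln);  // direct indexed call
}
```

The indexed call costs one load and one indirect jump regardless of how many solvers are linked in.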

AND...

Try putting the solvers into separate .cpp files. Perhaps that specific compiler will do a better job if they are in different translation units. Then, once each unit has been compiled, it stays compiled as long as you do not change anything.


Make sure that you measure and average the timing of multiple calls to the optimizer, because there could be a large overhead in the setup before the first call.
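A minimal sketch of that measurement, using std::chrono (available in modern compilers; on VS2008, QueryPerformanceCounter would play the same role). DummySolve is a hypothetical stand-in for the real solver call.

```cpp
#include <chrono>

// Stand-in for one call into the solver DLL.
inline double DummySolve(double x)
{
    volatile double y = x * x;   // volatile: keep the work from being optimised away
    return y;
}

// Time `reps` calls after a warm-up pass, returning the average in microseconds.
// The warm-up absorbs any one-time setup cost (DLL loading, page faults, ...).
inline double AverageCallTimeUs(int reps)
{
    for (int i = 0; i < 10; ++i) DummySolve((double)i);   // warm-up

    std::chrono::steady_clock::time_point t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) DummySolve((double)i);
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    double totalNs = (double)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    return totalNs / 1000.0 / reps;
}
```

Comparing the average with and without the warm-up loop shows how much of the 30 ms is first-call setup rather than per-call cost.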

Then also check what that 200-branch conditional (your switch) is doing to your performance! For a test, try eliminating the switch: call just one solver from your test project while still linking all of them into the DLL. Do you still see slow performance?

Joris Timmermans
  • I'm doing an average over 100 calls, but there seems to be no noticeable difference for the first call. I'll try with linking all solvers, but removing the switch -- unfortunately, this test will involve compiling/code generation for the full project, so it'll take a day to do. – mwmwm Sep 07 '12 at 08:29

I assume the reason you are generating the code is for better run-time performance, and also for better correctness. I do the same thing.

I suggest you try the random-pausing technique (interrupt the program under a debugger and examine its state) to find out what the run-time performance problem is.

If you're seeing a 100:1 performance difference, that means each time you interrupt it and look at the program's state, there is a 99% chance you will see what the problem is.

As far as build time goes, sure it makes sense to modularize it. None of that should have much effect on run time, unless it means you're doing crazy I/O.

Mike Dunlavey