
My current task is to optimise a Monte Carlo Simulation that calculates Capital Adequacy figures by region for a set of Obligors.

It is running about 10x too slow for where it will need to be in production, given the number of daily runs required. Additionally, the granularity of the result figures will need to be improved, down to desk (possibly book) level at some stage. The code I've been given is basically a prototype, which is used by business units in a semi-production capacity.

The application is currently single-threaded, so I'll need to make it multi-threaded. I may look at System.Threading.ThreadPool or the Microsoft Parallel Extensions library, but I'm constrained to .NET 2 on the server at this bank, so I may have to consider this guy's port: http://www.codeproject.com/KB/cs/aforge_parallel.aspx.
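Something like this is what I have in mind (a rough sketch only; the class and seeding scheme are illustrative, not taken from the actual code). It partitions independent trials across plain System.Threading threads, which works on .NET 2 without any Parallel Extensions:

```csharp
using System;
using System.Threading;

// A rough sketch of partitioning independent Monte Carlo trials across
// threads on plain .NET 2.0. All names here are illustrative.
class MonteCarloRunner
{
    // Stand-in for one independent simulation trial.
    static double RunTrial(Random rng)
    {
        return rng.NextDouble();
    }

    public static double Run(int totalTrials, int threadCount)
    {
        int trialsPerThread = totalTrials / threadCount;
        double[] partialSums = new double[threadCount];
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            int index = t; // capture a copy for the anonymous delegate
            threads[t] = new Thread(delegate()
            {
                // Each thread owns its own RNG: System.Random is not
                // thread-safe, and sharing one would serialize the hot path.
                Random rng = new Random(unchecked(Environment.TickCount * 31 + index));
                double sum = 0.0;
                for (int i = 0; i < trialsPerThread; i++)
                    sum += RunTrial(rng);
                partialSums[index] = sum;
            });
            threads[t].Start();
        }

        foreach (Thread t in threads)
            t.Join();

        double total = 0.0;
        foreach (double s in partialSums)
            total += s;
        return total / (trialsPerThread * threadCount);
    }
}
```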

I am trying my best to get them to upgrade to .NET 3.5 SP1 but it's a major exercise in an organisation of this size and might not be possible in my contract time frames.

I've profiled the application using the trial of dotTrace (http://www.jetbrains.com/profiler). What other good profilers exist? Free ones?

A lot of the execution time is spent generating uniform random numbers and then translating these to normally distributed random numbers. They are using a C# Mersenne Twister implementation. I am not sure where they got it, or whether it's the best way to go about this (or the best implementation) for generating the uniform random numbers. These are then translated to a normally distributed version for use in the calculation (I haven't delved into the translation code yet).

Also, what is your experience using the following?

Any alternatives you know of? I'm a C# developer so would prefer C#, but a wrapper to C++ shouldn't be a problem, should it?

It might even be faster to leverage the C++ implementations directly. I am thinking some of these libraries will have the fastest method to directly generate normally distributed random numbers, without the translation step. They may also have some other functions that will be helpful in the subsequent calculations.

Also, the computer this is on is a quad-core Opteron 275 with 8 GB of memory, but running Windows Server 2003 Enterprise 32-bit. Should I advise them to upgrade to a 64-bit OS? Any links to articles supporting this decision would really be appreciated.

Anyway, any advice and help you may have is really appreciated.

m3ntat
    Why do you believe that throwing more threads at the problem will improve matters? – Eric Lippert Jul 06 '09 at 18:29
• At present the code is single threaded, running on a quad core box, an Opteron 275 to be precise. The code is written to run sequentially; either the compiler, the CLR, or the CPU instruction set can take its best guess at how to take this code and attempt to run portions of it in parallel to improve performance. Or I can write this code to run in a threaded, parallel model, effectively suggesting to the CLR, compiler, and CPU what can be run concurrently, and let these lower-level instructions then optimise execution. Your thoughts? – m3ntat Jul 06 '09 at 19:08
  • 1
    8GB of memory is 4 GB of waste on 32 bits... – Not Sure Jul 06 '09 at 19:34
• Does anyone have any insight into the Parallel Extensions library? Has anyone used the Microsoft Parallel.For vs. this open source version (http://www.codeproject.com/KB/cs/aforge_parallel.aspx)? I'm also looking for someone good with threads to weigh in and compare these with the .NET threading options, because I am new to multi-threaded programming. – m3ntat Jul 06 '09 at 22:32

4 Answers


I have found the Mersenne Twister to be quick. The problem may be in the algorithm (Box-Muller) used to transform the uniform distribution to a Gaussian distribution. The standard algorithm looks like:

y1 = sqrt( - 2 ln(x1) ) cos( 2 pi x2 )
y2 = sqrt( - 2 ln(x1) ) sin( 2 pi x2 )

where x1 and x2 are uniform random numbers, and y1 and y2 are the Gaussian-distributed outputs.

The square roots are slow, but the trig is worse, and it is unstable close to 0. Taygeta's page on the subject gives a faster one (in pseudocode):

    float x1, x2, w, y1, y2;

    do {
        x1 = 2.0 * ranf() - 1.0;
        x2 = 2.0 * ranf() - 1.0;
        w = x1 * x1 + x2 * x2;
    } while ( w >= 1.0 || w == 0.0 );  /* reject points outside the unit circle
                                          (and w == 0, where ln would blow up) */

    w = sqrt( (-2.0 * ln( w ) ) / w );
    y1 = x1 * w;
    y2 = x2 * w;

If they're not using something like this, you may be able to speed things up quite a bit by avoiding the trig functions or even pre-generating the random numbers.
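In C#, the polar form above might look something like this (a sketch only; Random here is just a stand-in for whatever uniform source, e.g. the Mersenne Twister, you are using):

```csharp
using System;

// A C# sketch of the polar (Marsaglia) form of Box-Muller described above.
// Each rejection-sampling pass yields two deviates; the second is cached
// so no work is thrown away.
class PolarGaussian
{
    readonly Random uniform; // stand-in for your uniform RNG
    double spare;
    bool hasSpare;

    public PolarGaussian(Random uniform) { this.uniform = uniform; }

    public double Next()
    {
        if (hasSpare) { hasSpare = false; return spare; }

        double x1, x2, w;
        do
        {
            x1 = 2.0 * uniform.NextDouble() - 1.0;
            x2 = 2.0 * uniform.NextDouble() - 1.0;
            w = x1 * x1 + x2 * x2;
        } while (w >= 1.0 || w == 0.0); // reject points outside the unit circle

        w = Math.Sqrt(-2.0 * Math.Log(w) / w);
        spare = x2 * w;   // keep the second deviate for the next call
        hasSpare = true;
        return x1 * w;
    }
}
```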

R Ubben
  • As a note, many modern processors have an assembly instruction for simultaneous calculation of sin and cos, and it is much cheaper than calling both sequentially. It's not available in any standard libraries, afaik, since it is a processor-specific feature. – Not Sure Jul 06 '09 at 19:31
  • Thanks @R Ubben, is your proposal the same as this http://en.wikipedia.org/wiki/Box-Muller_transformation or is this something different? – m3ntat Jul 07 '09 at 09:03
  • Yes, the polar form they describe. It is rejection sampling, so you throw some numbers away, but it still ends up much faster. Though I also work in banking, I was doing this for enjoyment - global illumination in a ray tracer. It did make a difference. If speed is still a problem, you could generate several hundred million offline between daily runs, depending on how many the run uses, and just read them in as needed. Then fall back to generation if the store is exhausted. – R Ubben Jul 07 '09 at 11:04

Have you considered pointing a profiler at your code? I've seen cases where simple fixes yield very significant improvements, like switching a couple of properties over to fields.

Jake Pearson

Being constrained to use .NET in the first place for a large-scale simulation is going to cost you quite a bit of performance right up front... but that said...

If you're running a pure C# implementation of the Mersenne Twister, it's likely that you'll have a hard time tweaking all the performance you can out of it. If you check out the Mersenne Twister reference implementation, you'll see they have a C version that is heavily optimized for SSE-capable processors; this is very fast. I don't believe it's possible in C# (or at least, I'm not aware how) to force the usage of SSE instructions with that level of optimization. I'd suggest writing a C++/CLI wrapper (or a P/Invoke wrapper) around the Mersenne Twister libraries and seeing how that affects your performance. However, you'll have to be careful with managed-unmanaged marshalling affecting your performance, as I have seen other posts here on SO about that issue (though I can't seem to find them right now...).
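As a rough sketch of the P/Invoke route: the DLL name and exported function below are assumptions (the actual entry points depend on how you compile and export the native Mersenne Twister), and this version falls back to System.Random when the native library is absent:

```csharp
using System;
using System.Runtime.InteropServices;

// A sketch of the P/Invoke route. The DLL name and exported function are
// assumptions: the actual entry points depend on how the native Mersenne
// Twister is compiled and exported.
static class NativeMt
{
    [DllImport("mt19937.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double genrand_real2(); // hypothetical export: uniform in [0,1)
}

class UniformSource
{
    readonly Random fallback = new Random(12345);
    bool nativeAvailable = true;

    // Try the native generator; fall back to the managed RNG if the
    // DLL is not present on this machine.
    public double Next()
    {
        if (nativeAvailable)
        {
            try { return NativeMt.genrand_real2(); }
            catch (DllNotFoundException) { nativeAvailable = false; }
        }
        return fallback.NextDouble();
    }
}
```

Note that every P/Invoke call has marshalling overhead, so it usually pays to fill a whole buffer of numbers per native call rather than crossing the managed-unmanaged boundary once per draw.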

I may generate some flame for saying this, but if performance is a significant concern in your application, well-written C or C++ is almost always going to be preferable to any managed or interpreted language.

Not Sure
    I actually disagree fairly strongly with your last statement... C# can perform very, very well. It does require profiling, but in my experience, it's much easier to profile C# and improve it to the point where it can outperform C and C++ - especially if you take advantages of the right libraries, and understand how to write tight, highly performant code in C#. – Reed Copsey Jul 06 '09 at 17:45
  • 1
    I want to constructively disagree, too :) – peterchen Jul 06 '09 at 17:57
  • 1
    @Reed - Let me clarify - I'm not talking about the ease of profiling, nor the tools available, nor the difficulty of optimizing. I'm asserting that for any program written in an interpreted or managed language, it can be proven that a functionally equal program with equal or better performance can be written in an unmanaged language. – Not Sure Jul 06 '09 at 19:27

My experience is that the relative performance of C# vs. C++ is largely dependent on what you're doing. A great discussion of that here:

C++ performance vs. Java/C#

For tight loops doing math (say, vector physics calculations), C++ is 2-3 times faster than C#, although the performance may be dominated by underlying functions like Sqrt().

I've taken a mixed language approach, (re)implementing the slowest code in C++/OpenMP with a managed C++/CLI wrapper. This allows you to only "pay for what you use".

There's a summary of how to wrap native C/C++ with C++/CLI here:

http://msdn.microsoft.com/en-us/library/ms235281.aspx

Once you get the hang of C++/CLI it's pretty easy to get things running.

Ade Miller