15

I don't quite understand what makes matrix multiplication in C#/.NET (and even Java) so slow.

Take a look at this benchmark (source): Trying to find an updated benchmark.

Java vs C# vs C++ breakdown

C#'s integer and double performance is damn close to C++ compiled with MSVC++. 87% as fast for double and 99% as fast for 32-bit integer. Pretty damn good, I'd say. But then look at matrix multiplication. The gap widens to C# being about 19% as fast. This is a pretty huge discrepancy that I don't understand. Matrix multiplication is just a bunch of simple math. How is it getting so slow? Shouldn't it be roughly as fast as an equivalent number of simple floating point or integer operations?

This is especially of a concern with games and with XNA, where matrix and vector performance are critical for things like physics engines. Some time ago, Mono added support for SIMD instructions through some nifty vector and matrix classes. It closes the gap and makes Mono faster than hand-written C++, although not as fast as C++ with SIMD. (source)

Matrix multiplication comparison

What's going on here?

Edit: Looking closer, I misread the second graph. C# appears pretty close. Is the first benchmark just doing something horribly wrong? Sorry, I missed the version number on the first benchmark. I grabbed it as a handy reference for the "C# linear algebra is slow" that I've always heard. I'll try to find another.

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Matthew Olenik
  • 3,577
  • 1
  • 28
  • 31
  • 2
    C# Version + Options: .Net Framework 1.1.4322 Uh... isn't there a newer version? – GalacticJello Jul 12 '10 at 14:45
  • 5
    *sits and waits to see what JonSkeet has to say on the matter* :-) – WestDiscGolf Jul 12 '10 at 14:49
  • The test was done with VS 2003. (Note the C++ version as well.) Hence the ancient version of .net. – cHao Jul 12 '10 at 14:54
  • @GalacticJello yes but even still, why is it so much slower than integer/floating point operations? I'll see if I can find some more benchmarks. – Matthew Olenik Jul 12 '10 at 14:59
  • @Matt Olenick: That was the first (well... second) version of .NET. Moreso, XNA won't even run on .NET 1.1, it's an implementation of .NET 2.0 Compact Edition. Believe it or not, improvements are made in languages following earlier releases. – Powerlord Jul 12 '10 at 15:15
  • @Matt: Good call. Benchmarks from 6 years ago -- and for stone-age versions of the frameworks/languages involved -- are pretty much useless. – cHao Jul 12 '10 at 15:15
  • Sigh. Yes, I missed the version number, I see that. I *know* they make improvements, no need to be a smartass. I'll try to find something newer. – Matthew Olenik Jul 12 '10 at 15:34
  • @cHao: A lot of times, the underlying implementation doesn't change over time that much. The lower you go, the less often there are changes. IL/ByteCode is pretty low level. The question is also not useless if you're still using these frameworks. ( COBOL programmers are still in demand, after all ) – Armstrongest Jul 12 '10 at 15:44
  • @Gary '-': Well, in this case, it has. .net 2.0 was pretty much an overhaul. They added a bunch of stuff, including generics (read: a better way to create collection classes...). 4.0 upgraded the CLR too, but i forget what all they added. – cHao Jul 12 '10 at 16:58
  • @cHao: You could be right cHao... I agree that it was an overhaul of .Net but I think performance of something relatively simple like this would sit in the IL level which I doubt had much work done to it (though I could be wrong). – Armstrongest Jul 12 '10 at 21:59
  • @Gary '-': Generics required support from the ground up, so i'm pretty sure the IL translator / JIT was at least touched. Even if it weren't, IL is just semi-digested code. The CLR itself (which saw some pretty big changes) is what actually runs it. – cHao Jul 12 '10 at 22:21

4 Answers4

14

With large matrices like this, the CPU cache becomes the limiting factor. What's hyper-important is how the matrix is stored. And the benchmark code is comparing apples and oranges. The C++ code used jagged arrays, the C# code uses two-dimensional arrays.

Rewriting the C# code to use jagged arrays as well doubled its speed. Rewriting the matrix multiply code to avoid the array index boundary check seemed pointless, nobody would use code like this for real problems.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • Thanks, that clears things up. So why do I always hear (among other reasons) "XNA is slow because matrix multiplication in C# is slow"? Is that just not true? – Matthew Olenik Jul 12 '10 at 15:46
  • 1
    I dunno, that's an unverifiable claim from where I sit. Do XNA programmers often write their own matrix multiplying code? C/C++ code is uncompromising when it comes to speed and you're left picking shrapnel out of your ears when it blows up. If there's a speed problem with a particular algorithm in C# then you always have C/C++ to fall back on. – Hans Passant Jul 12 '10 at 16:36
  • 1
    No, they use the library provided by XNA. – Matthew Olenik Jul 12 '10 at 17:29
11

To explain the origin of the idea that XNA matrix operations are slow:

First of all there's the beginner-level gotcha: The XNA Matrix class's operator* will make several copies. This is slower than what you might expect from the equivalent C++ code.

(Of course, if you use Matrix.Multiply(), then you can pass by reference.)

The second reason is that the .NET Compact Framework used by XNA on the Xbox 360 does not have access to the VMX hardware (SIMD) that is available to native, C++ games.

This is why you keep hearing that it is slow, at least. As you can see from the benchmarks you posted - it's not really that "slow", when you compare apples to apples.

Andrew Russell
  • 26,924
  • 7
  • 58
  • 104
8

Well clearly the benchmark author did not understand the difference between jagged and multidimensional arrays in C#. It really was not an apples-to-apples to comparison. When I changed the code to use jagged arrays instead of multidimensional arrays so that it operates in a manner more similar to Java then the C# code ends up running twice as fast...making it faster than Java (though just barely and that is probably statistically insignificant). In C# multidimensional arrays are slower because there is extra work involved in finding the array slot and because the array bounds check cannot be eliminated for them...yet.

See this question for a more in depth analysis of why multidimensional arrays are slower than jagged arrays.

See this blog for more information on array bounds checking. The article specifically warns against using multidimensional arrays for matrix multiplication.

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
Brian Gideon
  • 47,849
  • 13
  • 107
  • 150
4

Here's an updated benchmark dealing with matrix multiplcation (and some benchmarks using the new Task Parallel Library):

Parallel Matrix Multiplication with the Task Parallel Library (TPL)

The article goes into different methods, and explains why multidimensional arrays are a poor choice:

The easiest way to do matrix multiplication is with a .NET multidimensional array with i,j,k ordering in the loops. The problems are twofold. First, the i,j.k ordering accesses memory in a hectic fashion causing data in varied locations to be pulled in. Second, it is using a multidimensional array. Yes, the .NET multidimensional array is convenient, but it is very slow.

Glorfindel
  • 21,988
  • 13
  • 81
  • 109
GalacticJello
  • 11,235
  • 2
  • 25
  • 35