Fast Matrix Multiply with SIMD

Question

Richard Startin writes about next-gen matrix multiply. But in his code, he's using 1d float[] arrays. How would I adapt his code to use 2d float[][] matrices, which are more useful to represent matrices? Or is it common to unroll such matrices to 1d array for computation?

`[i*n + k]` is pretty obviously the same as `[i][k]` for 2D indexing. The `[in + k]` indexing is just manually hoisting the `i*n` by incrementing `in += n`, so you could just do it the naive way. Using a flat 1d array is very common, though, especially in languages like C++ where `arr[][]` would make it hard to handle runtime-variable dimensions because of a lack of VLA support. IDK how the efficiency compares for Java, but as long as the allocation is one big contiguous thing, not an array of references to other arrays / containers, it's at least in theory possible to be efficient. — Peter Cordes, Nov 28 '21 at 06:53
Also note the article you looked at is *not* explicitly using SIMD, just playing around with standard scalar stuff. From the table of results manage at best about 1.5 FLOPs / clock, we can see it's not using SIMD instructions. See also [What Every Programmer Should Know About Memory?](https://stackoverflow.com/q/8126311) for more about why cache-blocked matmul matters, with some examples in an appendix including one with manual SIMD vectorization for SSE2. — Peter Cordes, Nov 28 '21 at 07:17
@PeterCordes this is some nice info, you should probably convert your comment to an answer that I could accept. Not sure why this is a duplicate? It's a different question than the index conversion one. — Ondrej Sotolar, Nov 28 '21 at 07:31
You asked how to adapt the indexing; that's the only specific question I see in your post here. That part seems pretty trivial, so I closed it as a duplicate that explains how indexing works. I don't think it's likely that many future readers of that blog article would find this Q&A based on that question. I commented to let you know that there's no SIMD happening there, so the question title isn't really appropriate. At the top of the article, he explains that he's just optimizing the pure-Java scalar implementations to give a fairer comparison for stuff like the SIMD paper he linked. — Peter Cordes, Nov 28 '21 at 07:43

Fast Matrix Multiply with SIMD

0 Answers0