What is the better Matrix4x4 class design for a C++ newbie?

Question

What would be better to use as a way to store matrix values?

float m1,m2,m3 ... ,m16

or

float[4][4].

I first tried float[16], but when I'm debugging and testing, VS won't show what is inside the array :( I could implement a cout and try to read the answer from a console test application.

Then I tried using float m1, m2, m3, etc. While testing and debugging, the values could be read in VS, so it seemed easier to work with.

My question is: because I'm fairly new to C++, what is the better design?

I find the float m1,m2 ... ,m16 easier to work with when debugging.

I would also love it if someone could say from experience, or has benchmark data, which has better performance. My gut says it shouldn't really matter because the matrix data should be laid out the same in memory, right?

Edit: Some more info: it's a column-major matrix. As far as I know, I only need a 4x4 matrix for the view transformation pipeline, so nothing bigger, and so I have some constant values.

I'm busy writing a simple software renderer as a way to learn more C++, get some more experience, and learn/improve my linear algebra skills. I will probably only go as far as per-fragment shading and some simple lighting models, and as far as I have seen, a 4x4 matrix is the biggest I will need for rendering.

Edit2: Found out why I couldn't read the array data: I used a float pointer, and the debugging menu only showed the pointer value. I did discover a way to see the array values in the watch window, where you enter pointer, n where n = the number of elements you want to see.

Thanks to everybody who answered; I will use the Vector4 m[4] answer for now.

You can customize autoexp.dat to instruct debugger to show what you need for your matrix. That said...best implementation (from performance point of view) can't be decided without more details. How will you use that matrix? Which operations you'll apply? Which algorithms will you support? Often "splitted" variables are pretty good but...if it's really a bottleneck...profile profile profile and then measure. — Adriano Repetti, Apr 04 '13 at 11:22
I will recheck it again maybe i have done some stupid stuff. New to native programming and C++ and the Environment. — Thubie de Jong, Apr 04 '13 at 12:22

Brett Hale · Accepted Answer · 2013-04-04T11:51:34.110

2

You should consider a Vector4 with float [4] members, and a Matrix4 with Vector4 [4] members. With operator [], you have two useful classes, and maintain the ability to access elements with: [i][j] - in most cases, the element data will be contiguous, provided you don't use virtual methods.

You can also benefit from vector (SIMD) instructions this way, e.g., in Vector4

union alignas(16) { __m128 _v; float _s[4]; }; // members

inline float & operator [] (int i) { return _s[i]; }
inline const float & operator [] (int i) const { return _s[i]; }

and in Matrix4

Vector4 _m[4]; // members

inline Vector4 & operator [] (int i) { return _m[i]; }
inline const Vector4 & operator [] (int i) const { return _m[i]; }
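
For readers who skip the SSE part, a minimal sketch of the same two classes with a plain float[4] member might look like the following. The member names `s` and `m`, the helpers `identity` and `transform`, and the choice to treat each Vector4 as one column (to match the column-major layout mentioned in the question) are assumptions made for illustration, not part of the answer above.

```cpp
#include <cassert>

// Hypothetical plain-float version of Vector4 (no SSE, no union).
struct Vector4 {
    float s[4];

    float& operator[](int i) { assert(i >= 0 && i < 4); return s[i]; }
    const float& operator[](int i) const { assert(i >= 0 && i < 4); return s[i]; }
};

// Hypothetical Matrix4: four Vector4 members, each treated as one column.
struct Matrix4 {
    Vector4 m[4];

    Vector4& operator[](int i) { assert(i >= 0 && i < 4); return m[i]; }
    const Vector4& operator[](int i) const { assert(i >= 0 && i < 4); return m[i]; }
};

// No virtual functions and no padding expected, so the 16 floats sit contiguously.
static_assert(sizeof(Vector4) == 4 * sizeof(float), "Vector4 should be 4 packed floats");
static_assert(sizeof(Matrix4) == 16 * sizeof(float), "Matrix4 should be 16 packed floats");

// Builds an identity matrix (helper name is an assumption).
inline Matrix4 identity() {
    Matrix4 r{};  // value-initialization zeroes every element
    for (int i = 0; i < 4; ++i) r[i][i] = 1.0f;
    return r;
}

// Matrix times column vector, where a[c] is column c of the matrix.
inline Vector4 transform(const Matrix4& a, const Vector4& v) {
    Vector4 r{};
    for (int c = 0; c < 4; ++c)
        for (int row = 0; row < 4; ++row)
            r[row] += a[c][row] * v[c];
    return r;
}
```

With this convention, `m[3][0]` is the x component of the translation column, so element access keeps the [i][j] form described above.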

edited Apr 04 '13 at 11:51

answered Apr 04 '13 at 11:24

Brett Hale


Yes, I deleted my comment after having a harder think about it. Good work :) – Wayne Uroda Apr 04 '13 at 11:56
If you allow SIMD support, don't allow access to individual elements. It's a huge performance hazard; choose more SIMD-friendly ways of getting and setting individual elements, through swizzle operations and selects. – Jasper Bekkers Apr 04 '13 at 12:13
@JasperBekkers - it just requires judicious use. i.e., dot products for vector transforms. I'd rather have both options available. – Brett Hale Apr 04 '13 at 12:19
I will keep this in mind when I refactor the code; I do have a Vector4 class. But I'm new to C++ and the working environment, so I have no experience at all with SSE instructions. – Thubie de Jong Apr 04 '13 at 12:26
@ThubiedeJong - you can ignore the SSE stuff and just use a `float[4]` member. It's there for 'batch' processing if you can keep the SIMD units well-fed. – Brett Hale Apr 04 '13 at 12:27
Now that I think harder about it, this seems like a good solution: just an array with Vector4 [4]. It should simplify some of the matrix math too. – Thubie de Jong Apr 04 '13 at 13:13
@BrettHale Yeah both options is nice, that's why we put them in separate classes with easy conversion options. At least you know you'll be doing something stupid ;-) – Jasper Bekkers Apr 04 '13 at 15:11

score 1 · Answer 2 · answered Apr 04 '13 at 11:35

1

The float m1, m2 .. m16; becomes very awkward to deal with when it comes to using loops to iterate through things. Using arrays of some sort is much easier. And, most likely, the compiler will generate code that is AT LEAST as efficient when you use loops as when you "hand-code" it, unless you actually write inline assembler or use SSE intrinsics.
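
As a rough illustration of the point about loops, here is a sketch of a 4x4 multiply over an array member. The struct name `Mat4`, the function name `multiply`, and the `m[col][row]` storage order are assumptions for the example; with sixteen named members the same function would need sixteen hand-written sums.

```cpp
// Hypothetical array-backed matrix, stored as m[col][row] (column-major).
struct Mat4 {
    float m[4][4];
};

// out = a * b. In math notation out(r,c) = sum over k of a(r,k) * b(k,c);
// with m[col][row] storage, a(r,k) is a.m[k][r] and b(k,c) is b.m[c][k].
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};  // value-initialization zeroes all 16 elements
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out.m[c][r] += a.m[k][r] * b.m[c][k];
    return out;
}
```

The loop bounds are compile-time constants, so an optimizing compiler is free to unroll them; whether it produces the same code as a hand-written version is something to check in the generated assembly rather than assume.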

answered Apr 04 '13 at 11:35

Mats Petersson


A rule of thumb: `X m1, m2;` is okay, `X m1, m2, m3, m4;` (and more) must be turned into an array. The case of `X m1, m2, m3` is uncertain. – Joker_vD Apr 04 '13 at 12:02

score 0 · Answer 3 · answered Apr 04 '13 at 11:24

0

The 16 float solution is fine as long as the code doesn't evolve (it is a hassle to maintain and it is not really readable).

The float[4][4] is a way better design (in terms of size parametrization) but you have to understand the notion of pointers.

answered Apr 04 '13 at 11:24

lucasg


Define "fine"! What if you wanted to iterate over the elements? – Oliver Charlesworth Apr 04 '13 at 11:35
@OliCharlesworth I suppose he means "fine according to your question" (**performance**). If matrix size is fixed and that code must be really optimized then direct variable access is **much faster** than any (double...) pointer indirection (and that's why LL 3X3 and 4X4 matrices are implemented like that). **It doesn't mean it's readable**, _performance isn't a good friend of readability_. – Adriano Repetti Apr 04 '13 at 11:39
Fair point. Depends on the use case. ( and what I mean by "fine solution" is " a small time to develop when you are a beginner") – lucasg Apr 04 '13 at 11:43
float[4][4] isn't necessarily double indirection, right? AFAIK the compiler actually allocates a 16 element array - actually I am not so sure about that, I can't find a good reference. – Wayne Uroda Apr 04 '13 at 11:43
@Adriano: The compiler should be able to generate identical code for accessing an array with a literal index (e.g. `this->m[1][2]`) and accessing a named member (e.g. `this->m6`). So using an array shouldn't hurt performance for manual unrolling. And it gives you the option to programmatically iterate if you want to... – Oliver Charlesworth Apr 04 '13 at 11:48
@WayneUroda yes, as I commented on fatih_k's answer, performance (of float[][] and variables) _may_ be comparable (well, if the compiler is good enough) for a very naive implementation. That said, it depends a lot on the algorithms he'll apply to them. – Adriano Repetti Apr 04 '13 at 11:50
@OliCharlesworth if he's writing a class instead of exposing float[][] why do you suppose he'll expose **internal implementation**? Moreover are you sure the **compiler can optimize the access to array elements** if it's done through a function (m4.get(0, 0) * m4.get(0, 1) will be optimized by the compiler? probably **not**). Assumed it can't do it the difference is only inside the class and for iteration. Both can be solved using an **union** of that variables with an array (so you get fast access from outside, from inside, iteration and **SSE-ready** array too). – Adriano Repetti Apr 04 '13 at 12:01
@Adriano: I'm not really thinking of internal vs. external. I'm not sure it makes a big difference to what I'm saying... – Oliver Charlesworth Apr 04 '13 at 12:03
I am almost 100% certain that the compiler will access the element of a static 2d array in exactly the same manner as a field of the class, since at compile time both memory offsets are known. I am going to check this when I get a chance. – Wayne Uroda Apr 04 '13 at 12:04
@OliCharlesworth I mean that the compiler won't optimize anything in array access when it's done through a variable. With multiple variables it knows addresses at compile time, with variables it must calculate them at run-time. – Adriano Repetti Apr 04 '13 at 12:05
You guys are sick :) The OP does not know how to debug a simple matrix, and here you are talking about pointer indirection and internal/external! – lucasg Apr 04 '13 at 12:06
codinghorror himself says this is just a massively multiplayer game for people who like typing paragraphs to each other ;) – Wayne Uroda Apr 04 '13 at 12:10
@WayneUroda LOL today we had too much free time! – Adriano Repetti Apr 04 '13 at 12:26
There's a big chance that the code won't really change much after I'm done with it. This is just a project I'm doing for my minor; after that I will maybe make it public on GitHub for friends to code review if they want. This project is done more to learn about the math and C++. – Thubie de Jong Apr 04 '13 at 12:50
I wrote two test classes, one using a static 2d array of floats, another using 16 float variables. The compiler generates *identical* assembly for the matrix multiply functions of each - this is using friend access for the elements. Using an inline accessor function and -O1, the 2d array beats the 16 floats by an un-needed push-pop %ebx. With -O2 the 2d array beats the 16 floats by 1 operation. The inline accessor for the 16 floats looks like a big case statement - the inline accessor for the 2d array is just return m[x][y]. The array version also lets you use more natural [row][col] notation. – Wayne Uroda Apr 04 '13 at 13:00
so @Adriano, "are you sure the compiler can optimize the access to array elements if it's done through a function (m4.get(0, 0) * m4.get(0, 1) will be optimized by the compiler? probably not)" - yes, with optimisation on it does, I've seen it with my own eyes :) – Wayne Uroda Apr 04 '13 at 13:02
@adriano: Re "the compiler won't optimize anything in array access when it's done through a variable." Sure, but then you're not really making a fair comparison. With individual variables, there is no way to programmatically index. – Oliver Charlesworth Apr 04 '13 at 14:25
@WayneUroda just to write a few more paragraphs ;) are you sure it doesn't simply inline a constant value? If the offset from the base pointer is a function parameter then it can't replace it with a value at compile time, simply because it doesn't know it (imagine you use your matrix in a library, not a fake example). – Adriano Repetti Apr 04 '13 at 16:50
To be clear, pseudo C: union { struct { float m11; float m12; ... } d; float v[16]; } – Adriano Repetti Apr 04 '13 at 16:57
@Adriano: But why? How does that help? If you're hiding the internal implementation behind member functions, then it's irrelevant anyway. And if you're not (or in the case of private functionality), the array already compiles to optimal machine code. – Oliver Charlesworth Apr 04 '13 at 16:57
@OliCharlesworth just to provide a quick way for enumeration (or to access via linear index, or to use the array with SSE intrinsics); it won't be exposed externally (and not even used internally besides that: direct variable access whenever needed with inlined getters, and array access only for enumeration and SSE). But I agree with georgesl: these are details no one cares about, and such micro-optimization can probably be safely ignored! :) – Adriano Repetti Apr 04 '13 at 17:02
@Adriano Yes, this is with constant offsets into the array. I don't know anyone who would use non-constant offsets in a 4x4 matrix if they were worried about speed. I know it is beating a dead horse, I'm just saying I don't believe there is a single advantage of 16 separate floats over a `float[4][4]`. There is no need to have a union, you can bypass the separate floats altogether! :) – Wayne Uroda Apr 04 '13 at 22:00
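
For anyone who wants to repeat the comparison discussed in these comments, a sketch of the two variants could look like this. The struct names, the `get(col, row)` signature, and the index order are assumptions; the original test classes were not posted, so this is not the code that was measured.

```cpp
// Variant A: sixteen named members. Indexed access needs a switch.
struct MatrixFields {
    float m0, m1, m2, m3, m4, m5, m6, m7,
          m8, m9, m10, m11, m12, m13, m14, m15;

    float get(int col, int row) const {
        switch (col * 4 + row) {
            case 0:  return m0;   case 1:  return m1;
            case 2:  return m2;   case 3:  return m3;
            case 4:  return m4;   case 5:  return m5;
            case 6:  return m6;   case 7:  return m7;
            case 8:  return m8;   case 9:  return m9;
            case 10: return m10;  case 11: return m11;
            case 12: return m12;  case 13: return m13;
            case 14: return m14;  default: return m15;
        }
    }
};

// Variant B: one 2D array. Indexed access is a single expression.
struct MatrixArray {
    float m[4][4];

    float get(int col, int row) const { return m[col][row]; }
};
```

Compiling both with optimization enabled and comparing the generated assembly for calls with constant and non-constant indices is a way to check the claims made in the thread on a particular compiler.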

score 0 · Answer 4 · edited May 23 '17 at 12:21

I would use an array of 16 floats like float m[16]; with the sole reason being that it is very easy to pass it to a library like OpenGL, using the Matrix4fv suffix functions.

A 2D array like float m[4][4]; should also be laid out in memory identically to float m[16] (see May I treat a 2D array as a contiguous 1D array?) and using that would be more convenient as far as having [row][col] (or [col][row], I am not sure which is correct in terms of OpenGL) indexing (compare m[1][1] vs m[5]).
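
A small sketch of the m[1][1] vs m[5] correspondence, assuming the first index of the 2D array is the column (column-major, as in the question). The names `Flat`, `Grid`, `flatIndex` and `toFlat` are made up for the example. It copies with std::memcpy instead of indexing across rows through a single pointer, since whether that kind of indexing is strictly allowed is exactly what the linked question discusses.

```cpp
#include <cstring>

struct Flat { float m[16]; };    // one linear array of 16 floats
struct Grid { float m[4][4]; };  // assumed to be indexed as m[col][row]

// Both hold 16 floats with no padding between elements.
static_assert(sizeof(Flat) == sizeof(Grid), "both layouts should occupy 16 floats");

// Linear index of element (col, row) in the flat array.
constexpr int flatIndex(int col, int row) { return col * 4 + row; }

// Copies the 2D storage into the flat form byte for byte.
inline Flat toFlat(const Grid& g) {
    Flat f;
    std::memcpy(f.m, g.m, sizeof f.m);
    return f;
}
```

After `toFlat`, the value stored at `g.m[1][1]` is found at `f.m[5]`, and `f.m` is a plain `float*`-compatible array of 16 values of the kind the Matrix4fv-style functions mentioned above take.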

fatihk · Answer 5 · 2013-04-05T04:53:19.897

-1

Using separate variables for matrix elements may prove to be problematic. What are you planning to do when dealing with big matrices like 100x100?

Of course you need to use some array-like structure, and I strongly recommend that you at least use arrays.

edited Apr 05 '13 at 04:53

answered Apr 04 '13 at 11:28

fatihk


hmm, it is a bit dangerous (especially for a C++ beginner) : V.push_back(v) is safe, but V[i].push_back( float f) can raise Exception (depending on V[i].capacity). – lucasg Apr 04 '13 at 11:42
He is asking about **performance** too. float[] can be as fast as a _naive implementation with multiple variables_, but with std::vector you can be sure it's _much slower_. – Adriano Repetti Apr 04 '13 at 11:47
for matrices instead of using push_back's, size initialization can be done in the constructor – fatihk Apr 04 '13 at 11:48
@fatik_h : yep, as long as the OP just read the matrix and does not dynamically modify it. – lucasg Apr 04 '13 at 11:52
The biggest matrix I have found in the literature is 4x4. – Thubie de Jong Apr 04 '13 at 12:08
@ Thubie de Jong very nice – fatihk Apr 04 '13 at 12:19
