Discussion - Arrays vs Vector of Vectors when creating matrices - what is the most practical option?

Question

So i have stumbled upon an exercise on an C++ book called - "Guide To Scientific Computing in C++"

Here is the exercise: "Write code that dynamically allocates memory for three 2 × 2 matrices of double precision ﬂoating point numbers, A, B, C, and assigns values to the entries of A and B. Let C = A + B. Extend your code so that it calculates the entries of C, and then prints the entries of C to screen. Finally, de-allocate memory. Again, check you have de-allocated memory correctly by using a for loop as in the previous exercise."

This caught my eye, i tried to do this problem using both 2D Arrays (i was able to do it very easily) and with vector of vectors (i failed).

I did a lot of research and read some post on StackOverFlow and basically the opinion was universal - when dealing with matrices the option is always 2D arrays.

But knowing that there are a lot of programmers here (and that i am a newbie with C++), i would really like to read more opinions about this topic!

P.S: Here is a little snip of my failed attempt at trying to create a matrix using vector of vectors:

for (int row{ 0 }; row < 2; row++) { // Create Matrix A - goes through the matrix rows
    
    for (int col{ 0 }; col < 2; col++) { // goes through the matrix columns
        temp.push_back(rand() % 201); // add random number to the temporary vector
    }

    matrixA.push_back(temp);
}

// Outputing 
for (int row{ 0 }; row < matrixA.size(); row++) { // goes through the matrix rows

    for (int col{ 0 }; col < matrixA.at(row).size(); col++) { // goes through the vectors inside matrixA
        cout << matrixA.at(row).at(col) << "\t";
    }
    cout << endl;
}

This is the output: enter image description here

What do you mean by "failed attempt"? What goes wrong? Also, you don't appear to be `clear`ing `temp` for each row. — cigien, Jul 10 '20 at 15:30
`when dealing with matrices the option is always 2D arrays.` no no no. The option is always to use a continuous memory layout and achieve 2d access by proper striding. using indirection is slow due to 1) indirection obviously (one more memory access) 2) cache unlocality. Also it becomes tremendously more difficult to implement things like "only access every 3rd column from row 15 to 60" (especially in an efficient manner) — Sebastian Hoffmann, Jul 10 '20 at 15:33
I'd only use a 1D vector and map the row x col onto the 1D vector. Assuming a rectangularly fixed row x col sized matrix. A sparse matrix may warrant a different approach. A ragged matrix does warrant a different approach. — Eljay, Jul 10 '20 at 15:33
Discussions are offtopic, and what is "most practical" is purely opinion based. — 463035818_is_not_an_ai, Jul 10 '20 at 15:34
@cigien actually no, i dind´t use clear... Every time i create a vector and place him in my matrixA, i should clear my temp vector, correct?? — Luis Amaro, Jul 10 '20 at 15:39
@LuisAmaro -- [See this answer](https://stackoverflow.com/questions/21943621/how-to-create-a-contiguous-2d-array-in-c/21944048#21944048). It keeps the 2D array access syntax, but it is dynamically created and it stores data in contiguous memory, just like a 1D array. — PaulMcKenzie, Jul 10 '20 at 15:40

Andreas H. · Answer 1 · 2020-07-10T18:18:17.563

The root of the problem is that C++ brings no built-in support for multidimensional arrays, and

A matrix is not an array of arrays

So even though one can have in C/C++ an array of an array as e.g. in double M[3][3] this is not a good way to represent a matrix. Even if in the compile time fixed case this may still be ok, the vector< vector<double> > way is definitely not good.

Why:

The subscript syntax M[i][j] is still reflecting the fact that this is an array of arrays and not a true multidimensional array (M[i,j] is legal in C/C++ but means something completely different). The M[i][j] syntax also does not really match the mathematics usage. To access the i,j element of a matrix mathematicians usually write

A_i,j

and not

(A_i)_j

(which would correspond to M[i][j])
Experience in mathematics and scientific computing shows that most of the time both dimensions are usually equally important. There is not really an outer and inner dimension. Slicing is often done by both row and column. More complex subscript patterns (such as submatrix or striding) are also quite common.
Frequently one needs a flattening operation, i.e. to reorganize the matrix elements in a linear vector.
0 x N or M x 0 matrices are not realizable. This applies to the static case double A[N][M] but also to the vector< vector<double> > where at 0 x N cannot be represented. Proper handling the empty matrices is extremely important. A matrix subscript may generate an empty matrix as a legitimate case. Matrix multiplication on the other hand will always need matching inner dimensions. So 0xN * NxM is a valid case and should trigger no error.
The vector< vector<double> > wastes memory. Also it is slower than it needs to be (think of a Nx2 matrix with N large. this has a lot of overhead). Same would apply to double **M in pure C.
vector< vector<double> > is not compatible with external linear algebra libraries (such as LAPACK or else)
vector< vector<double> > could be non-rectangular, i.e. one needs to make sure that all inner vectors have the same length.

A matrix is an entity of its own

This is how it is done frequently in linear algrbra packages. There are differences but essetially are all very similar.

It is most reasonable to treat a multidimensional array is an entity of its own. It is not an array of an arrays or so, even though it might seem to be a good idea at first sight.
Storage of data is always linear
The access of an matrix element is mapped into this linear storage. Actually it is quite simple. To access the i, j element, use A[i*lda + j] with lda the length of the leading dimension. Not that this implies a certain ordering of the element in memory. This case is commonly refered to as C order. The opposite, FORTRAN order, would just swap the meaning of i and j, or in case of higher dimensionality, reverse the order of the indices.
To be really generic, a nd-array consists of
```
struct ndarray {
  double *buffer;
  size_t  size[DIMS];
  size_t strides[DIMS]
};
```
This could implement a owning (i.e. has ownership over the buffer) or a non-owning matrix. The latter may be important to implement zero-overhead submatrix or column/row subscripting.

Access to the i,j element of a matrix (DIM == 2) is then
```
buffer[ i*strides[0] + j*strides[1] ]
```
This readily generalizes to the higher dimensions.

Also transposing of the matrix just requires reversal of the strides array. No copying is needed.

A row or column subscript, a submatrix, or getting a strided submatrix could be implemented as a zero-overhead operation. One just needs to fill in the strides array properly and set buffer and size accordingly.

strides has to be initialized propery on matrix creation, depending on whether C or FORTRAN layout is to be used.

imho you miss the most important drawback of a vector of vectors. A vector has its element in contiguous memory and that where most its power comes from. Thats not the case for a vector of vectors — 463035818_is_not_an_ai, Jul 10 '20 at 22:34

Discussion - Arrays vs Vector of Vectors when creating matrices - what is the most practical option?

1 Answers1

A matrix is not an array of arrays

A matrix is an entity of its own

Linked