Work on C++ arrays like in Matlab (for faster computation)

Question

I come from a Matlab world where working on arrays in extremely fast and I want to do the same with a C++ function. Basically I have an array of values and I want to average each row of this array (while omitting the zeros). In Matlab that would look something like:

a = ones(rowSize, ColSize)
%* fill values *%
mean(a)

Which would give me a vector that has as many columns as a and would contains the average value of each column in a matter of milliseconds regardless of the matrix size.

To remove the zeros I would create a vector to tell me where they are and apply the mean function only when appropriate

To do the same in C++, my first instinct would be a loop that would roughly look like this:

int rowSize; // random value
int colSize; // random value
float num;
float denum;
vector<vector<float>> vec(rowSize, vector<float>(colSize));
vector<float> mean;

// * fill values *//

for (int i = 0; i < vec.size(); i++) {
    num = 0;
    denum = 0;
    for (int j = 0; j < vec[i].size(); j++)
    {
        if (vec[i][j] != 0)
        {
            num += vec[i][j];
            denum++;
        }
    }
    mean[i] = num / denum;
}

But that would take forever with big arrays. So considering each loop iteration is independent from the other one is there a way to do that in a faster way? And if possible in a transparent way like in Matlab... I am struggling to find the right resources for that.

Thank you in advance.

don't use a vector of vectors, use a single 1D vector. Then use a simple loop and let the compiler do the optimisations, test it first then do further manual optimisations if necessary — Alan Birtles, Aug 09 '22 at 15:32
why do you think it would take forever? Other than what Alan suggested (nested vectors are rather poor compared to non-nested vectors) the code looks ok. — 463035818_is_not_an_ai, Aug 09 '22 at 15:38
IIRC, Matlab stores mult-dimensional arrays end to end, and (assuming constant dimensions) in column-major order (unlike C or C++ which, for a raw array, is row-major order). Beyond that .... Do the `vec[i]`s have the same size, or different sizes? Do you need to allow for possibility of overflow in the summation? Have you considered using iterators (or, in C++11, range-based loops) instead of indexing? — Peter, Aug 09 '22 at 15:41
[Quick example of what Alan's talking about](https://stackoverflow.com/a/36123944/4581301). Here is a [slightly more complicated, but easier to make safe approach](https://stackoverflow.com/a/2076668/4581301). — user4581301, Aug 09 '22 at 15:46
Why do you think it will take forever? Matlab is likely to be written in C or C++, so the underlying code used by Matlab probably contains low level loops... what C or C++ use natively. — Serge Ballesta, Aug 09 '22 at 15:51
Using what language do you think MATLAB is implemented? Are you gonna rewrite MATLAB? And you're doing it from the ground up with no previous knowledge of C++? You're at a wrong turn, my friend. — Enlico, Aug 09 '22 at 16:22
Matlab uses LAPACK, which is highly optimized. I'd recommend taking a look at this question https://stackoverflow.com/questions/6058139/why-is-matlab-so-fast-in-matrix-multiplication — , Aug 09 '22 at 16:24
Matlab's `mean` does not ignore 0 values. `mean([0 2])` is `1`. Do you really want that behavior? And as others said, loops in C++ are not slow (when compiled with optimization). If you like to avoid writing manual loops, consider using a library such as Eigen: https://eigen.tuxfamily.org/ — chtz, Aug 09 '22 at 16:38
@463035818_is_not_a_number because I am working on at least 20 arrays of 4e7 elements at the same time and it does take a while to go through them all. In Matlab it's a matter of seconds because Matlab is designed to be highly effective with vectorisation. In my C++ it's a matter of minutes. — Jack, Aug 10 '22 at 07:23
@Jack did you turn on optimizations? Compilers don't optimize code if you don't ask for it, and a non optimized can be extremely slow — 463035818_is_not_an_ai, Aug 10 '22 at 07:25
@Peter Arrays all have the same number of rows. But the number of columns should be able to change. I don't need overflow (as far as I know). I have not considered iterators but I will look into it ! — Jack, Aug 10 '22 at 07:29
@TedLyngmo I did simplify the code here for the purpose of the question (because there's more filtering of the data than just checking `if !=0`) but yes I did take in consideration the division by 0. Let me add it in the question... — Jack, Aug 10 '22 at 07:33
@chtz I simplified the Matlab code here considering it's not per se a Matlab question. But the Matlab code actually looks something like `[ii,~,v] = find(a); Average = accumarray(ii,v,[],@mean);` which would filter the `0`s. I did not look into Eigen yet but I will right away! — Jack, Aug 10 '22 at 07:36
@SergeBallesta Matlab has been optimized for vectorisation. I don't know how it works exactly but it is incredibly fast even with big arrays. I am working on at least 20 arrays of 4e7 elements at the same time and it does take a while to go through them all. In Matlab it's a matter of seconds and in C++ it's a matter of minutes.. — Jack, Aug 10 '22 at 07:56
@463035818_is_not_a_number oh indeed it wasn't! It was activated in Release but not Debug! — Jack, Aug 10 '22 at 07:59
Re: "In Matlab it's a matter of seconds and in C++ it's a matter of minutes." -- This sounds like you did not enable optimizations. On clang/gcc compile at least with `-O2` or better `-O3 -march=native`. You can also try `-Ofast` (but this enables some "unsafe" optimizations). — chtz, Aug 10 '22 at 08:15

Ted Lyngmo · Answer 1 · 2022-08-09T16:21:06.137

So considering each loop iteration is independent from the other one is there a way to do that in a faster way?

Yes, you could use one of the execution policies. Here's an example using the C++20 parallel_unsequenced_policy.

#include <algorithm>
#include <execution>

// ...

std::vector<float> mean(vec.size()); // pre-allocate

std::transform(std::execution::par_unseq, vec.begin(), vec.end(), mean.begin(),
[](auto& subvec) {
    // This part will be executed in parallel with approx. one thread (usually)
    // per hardware thread that your system supports.
    int num = 0;
    int denum = 0;
    for(int val : subvec) {
        if(val) {
            num += val;
            ++denum;
        }
    }
    // I'm using `INT_MIN` instead of doing a division by zero:
    return denum ? num/denum : INT_MIN;
});

Note: Inside the std::transform you should not write to any element that will be read or written to at the same time as the loop runs since it'll cause a race condition.

Work on C++ arrays like in Matlab (for faster computation)

1 Answers1