0

Is there a way to speed up the creating of MatrixXd full of zeroes ? Indeed, consider this code

#include <vector>
#include <iostream>
#include <chrono>
#include <ctime>
#include <Eigen/Core>

int main() {
    std::vector<void*> results;
    results.reserve(2000);
    int n = 200;
    auto t0 = std::chrono::system_clock::now();
    for(int i = 0; i < 1000; i++) {
        Eigen::MatrixXd* A = new Eigen::MatrixXd(n,n);
        A->setZero();
        results.push_back(A);        
    }
    auto t1 = std::chrono::system_clock::now();
    for(int i = 0; i < 1000; i++) {
        double* A = (double*)calloc(n * n, sizeof(double));
        results.push_back(A);
    }
    auto t2 = std::chrono::system_clock::now();
    std::chrono::duration<double> t01 = t1-t0;
    std::chrono::duration<double> t12 = t2-t1;
    printf("Eigen+setZero %e calloc %e size %lu\n", t01.count(), t12.count(), results.size());
}

resulting on (on my laptop)

./a.out
Eigen+setZero 5.440070e-01 calloc 1.527000e-03 size 2000

The matrices are built using Eigen::MatrixXd(n,n) followed by a setZero.

Looking at the timings in more details, the setZero is taking almost all the time (the new being super fast). Any way to speedup this ?

Ideally I would like to get down to what's possible with using just calloc. A memset doesn't really help, it's as slow as setZero. Finally, I would like to have MatrixXd, so the Map route isn't ideal.

Shawn
  • 593
  • 4
  • 12
  • Since you're pushing 2000 pointers into `results`, you should call `results.reserve(2000);` before you do anything. As it is you're going to get more resizes of the vector during the first loop (with `setZero`), which will cause it to appear to take longer than it does. – 1201ProgramAlarm May 28 '19 at 19:57
  • Moreover, `new Eigen::MatrixXd(n,n)` performs two allocations, one for the `MatrixXd` itself (24 bytes) and one made by `MatrixXd` itself to allocate `n*n*8` bytes. – ggael May 28 '19 at 20:14
  • @1201ProgramAlarm true, but results is here just so that it all doesn't get optimized away. I don't really use that anywhere in the actual code. I edited with the reserve, but the timings are virtually unchanged. – Shawn May 28 '19 at 21:03
  • Related: https://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc. If you don't like using `Eigen::Map`, you _could_ patch your Eigen library to always use `std::calloc` instead of `std::malloc`. The actual question is what is your use-case of allocating lots of Zero-matrices? – chtz May 28 '19 at 21:59
  • Thanks for the link. I also found that. The use case is a bit complex to explain in 600 characters, but basically I am "merging" a bunch of small matrices into bigger one. Like think A = [A11 A12;A21 A22] but some (say half) of those A_ij are zero. So I usually just zero out the whole A first, before copying the smaller into it. (So the actual use can is not really that, but I found that it's taking more time than I thought to do all the setZeros - hence me trying to improve it) – Shawn May 28 '19 at 22:23
  • 1
    You are not limited to 600 characters, if you edit your question ... How "large" is "large" in your case? Are the `A_ij` of the same (fixed) size? Have you considered using a sparse matrix for `A` (if half the entries are filled, this is likely not worth it anymore)? What are you doing with your large matrix afterwards? – chtz May 29 '19 at 09:22
  • This happens in the context of a hierarchical approximate factorization (some sort of block incomplete LU, with some hierarchical component like in the fast multipole method or in H-matrices). Unfortunately, those merged matrices typically get filled (either because they have to be factored with some dense Cholesky/LU/Eigenvalue factorization, or because they are overwritten by products of other matrices). Sparse wouldn't help unfortunately. – Shawn May 30 '19 at 00:16
  • That being said, I realized my question may be ill-posed. I could certainly be wrong, but calloc may be so fast because, while it does allocate memory (like malloc), it doesn't "touch", unlike setZero (something about memory pages ?). Also, the sizes are typically '2x'. Like A = [A11 A12;A21 A22], with Aij some (not the same, not fixed) size. It's hard to give a precise number. Typical cases are A = 500x500 and Aij about 250x250. – Shawn May 30 '19 at 00:22

0 Answers0