
In some circumstances, my C++14 program needs a "block" of about 100 million complex<float> values, which requires close to 1 GB of RAM. We can safely assume that the required memory will be available.

However, allocating a new std::vector is really slow, because the complex constructor is called 100 million times. On my machine, the code takes around a full second to initialise the array.

By comparison, calling calloc(), which initialises the allocated memory to zero and so has mostly the same effect, runs in just a few milliseconds.
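For reference, the two allocations being compared look roughly like this. It is a minimal sketch: the function names are hypothetical, and the timings quoted above are the author's measurements, not something this code measures.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Variant 1: the sized constructor value-initializes every element, which
// runs std::complex<float>'s constructor (setting both parts to 0) n times.
std::vector<std::complex<float>> block_via_vector(std::size_t n) {
    return std::vector<std::complex<float>>(n);
}

// Variant 2: calloc returns zero-filled raw storage and runs no constructor.
// The caller owns the result and must eventually pass it to std::free.
void* block_via_calloc(std::size_t n) {
    return std::calloc(n, sizeof(std::complex<float>));
}
```

Both end up with all-zero contents; the difference is that the vector gets there by running a constructor per element.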

It turns out we don't even need this initialisation, because the complex values in the large array will be populated shortly afterwards from an external source. I am therefore looking at deferring the construction of the objects in that "block" until later, and then constructing them directly from the external data.

So my question is: is there a safe, idiomatic and efficient C++ way to do that, perhaps using C++ move semantics along the way? If not, and we decide to simply malloc the memory block, can we then simply reinterpret_cast the returned pointer and treat the block as a plain old C array of complex<float>?

Thank you for helping

Jean-Denis Muys

  • Use `std::vector<>`, but use the `reserve()` member function, not `resize()` or the sized constructor. – ildjarn Oct 12 '16 at 23:12
  • Is it really so slow even when you compile it with optimizations? I can't believe it. – Al Kepp Oct 13 '16 at 00:18
  • Why do you want to use vector in this case? You can't attach/detach memory to vector, but you can use all std algorithms using pointers instead of iterators. – 0kcats Oct 13 '16 at 03:32
  • I don't *want* to use `vector`. I *tried* `vector` as well as raw `malloc`, and got a huge performance gap that actually matters in my case. I am basically looking for the sweet spot between performance, clarity, safety, and similar quality criteria. – Jean-Denis Muys Oct 13 '16 at 10:49
  • As a side note, `calloc` can often [cheat and not really allocate memory, or wait before really handing out the memory to you](http://stackoverflow.com/a/2688522/3460805); the cost comes later. Don't rely on its apparently good performance at the call site. – Chnossos Oct 13 '16 at 10:57
  • True. The same goes for `malloc()`, by the way: it can return a pointer (rather than failing) without really allocating the memory. The kernel then handles the allocation in a page fault when the block is first used, and may only fail *then*. – Jean-Denis Muys Oct 13 '16 at 10:59
  • @Al Kepp yes, the slowdown is over a thousandfold. The reason is probably the implementation of `calloc` described on the page linked by @Chnossos. – Jean-Denis Muys Oct 13 '16 at 11:10
  • I'd hope that a compiler would be able to optimize `vector(N)` to a `calloc` call... if that doesn't seem to be happening then it would be a good thing to submit as a bug or suggested improvement – M.M Oct 14 '16 at 05:00
  • My compiler (clang) doesn't do that optimisation. Does your? – Jean-Denis Muys Oct 14 '16 at 05:16

3 Answers


If you define the default constructor of the complex<float> class as empty, leaving the member variables uninitialized, then, with compiler optimizations turned on, there should be no real difference between the two operations.

Assume the following definition of the complex class:

```cpp
template <typename T>
struct complex
{
  complex() {} // Empty constructor does nothing
  T a, b;
};
```

The generated assembly for using vector initialization with x86-64 gcc 6.2 and -O2 enabled is:

```cpp
std::vector<complex<float>> v(100);
```

```asm
mov     edi, 800
call    operator new(unsigned long)
mov     rdi, rax
call    operator delete(void*)
```

And the generated assembly for manually calling malloc and free is:

```cpp
auto v = malloc(100 * sizeof(complex<float>));
free(v);
```

```asm
mov     edi, 800
call    malloc
mov     QWORD PTR [rsp+8], rax
mov     rdi, QWORD PTR [rsp+8]
call    free
```

As you can see, the vector version no longer calls the constructor of complex<float> for each element. Using vector is also safer and more readable, and it takes advantage of RAII, which helps prevent memory leaks.
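If the rest of the code relies on `std::complex` arithmetic, one way to keep the cheap allocation is to store the custom type and convert on access. This is a minimal sketch, assuming a conversion helper is acceptable; `raw_complex` and `to_std` are hypothetical names. Reading an element before writing it is undefined behaviour, because its members are left uninitialized.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical element type modelled on the answer's `complex`, renamed to
// avoid any clash with std::complex.
struct raw_complex {
    raw_complex() {}  // empty: re and im are left uninitialized
    raw_complex(float r, float i) : re(r), im(i) {}
    std::complex<float> to_std() const { return {re, im}; }
    float re, im;
};

// Value-initialization calls the empty constructor, so with optimizations
// enabled the per-element loop can compile away entirely.
std::vector<raw_complex> make_block(std::size_t n) {
    return std::vector<raw_complex>(n);
}
```

Each element is then overwritten with data from the external source (e.g. `v[i] = raw_complex(re, im);`) before `to_std()` is ever called on it.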

Adam Yaxley
  • This is a very good point. Of course, this solution requires defining a custom complex class rather than using `std::complex`. – Jean-Denis Muys Oct 14 '16 at 04:56
  • In my environment (clang on the Mac), the complex default constructor is defined like this: `complex(const value_type& __re = value_type(), const value_type& __im = value_type()) : __re_(__re), __im_(__im) {}` – Jean-Denis Muys Oct 14 '16 at 05:15
  • I see, so it looks like `std::complex` always initializes its real and imaginary values, which would always give you a slowdown if you initialize them all at once in an `std::vector`. – Adam Yaxley Oct 14 '16 at 05:45
  • Yes, and currently, no level of Clang optimisation will optimise this out – Jean-Denis Muys Oct 14 '16 at 05:50
  • Interestingly, testing with x86-64 clang 3.9.0 yields optimized assembly code for `std::vector<std::complex<float>> v(100);`. However, x86-64 gcc 6.2 instead calls the constructor for each `std::complex` element. [Try it online here](https://godbolt.org/g/xEPWTk) – Adam Yaxley Oct 14 '16 at 06:36
  • My compiler (Apple LLVM version 8.0.0 (clang-800.0.42)) behaves like GCC: it calls the complex constructor in a loop. So do Clang 3.8.1 and earlier on the page you linked. Unfortunately, Apple has stopped reporting which LLVM version its compiler is based on. This experiment seems to demonstrate that it is based on a version earlier than 3.9, though. Thanks for the link – Jean-Denis Muys Oct 14 '16 at 07:20
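The point made in these comments can be checked at compile time with standard type traits. A minimal sketch; `plain_complex` is a hypothetical aggregate used only for contrast, not part of either answer.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical aggregate with no user-provided constructor.
struct plain_complex {
    float re, im;
};

// std::complex<float>'s default constructor is user-provided (both parameters
// default to 0), so constructing an element always writes zeros.
static_assert(!std::is_trivially_default_constructible<std::complex<float>>::value,
              "std::complex<float> always runs a constructor");

// plain_complex has a trivial default constructor: default-initializing it,
// e.g. with new plain_complex[n], leaves the memory untouched.
static_assert(std::is_trivially_default_constructible<plain_complex>::value,
              "plain_complex can be left uninitialized");
```

Note that the answer's `complex() {}` is not trivial either (it is user-provided); it is cheap only because its body is empty and the optimizer can remove the loop.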

I strongly suggest sticking to C++ and not managing the memory manually.

The standard library should be enough, e.g.:

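```cpp
#include <complex>
#include <vector>

std::vector<std::complex<float>> my_vector;

// Reserve the necessary space without constructing anything
my_vector.reserve(100'000'000);

// Construct the elements when needed, e.g. with push_back()/emplace_back();
// populate() is a placeholder for that step
populate(my_vector);
```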
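The answer leaves `populate` unspecified. A minimal sketch, assuming it receives the element count as a second argument and reads from a hypothetical `read_sample` function standing in for the external source. The key point is that `push_back` (or `emplace_back`) constructs each element exactly once; after `reserve()`, indexing with `operator[]` past `size()` is undefined behaviour.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the external data source in the question.
std::complex<float> read_sample(std::size_t i) {
    return {static_cast<float>(i), -static_cast<float>(i)};
}

// Appends n elements, each constructed once from the source data.
// Assumes capacity for n more elements was reserved beforehand, so no
// reallocation (and no extra copying) happens inside the loop.
void populate(std::vector<std::complex<float>>& v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(read_sample(i));
    }
}
```

Because the capacity was reserved up front, the vector's buffer never moves while it is being filled.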
Trevir

So your question is whether you can use malloc(). I used the same approach many years ago with my old C++ compiler and it worked, but in the end I had to call free() instead of delete[]. I think this is implementation-specific, so you should try it with your compiler.
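A somewhat safer variant of the malloc approach is sketched below; all names are hypothetical. Ownership is held in a `std::unique_ptr` whose deleter calls `free()`, and each element is created with placement new from the incoming data rather than relying on `reinterpret_cast` alone. In C++14, storage obtained from `malloc` does not yet contain `complex<float>` objects, which is why each one is constructed before it is read.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <memory>
#include <new>
#include <type_traits>

using cfloat = std::complex<float>;

// Skipping destructors, as free() does, is only fine for trivially
// destructible element types.
static_assert(std::is_trivially_destructible<cfloat>::value,
              "elements must be trivially destructible");

// Deleter that returns malloc'ed storage with std::free (never delete[]).
struct free_deleter {
    void operator()(cfloat* p) const noexcept { std::free(p); }
};

using complex_block = std::unique_ptr<cfloat[], free_deleter>;

// Allocates raw storage for n values without constructing any of them.
complex_block allocate_uninitialized(std::size_t n) {
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(cfloat)) {
        throw std::bad_alloc();  // n * sizeof(cfloat) would overflow
    }
    void* raw = std::malloc(n * sizeof(cfloat));
    if (raw == nullptr && n != 0) {
        throw std::bad_alloc();
    }
    return complex_block(static_cast<cfloat*>(raw));
}

// Constructs element i in place, e.g. from data read from the external source.
void construct_element(complex_block& block, std::size_t i, float re, float im) {
    ::new (static_cast<void*>(block.get() + i)) cfloat(re, im);
}
```

The block frees itself when `complex_block` goes out of scope, so no explicit `free()` call is needed.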

Al Kepp