2

I pre-allocate a large vector of N variables as a complex<double> that function A acts on. Once it is done, I wish to pass it on to another function B, where I wish to re-use that memory allocation, but as a vector of length 2N of data type double, i.e.:

vector<complex<double>> vec(N);
for (int i = 0; i < M; i++;
    {
       Function A(&vec); // Does work on vec as a vector<complex<double>>

       // vector<complex<double>> vec(N) converts to vector<double> vec(2N) (or vector<float> vec(4N))

       Function B(&vec); // Does work on vec as a vector<double> or vector<float>  
     }

where M and N are large enough to warrant pre-allocation of vec. I am trying to avoid allocating as it is costly. The data is independent of each other (i.e. I will end up populating the data again before I use it in Function B). From what I have read so far, (anonymous?) unions seem the way to go, but I am unsure as to how unions work when it comes to vectors. I am hoping I could get some assistance in this matter.

Thank you.

Edit: It takes 3 seconds to allocate the vector, and the function takes 5 seconds to operate on.

HamzaAB
  • 67
  • 6
  • 1
    Unions won't help you here, they are for statically allocated data, not dynamically. – Evg Sep 04 '18 at 13:29
  • 1
    Is it really a bottleneck of your code? How do you measure your performance? In this example, it's likely that no actual memory reallocation will take place (provided that first vector is destructed), because your program won't have time to return memory to the system. – Yksisarvinen Sep 04 '18 at 13:29
  • Does `Fucntion B` need to have the signature of `ret some_name(std::vector)` or could it have the signature of `ret some_name(std::vector>)`? – NathanOliver Sep 04 '18 at 13:29
  • Since you say the data in both containers is independent, what prevents you from declaring them separately? Each one in its own scope... – TheOperator Sep 04 '18 at 13:30
  • 2
    For me this smells of a premature optimization. First of all you should write your code to be simple, readable, understandable and maintainable. If you have some requirements about performance you need to measure it. If it's not good enough (and "good enough" often *is* good enough) then benchmark and profile to find the worst bottlenecks and fix those first of all, with plenty of documentation and comments (since optimizations usually make code hard to read and understand). – Some programmer dude Sep 04 '18 at 13:31
  • 2
    @Yksisarvinen I do scientific computing with data that fills 64GB of my PC's RAM. I am willing to take any reduction of memory usage or increase in performance. In this case, a lot of work takes place between function A and B that I have omitted for simplicity. – HamzaAB Sep 04 '18 at 13:33
  • @Someprogrammerdude It takes around 3sec for the memory allocation and then 5sec for the function itself, which is the time for 1 loop. I do not see how this is a pre-mature optimization. I have already identified this as a bottleneck by benchmarks. – HamzaAB Sep 04 '18 at 13:37
  • 2
    Actually, you request the hack based on undocumented `complex` class layout. Assuming that `complex` is really two `double` members, and functions A and B only read the vectors (or at least don't change their size), you may declare them as `complex data*, size_t size` and `double* data, size_t size` and pass raw data pointers to these functions. – Alex F Sep 04 '18 at 13:37
  • 1
    How about just allocating a large chunk of bytes, and then depending on where you need to use the memory use it as an array of two `double` (i.e. an array of *almost* `complex`) or as an array of plain `double`? Sure it might break strict aliasing, but if the memory is suitable aligned it should not be a problem practically. – Some programmer dude Sep 04 '18 at 13:39
  • @AlexF That is exactly what I thought of, but then I thought I'd check with the community of a better way. My backup plan is to use the complex and imaginary parts as two adjacent vector elements. Am I correct in understanding you? – HamzaAB Sep 04 '18 at 13:40
  • @HamzaAB Also, ***all*** this information you have provided in comments you should put in the question body itself! It's crucial information relating to your problem and as such need to be in the question. – Some programmer dude Sep 04 '18 at 13:40
  • @TheOperator If I understand you correctly, I would need to allocate and then deallocate at the end of every function call. – HamzaAB Sep 04 '18 at 13:41
  • @Someprogrammerdude I have edited the question as you have suggested, thank you. I have thought of using the real and imaginary part as consecutive elements of a vector. Will it affect my performance in any way, in the sense that it would impede efficient vectorization? – HamzaAB Sep 04 '18 at 13:46
  • @AlexF will this impede compiler vectorization or cause memory cache misses to degrade performance? – HamzaAB Sep 04 '18 at 13:48
  • 1
    You can actually cast one to another https://wandbox.org/permlink/KAc1U9AqgZXnieNw (of course, with some precautions, because it is a hack). – Nikita Kniazev Sep 04 '18 at 13:49
  • How can allocation of a single vector take 3 seconds? – Nikita Kniazev Sep 04 '18 at 13:52
  • @NikitaKniazev The dataset is large, on my home PC I am limited to 64GB but my work PC has 128GB, and I also use HPC clusters. – HamzaAB Sep 04 '18 at 13:57
  • @NikitaKniazev with regards to your suggestion, will I be able to pass this through to BLAS/LAPACK routines safely? – HamzaAB Sep 04 '18 at 14:27
  • No guarantee. I think you do not provide some information, please post a [MVCE](https://stackoverflow.com/help/mcve). Global allocators usually implemented to reuse as much memory as possible to reduce OS API calls. You can see that it is actually reusing the same memory in this example https://wandbox.org/permlink/ugYvnXUG3dOYcMtm – Nikita Kniazev Sep 04 '18 at 18:09

2 Answers2

4

You're trying to reuse the raw memory, not the vector - so why not do just that and avoid messing around with types?

Write a pool allocator which pre-allocates and owns the memory.

Construct the first vector<complex> using (a proxy for) that pool allocator. Later after the first vector is destroyed and has relinquished its claim on the pool, create a vector<double> using (a proxy for) the same pool allocator.

The vector will copy the top-level allocator object, which is why it must be a proxy keeping a reference to your pre-allocated persistent pool.

Useless
  • 64,155
  • 6
  • 88
  • 132
  • What you have mentioned is something I have not yet come across in my experience with C++. I will look into pool allocators and get back to you. Can you provide a link for me to start reading into this? Thank you for your assistance. – HamzaAB Sep 04 '18 at 13:53
  • 1
    [Boost has one](https://www.boost.org/doc/libs/1_68_0/libs/pool/doc/html/boost_pool/pool.html) although I'm not sure it's suitable for your use case. This page on [`std::allocator`](https://en.cppreference.com/w/cpp/memory/allocator) describes the default allocator, links to everything relevant including `allocator_traits`. Pool allocator is anyway a standard term. – Useless Sep 04 '18 at 13:58
  • Since then, I have been looking into memory pools and I have made some progress but it still seems non-trivial to write one. I will continue to read up on it, hopefully I can find a simpler example than what I found through Boost and Google (and this site). As a back up plan, I was considering using my own struct of two doubles, and then overloading the operator () to give me the double indexing and complex indexing otherwise. My only concern is if this will play nicely with BLAS/LAPACK when using both. – HamzaAB Sep 04 '18 at 16:17
  • Scroll down to the bottom of [the page](https://www.boost.org/doc/libs/1_68_0/libs/pool/doc/html/boost_pool/pool/interfaces.html) for the example. – Nikita Kniazev Sep 04 '18 at 18:15
1

I see the following solutions:

1. Variants - modern day unions

Use a vector< variant< double, complex<double> > >. This can store either double or complex<double> elements.

Even though the variant types (double and complex<double>) may share space, as only one type is active at a given time, you still don't know how efficiently they do so (there also needs to be an identifier to mark which type is active, and alignment to 8-byte-boundaries may make you waste 50% space). See also here.

Also, you have to differentiate for each element, even though the container will have all elements of same type at any given moment.

2. Use raw memory

A more traditional option is to allocate memory with operator new() (the function, not the operator). You would then construct doubles and complexes there, using Placement New, and destroy them using explicit destructor calls.

If you don't need many std::vector features, like dynamic growing, removing, inserting, but rather just one big data store, then this can be a valid solution. In this case, the only thing needed is the size.

To not worry about scope, you can pack everything into a std::unique_ptr with custom deleter. The deleter would just call operator delete().

3. Write custom allocator

std::vector takes a 2nd template argument for the allocator. Allocators allow you to customize where memory is taken from. But there's quite a lot to implement, and the standard imposes several rules on the behavior.

TheOperator
  • 5,936
  • 29
  • 42
  • Thank you for your response. Method 1 seems unfavourable due to the waste in space, and I would think that when it comes to vectorization, it would also be 50% efficient as the compiler will consider the padded area as a kind of memory stride. Method 2 and 3 seem favourable. With regards to 3 though, customizing the template may make it incompatible with the MKL FFT libraries I use, if not done according to their spec. I shall look into 2 and get back to you. – HamzaAB Sep 04 '18 at 14:19
  • I have looked into Option 2 and I think should likely works for me. To confirm I understand correctly, would this be correct: `char *buf = new char[sizeof(N*complex)]; complex *p = new (buf) complex[N]; // use it; delete[p]; double *p = new (buf) double[2N]; // use it; ::operator delete(buf);` – HamzaAB Sep 04 '18 at 15:28
  • 1
    No, new/delete must be symmetric. It should be: `void* mem = operator new(N * sizeof(complex));` then you can convert that to a complex pointer: `complex* c = static_cast(mem)`. So far, you allocated memory and have a pointer to complexes, but the memory is a debris field. You need to construct your elements: with incrementing address, call placement new `new (address) complex(...)`. After usage, you need to destroy each element as `address->~complex()`. Repeat for double (skip operator new). Finally, when you don't need the memory anymore, deallocate it with `operator delete(mem)`. – TheOperator Sep 05 '18 at 08:04