Alternatives for polymorphic data storage

Question

I'm storing a large amount of computed data and I'm currently using a polymorphic type to reduce the amount of storage required. Everything is extremely fast except for deleting the objects when I'm finished and I think there must be a better alternative. The code computes the state at each step and depending on the conditions present it needs to store certain values. The worst case is storing the full object state and the best state is storing almost nothing. The (very simplified) setup is as follows:

class BaseClass
{
public:
    virtual ~BaseClass() { }

double time;
    unsigned int section;
};

class VirtualSmall : public BaseClass
{
public:
    double values[2];
    int othervalue;
};

class VirtualBig : public BaseClass
{
public:
    double values[16];
    int othervalues[5];
};

...

std::vector<BaseClass*> results(10000);

The appropriate object type is generated during computation and a pointer to it is stored in the vector. The overhead from vtable+pointer is overall much smaller than than the size difference between the largest and smallest object (which is least 200 bytes according to sizeof). Since often the smallest object can be used instead of the largest and there are potentially many tens of millions of them stored it can save a few gigabytes of memory usage. The results can then be searched extremely fast as the base class contains the information necessary to find the correct item which can then be dynamic_cast back to it's real type. It works very well for the most part.

The only issue is with delete. It takes a few seconds to free all of the memory when there is many tens of millions of objects. The delete code iterates through each object and delete results[i] which calls the virtual destructor. While it's not impossible to work around I think there must be a more elegant solution.

It could definitely be done by allocating largish contiguous blocks of memory (with malloc or similar), which are kept track of and then something generates a correct pointers to the next batch of free memory inside of the block. That pointer is then stored in the vector. To free the memory the smaller number of large blocks need to have free() called on them. There is no more vtable (and it can be replaced by a smaller type field to ensure the correct cast) which saves space as well. It is very much a C style solution though and not particularly pretty.

Is there a C++ style solution to this type of problem I'm overlooking?

What kind of data, exactly? Is it all trivially destructable pod? — Yakk - Adam Nevraumont, Jul 19 '13 at 00:00
Yes, it's just plain data. The only function is the virtual destructor. — Kyle, Jul 19 '13 at 00:03
Do you need `dynamic_cast`, or do you already know the type? — Yakk - Adam Nevraumont, Jul 19 '13 at 00:34
I use dynamic_cast in an if else if block when actually parsing the results but it could easily by replaced by a field to hold the type which I could cast to. The time it takes for a few dynamic casts is trivial compared to the rest of the operations performed. — Kyle, Jul 19 '13 at 00:49
http://stackoverflow.com/questions/18856824/ad-hoc-polymorphism-and-heterogeneous-containers-with-value-semantics — v.oddou, Apr 06 '15 at 02:31

score 1 · Answer 1 · answered Jul 18 '13 at 23:45

1

You can overload the "new" operator (i.e. void* VirtualSmall::operator new(size_t) ) for you classes, and implement them to obtain memory from custom allocators. I would use one block allocator for each derived class, so that each block size is a multiple of the class' it's supposed to store.

When it's time to cleanup, tell each allocators to release all blocks. No destructors will be called, so make sure you don't need them.

answered Jul 18 '13 at 23:45

DanielKO

4,422
19
29

Definitely more C++ in style although it still presents the issue of memory alignment and keeping track of allocated chunks of memory somewhere. Thanks I will read a bit more into it. – Kyle Jul 19 '13 at 00:07
Each block is like a pair, the integer keeps track of how many elemnts are in use. The allocator itself can be implemented trivially with a vector of such pairs, plus one or two small convenience functions. You can define SomeLargeStorage as an enum containing an array of the specific type, this way the compiler will make sure it's aligned properly. – DanielKO Jul 19 '13 at 00:18
Also, I'm not suggesting you to implement such a thing, just saying how the (probably) optimal solution will look like. Check out HeapLayers/Hoard (http://www.hoard.org) it might have a block allocator ready for you to use. – DanielKO Jul 19 '13 at 00:24
You could shy away from allocators and `operator new`, and mimic block allocationin the `vector` wrapper. – Yakk - Adam Nevraumont Jul 19 '13 at 00:41

Alternatives for polymorphic data storage

1 Answers1