As far as I understand, `unique_ptr<T>` is not supposed to have such a huge overhead.

What am I doing wrong?

size_t t = sizeof(DataHelper::SEQ_DATA); // t = 12

std::vector<std::vector<std::unique_ptr<DataHelper::SEQ_DATA>>> d(SEQ_00_SIZE + 1); // SEQ_00_SIZE = 4540
for (unsigned int i = 0; i < d.size(); ++i) {
    for (unsigned int k = 0; k < 124668; ++k) {
        std::unique_ptr<DataHelper::SEQ_DATA> sd = std::make_unique<DataHelper::SEQ_DATA>();
        d[i].push_back(std::move(sd));
    }
}

takes about 21 GB of RAM.

std::vector<std::vector<DataHelper::SEQ_DATA>> d(SEQ_00_SIZE + 1);
for (unsigned int i = 0; i < d.size(); ++i) {
    for (unsigned int k = 0; k < 124668; ++k) {
        DataHelper::SEQ_DATA sd;
        d[i].push_back(sd);
    }
}

takes about 6.5 GB of RAM.

Additional information:

struct SEQ_DATA {
    uint16_t id = 0;
    uint16_t label = 0;
    float intensity = 0.0f;
    float z = 0.0f;
};

I just want to have a single `vector<vector<T>>` which holds my 4540 * 124668 objects as efficiently as possible. I read the values from binary files. Since the number of elements within the binary files varies, I cannot initialize the inner vectors with the correct size (i.e. 124668 is only true for the first file).

GCC 9.3.0, C++17

– Simon Pio.
  • How did you measure this overhead? – JohnFilleau Mar 03 '22 at 19:35
  • This doesn't answer your direct question, but maybe answers the question you should have asked: I wouldn't load all of your data into memory at once. – JohnFilleau Mar 03 '22 at 19:36
  • @JohnFilleau I checked my resources in Ubuntu monitor. – Simon Pio. Mar 03 '22 at 19:36
  • The overhead of a unique pointer is compared to a *raw* pointer, and not, as you measured, to no pointer at all. On a 64-bit machine a pointer is likely 8 bytes long. Add that to every object. How much overhead does that come up at? – StoryTeller - Unslander Monica Mar 03 '22 at 19:38
  • The ubuntu version may be relevant. Not to me, since I don't know ubuntu. But to someone. – JohnFilleau Mar 03 '22 at 19:38
  • Each `unique_ptr` is an additional 8 bytes. Then each allocation will require some additional memory for internal bookkeeping or other overhead. For example, it could be that the allocator hands out only chunks of size 16, because that is more efficient to implement or because it needs to fulfill alignment requirements. Doing many small allocations is not a good idea. The structure is quite small; why do you want to allocate it on the heap? – user17732522 Mar 03 '22 at 19:41
  • The big difference here is the overhead of allocating each struct on the heap. Your 'raw' case is storing the raw structs in the inner vector. This is not a measurement of the overhead of unique_ptr; it measures the cost of using lots of tiny bits of heap. – pm100 Mar 03 '22 at 19:41
  • @user17732522 I wanted to avoid copies and make it as efficient as possible since I need it to train an SVM. 80 GB+ of binary files. But I understand the difference now. – Simon Pio. Mar 03 '22 at 19:45
  • Computers can be funny. You'll find that sometimes, a lot of the time in what I'm usually writing, the cost of copying is cheaper than the cost of the extra cache misses you get when iterating through non-contiguous data. – user4581301 Mar 03 '22 at 19:47
  • For 64-bit you're probably losing 4 bytes of padding for every object aligned to 8 bytes, and additionally you lose another 8 bytes for storing the pointer. That doubles memory consumption, ignoring unused vector memory; but probably memory management for the large amount of small objects is responsible for the rest... – fabian Mar 03 '22 at 19:48

1 Answer


"std::unique_ptr doesn't have huge overhead" means that it doesn't have huge overhead compared to a bare pointer to dynamic allocation:

{
    auto ptr = std::make_unique<T>();
}
// has comparable cost to, and has exception safety unlike:
{
    T* ptr = new T();
    delete ptr;
}

std::unique_ptr doesn't make the cost of dynamic allocation cheaper.
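
To put numbers on it: 4540 * 124668 is roughly 566 million objects of 12 bytes each, so storing them by value needs roughly 6.8 GB, which matches your observed ~6.5 GB. Stored behind a unique_ptr, each object additionally costs one heap allocation plus an 8-byte pointer held in the vector. A minimal sketch of that arithmetic; the 32-byte figure is an assumption about 64-bit glibc malloc's minimum chunk size, and other allocators will differ:

#include <cstddef>
#include <cstdint>
#include <cstdio>

struct SEQ_DATA {
    uint16_t id = 0;
    uint16_t label = 0;
    float intensity = 0.0f;
    float z = 0.0f;
};

int main() {
    const std::size_t count = 4540ull * 124668;            // ~566 million objects
    const std::size_t by_value = count * sizeof(SEQ_DATA); // 12 bytes per object

    // Assumption: 64-bit glibc malloc never hands out a chunk smaller than
    // 32 bytes, so each 12-byte heap object really consumes ~32 bytes.
    const std::size_t chunk = 32;
    const std::size_t ptr = sizeof(void*);                 // the unique_ptr in the vector
    const std::size_t by_pointer = count * (chunk + ptr);

    std::printf("by value:   %.1f GB\n", by_value / 1e9);   // ~6.8 GB
    std::printf("by pointer: %.1f GB\n", by_pointer / 1e9); // ~22.6 GB
}

That comes out to roughly 22.6 GB, in line with your observed ~21 GB, before even counting the vectors' own capacity slack.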


I just want to have a single vector<vector<T>> which holds my 4540 * 124668 objects as efficient as possible.

The most efficient way to store 4540 * 124668 objects is a flat array:

std::vector<DataHelper::SEQ_DATA> d(4540 * 124668);
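
Element (i, k) of the conceptual 2D layout is then found with index arithmetic; a minimal sketch, assuming a fixed row width (the `cols` constant and the `at` helper are made up for illustration):

constexpr std::size_t cols = 124668;

// Row-major access into the flat array: row i, column k.
DataHelper::SEQ_DATA& at(std::vector<DataHelper::SEQ_DATA>& d,
                         std::size_t i, std::size_t k) {
    return d[i * cols + k];
}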

However, the benefit of this isn't necessarily significant here, given that the inner vectors aren't small: the per-vector overhead is already amortized over many elements.

(i.e. 124668 is only true for the first file).

If you don't need all 124668 elements, then it may be a waste of memory to have the unused elements in the vector.
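
If the rows genuinely have varying lengths, you can still keep the storage flat with one data vector plus a vector of row-start offsets (a CSR-style layout). A sketch under the assumption that a hypothetical readFile returns one file's elements; readFile and files are placeholders, not your actual reader:

std::vector<DataHelper::SEQ_DATA> data;  // all elements, back to back
std::vector<std::size_t> offsets{0};     // offsets[i] = index where row i starts

for (const auto& file : files) {         // `files` is a placeholder for your file list
    std::vector<DataHelper::SEQ_DATA> row = readFile(file); // hypothetical reader
    data.insert(data.end(), row.begin(), row.end());
    offsets.push_back(data.size());
}

// Row i then spans data[offsets[i]] .. data[offsets[i + 1] - 1],
// and each element costs exactly sizeof(SEQ_DATA) = 12 bytes.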

– eerorika