6

Problem

I have a quite complex image processing application where one of the sub-modules need to load huge binary bitmaps into memory. Actually as much as up to 96 GB (meaning 888 888 x 888 888 pixel image). Disks is 2xSSD raid0 with read/write at about 1 GB/s. It is loading the image into a vector (each element represent a line in bitmap) of smart-pointers to vector with bytes (each element represent 8 pixels). The strange problem here is that after repetitive loading and clearing the vectors (I see that the memory is actually filled and cleared without memory leak), it seems to take longer and longer time for each iteration. Specially clearing the memory take very long time.

Tests

I made some simple test application to test this isolated and from different angles. Replacing smart-pointers with raw pointers gave same strange behavior. Then I tried to use native arrays instead of vector, and that did the trick. After 100 iterations of load/clear 24 GB time increased drastically when using vectors, while the array implementation was stable on the time. Below is test application filling memory with 24 GB of rubbish instead of loading an actual image, with same results. Tests done on Windows 10 Pro with 128 GB RAM, and built with Visual Studio 2013 Update 5.

This function uses vectors for load/clear:

void SimpleLoadAndClear_Vector(int width, int height) {
    time_t start_time, end_time;

    // Load memory
    time(&start_time);
    cout << "Loading image into memory...";
    auto width_bytes = width / 8;
    auto image = new vector<vector<unsigned char>*>(height);
    for (auto y = 0; y < height; y++) {
        (*image)[y] = new vector<unsigned char>(width_bytes);
        auto row_ptr = (*image)[y];
        for (auto b = 0; b < width_bytes; b++) {
            (*row_ptr)[b] = 0xFF;
        }
    }
    cout << "DONE: ";
    time(&end_time);
    auto mem_load = (int)difftime(end_time, start_time);
    cout << to_string(mem_load) << " sec" << endl;

    // Clear memory
    time(&start_time);
    cout << "Clearing memory...";
    for (auto y = 0; y < height; y++) {
        delete (*image)[y];
    }
    delete image;
    cout << "DONE: ";
    time(&end_time);
    auto mem_clear = (int)difftime(end_time, start_time);
    cout << to_string(mem_clear) + " sec" << endl;
}

This function uses arrays for load clear:

void SimpleLoadAndClear_Array(int width, int height) {
    time_t start_time, end_time;

    // Load memory
    time(&start_time);
    cout << "Loading image into memory...";

    auto width_bytes = width / 8;
    auto image = new unsigned char*[height];
    for (auto y = 0; y < height; y++) {
        image[y] = new unsigned char[width_bytes];
        auto row_ptr = image[y];
        for (auto b = 0; b < width_bytes; b++) {
            row_ptr[b] = 0xFF;
        }
    }
    cout << "DONE: ";
    time(&end_time);
    auto mem_load = (int)difftime(end_time, start_time);
    cout << to_string(mem_load) << " sec" << endl;

    // Clear memory
    time(&start_time);
    cout << "Clearing memory...";

    for (auto y = 0; y < height; y++) {
        delete[] image[y];
    }
    delete[] image;
    cout << "DONE: ";
    time(&end_time);
    auto mem_clear = (int)difftime(end_time, start_time);
    cout << to_string(mem_clear) + " sec" << endl;
}

This is main function to call the above load/clear functions:

void main()
{
    auto width = 455960;
    auto height = 453994;
    auto i_max = 50;
    for (auto i = 0; i < i_max; i++){
        SimpleLoadAndClear_Vector(width, height);
    }
}

Test output from vector version looks as follows after 50 iterations (clearly the load/clear time increases more and more):

Loading image into memory...DONE: 19 sec
Clearing memory...DONE: 24 sec
Loading image into memory...DONE: 40 sec
Clearing memory...DONE: 20 sec
Loading image into memory...DONE: 27 sec
Clearing memory...DONE: 39 sec
Loading image into memory...DONE: 35 sec
Clearing memory...DONE: 24 sec
Loading image into memory...DONE: 27 sec
Clearing memory...DONE: 34 sec
Loading image into memory...DONE: 33 sec
Clearing memory...DONE: 29 sec
Loading image into memory...DONE: 27 sec
Clearing memory...DONE: 35 sec
Loading image into memory...DONE: 32 sec
Clearing memory...DONE: 33 sec
Loading image into memory...DONE: 28 sec
Clearing memory...DONE: 37 sec
Loading image into memory...DONE: 31 sec
Clearing memory...DONE: 35 sec
Loading image into memory...DONE: 30 sec
Clearing memory...DONE: 38 sec
Loading image into memory...DONE: 31 sec
Clearing memory...DONE: 38 sec
Loading image into memory...DONE: 31 sec
Clearing memory...DONE: 41 sec
Loading image into memory...DONE: 32 sec
Clearing memory...DONE: 40 sec
Loading image into memory...DONE: 33 sec
Clearing memory...DONE: 42 sec
Loading image into memory...DONE: 35 sec
Clearing memory...DONE: 43 sec
Loading image into memory...DONE: 34 sec
Clearing memory...DONE: 46 sec
Loading image into memory...DONE: 36 sec
Clearing memory...DONE: 47 sec
Loading image into memory...DONE: 35 sec
Clearing memory...DONE: 49 sec
Loading image into memory...DONE: 37 sec
Clearing memory...DONE: 50 sec
Loading image into memory...DONE: 37 sec
Clearing memory...DONE: 51 sec
Loading image into memory...DONE: 39 sec
Clearing memory...DONE: 51 sec
Loading image into memory...DONE: 39 sec
Clearing memory...DONE: 53 sec
Loading image into memory...DONE: 40 sec
Clearing memory...DONE: 52 sec
Loading image into memory...DONE: 40 sec
Clearing memory...DONE: 55 sec
Loading image into memory...DONE: 41 sec
Clearing memory...DONE: 56 sec
Loading image into memory...DONE: 41 sec
Clearing memory...DONE: 59 sec
Loading image into memory...DONE: 42 sec
Clearing memory...DONE: 59 sec
Loading image into memory...DONE: 42 sec
Clearing memory...DONE: 60 sec
Loading image into memory...DONE: 44 sec
Clearing memory...DONE: 60 sec
Loading image into memory...DONE: 44 sec
Clearing memory...DONE: 63 sec
Loading image into memory...DONE: 44 sec
Clearing memory...DONE: 63 sec
Loading image into memory...DONE: 45 sec
Clearing memory...DONE: 64 sec
Loading image into memory...DONE: 46 sec
Clearing memory...DONE: 65 sec
Loading image into memory...DONE: 45 sec
Clearing memory...DONE: 67 sec
Loading image into memory...DONE: 47 sec
Clearing memory...DONE: 69 sec
Loading image into memory...DONE: 47 sec
Clearing memory...DONE: 70 sec
Loading image into memory...DONE: 48 sec
Clearing memory...DONE: 72 sec
Loading image into memory...DONE: 48 sec
Clearing memory...DONE: 74 sec
Loading image into memory...DONE: 49 sec
Clearing memory...DONE: 74 sec
Loading image into memory...DONE: 50 sec
Clearing memory...DONE: 74 sec
Loading image into memory...DONE: 50 sec
Clearing memory...DONE: 76 sec
Loading image into memory...DONE: 51 sec
Clearing memory...DONE: 78 sec
Loading image into memory...DONE: 53 sec
Clearing memory...DONE: 78 sec
Loading image into memory...DONE: 53 sec
Clearing memory...DONE: 80 sec
Loading image into memory...DONE: 54 sec
Clearing memory...DONE: 80 sec
Loading image into memory...DONE: 54 sec
Clearing memory...DONE: 82 sec
Loading image into memory...DONE: 55 sec
Clearing memory...DONE: 91 sec
Loading image into memory...DONE: 56 sec
Clearing memory...DONE: 84 sec
Loading image into memory...DONE: 56 sec
Clearing memory...DONE: 88 sec

Test output from array version looks as follows after 50 iterations (clearly the load/clear time is stable and does not increase more and more):

Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 17 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 17 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 17 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 27 sec
Clearing memory...DONE: 17 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 17 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 17 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 17 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 19 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 17 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 26 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 25 sec
Clearing memory...DONE: 19 sec
Loading image into memory...DONE: 18 sec
Clearing memory...DONE: 25 sec
Loading image into memory...DONE: 26 sec
Clearing memory...DONE: 18 sec

Questions

  1. Is this Windows that handle memory operations in a bad way when dealing with huge std::vectors?
  2. Is it std::vectors that just performs crappy with huge data, by design?
  3. Have I totally miss-understood something?
  4. Is there any other obvious std container I should have used instead (I need to access the image data by index in both x and y from different threads)?
  5. Any other good explanation and suggested solution?
Adrian Colomitchi
  • 3,974
  • 1
  • 14
  • 23
Arvid
  • 91
  • 6
  • 2
    Can you reserve the memory just once and reuse it? – mikkola Nov 24 '16 at 10:30
  • 2
    Are you testing a Release or Debug build ? – nos Nov 24 '16 at 10:30
  • @mikkola unfortunately not. – Arvid Nov 24 '16 at 10:31
  • do you have memory leaks with your vector implementation or the memory is stable? – Jean-François Fabre Nov 24 '16 at 10:32
  • @nos I am testing with Release build. – Arvid Nov 24 '16 at 10:34
  • @Jean-FrançoisFabre No memory-leaks I can see. I can follow the memory usage of the process and see that it loads up to 24 GB and all cleared and freed again, even after 100 iterations. – Arvid Nov 24 '16 at 10:36
  • 3
    I cannot test due to lack of RAM, but the likely explanation is that std::vector uses a different memory allocator, perhaps even a memory pool, in the background and it happens to be inefficient when you throw *cough cough* 100GB allocations at it. – Sven Nilsson Nov 24 '16 at 11:04
  • 1
    Didn't Herb Sutter present some kind of custom memory management on this year's CppCon? I'll try to find out - maybe this will help you. Apart from this: How about some custom datastructure/class you can use. What do you think of `std::array`? I would also recommend using smart pointers instead of raw pointers. – Simon Kraemer Nov 24 '16 at 11:16
  • Found the [talk](https://www.youtube.com/watch?v=JfmTagWcqoE) but it's more applicable for trees. – Simon Kraemer Nov 24 '16 at 11:20
  • @SvenNilsson I think you have a good point there. Do you think it is OS/compiler specific how the memory management is handeled for vector? My usage and needs is a bit abnormal I guess with so much data in memory. – Arvid Nov 24 '16 at 11:35
  • @SimonKraemer Thanks for the link. Herb Sutter always have good talks. I use smart pointers in my actual implementation, but ended up with raw pointers in this test app to illiminate that as an issue and isolate the issue as much as possible. std::array is maybe a better solution than native array. Will try that, thanks. – Arvid Nov 24 '16 at 11:40
  • @Arvid Why do you have a nested `for` loop using `new vector` to create a 2 dimensional vector? You don't need to use `new` and you don't need a nested `for` loop to create a 2 dimensional vector. You could even speed up your non-vector implementation since the 2-d array is not "ragged". – PaulMcKenzie Nov 24 '16 at 11:41
  • 1
    @Arvid -- See [this thread](http://stackoverflow.com/questions/21943621/how-to-create-a-contiguous-2d-array-in-c/21944048#21944048). You're calling `new` thousands of times in the loop when all you need is to call `new` twice. Again, your array is not ragged (different sized rows), so all you need is to allocate a single pool of memory and just point the row pointers to the right position in the pool. That not only speeds up the code, your memory will more than likely not be as fragmented. – PaulMcKenzie Nov 24 '16 at 11:48
  • SSD based fast disk access can be used as proxy in replacement to your RAM usage – seccpur Nov 24 '16 at 11:55
  • 3
    @Arvid *Is it std::vectors that just performs crappy with huge data, by design?* -- If you take a look at the way you wrote the code, you are creating a vector object *and* calling the allocator thousands of times. Creating the object requires construction, something that won't happen with just a pointer and `new`. So it isn't `vector` itself that is "crappy", it is that you need to rearrange your code so that you're not creating so many objects and not calling the allocator so much. – PaulMcKenzie Nov 24 '16 at 11:56
  • 2
    Or use a single (w x h) vector with an [column/row major](https://en.wikipedia.org/wiki/Row-_and_column-major_order) indexing scheme? – Adrian Colomitchi Nov 24 '16 at 11:59
  • @PaulMcKenzie Thanks for your tips! You are totally right about the allocation. I guess I should have seen that. Now when I try to allocate the whole thing once it goes a lot faster, and stable timing. Even with smart pointer and vector. – Arvid Nov 24 '16 at 13:17
  • @seccpur Thanks for the tip, but hardware is not an issue. I need it to go as fast as possible, and RAM should be faster than SSD. – Arvid Nov 24 '16 at 13:20
  • @Arvid -- There is one issue in that the number of bytes to allocate once may exceed what `new[]` can handle (`std::size_t`). So you may still have some work to do, but the general idea is to reduce the number of allocations you will need. – PaulMcKenzie Nov 24 '16 at 13:24
  • ok, you're using 64-bit, so `sizeof(size_t)` should be 8. Just make sure that when you do multiply two large `unsigned` values, you're not creating an overflow somewhere. – PaulMcKenzie Nov 24 '16 at 13:33
  • @PaulMcKenzie Yes I had an issue with large numbers at first. Had to use long long instead of int for the height and width. – Arvid Nov 24 '16 at 13:52

1 Answers1

2

What I did wrong was that I was calling the vector allocator for every row in the image (thousands of times). When allocating the whole thing as one vector at first and then map the different rows to the correct location in the big vector, problem solved.

Thanks to @PaulMcKenzie for answers pointing me in the right direction.

Arvid
  • 91
  • 6