
I'm working on a project using STXXL, which I understand to be an out-of-core version of the C++ STL. My program currently runs fine with it, but the problem I'm facing now is that while the program is running, it uses close to 2 GB of memory (with a low- to medium-sized data set).

In my program, I'm using 25 STXXL vectors, stored in individual files on disk. As for my .stxxl file, I currently have it set to dynamically allocate the disk file (by setting the disk size to 0).

So, my question is: is there a way to explicitly get STXXL to use the hard disk as opposed to RAM? Or is this amount of memory usage to be expected when using this library?

Thanks in advance for any advice anyone can provide.

Andrewziac

2 Answers


What bobb_the_builder says about the RAM usage of the stxxl::vector is correct.

See the following code:

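```cpp
#include <cstddef>
#include <stxxl/vector>

int main()
{
    // create 25 vectors
    // default parameters (4 blocks/page, 8 cached pages, 2 MiB blocks = 64 MiB cache each):
    //stxxl::VECTOR_GENERATOR<int>::result vector[25];
    // minimal cache (1 block/page, 1 cached page, 1 MiB blocks = 1 MiB cache each):
    stxxl::VECTOR_GENERATOR<int, 1, 1, 1*1024*1024>::result vector[25];

    // fill the vectors round-robin with integers
    for (std::size_t i = 0; i < 100 * 1024 * 1024 * 1024llu; ++i) {
        vector[i % 25].push_back(static_cast<int>(i));
    }

    return 0;
}
```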

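On Linux, the program's resident memory size grows to 27528 KiB when using `stxxl::VECTOR_GENERATOR<int, 1, 1, 1*1024*1024>::result` and to about 1.6 GiB when using the default `stxxl::VECTOR_GENERATOR<int>::result`, which is roughly the 25 * 64 MiB = 1600 MiB that the default parameters predict.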

Does the Windows Task Manager show the same? Is this perhaps an STXXL bug that only occurs on Windows, or does the Task Manager just report memory sizes differently?

Timo Bingmann
  • Thanks for the example, Timo! In the end, I made a new solution with these principles in mind and I was able to get the memory usage down to a very reasonable 50 MB per vector using a minimal constructor like so: `vector(1000000);`. Thanks for all your help! – Andrewziac Dec 16 '13 at 19:45

I guess you are using the stxxl::VECTOR_GENERATOR template to create the 25 stxxl::vectors you mentioned in your post? The internal memory usage of an stxxl::vector generally depends on your individual configuration (i.e. block_size * page_size * cache_pages), as described in the STXXL documentation on stxxl::VECTOR_GENERATOR. Summed over all your vectors, this is the internal (= main) memory that gets reserved. As far as I know, STXXL uses as much internal memory for caching as your containers need (if possible), up to the limit set by those template parameters.

Note: the default values for the aforementioned template parameters are:

page_size = 4 (blocks per page)
cache_pages = 8 (pages cached in RAM)
block_size = 2 MiB

For your 25 vectors this results in a total memory consumption of 25 * (2 MiB * 4 * 8) = 1600 MiB, which explains most of your reported 2 GB.

(Note: which data type (ValueType) you store in your STXXL vectors shouldn't really matter.)

Daniel F
  • Thanks for the info! I've fiddled with the various values you mentioned and it got me a little farther, but what I'm seeing now is that as my program adds more and more elements to my vectors, the memory usage (as shown in the Windows Task Manager) starts to grow very quickly with it (up to the 2 GB mentioned before). I feel like it's simply storing the entire vector in memory instead of using the hard drive. Maybe I should have mentioned this before, but my vectors are required to be global and they must persist throughout the entire run of the program; could this be what's causing it? – Andrewziac Dec 12 '13 at 19:44
  • How high is your main memory consumption if you set page_size = 1, cache_pages = 1 and block_size = 1*1024*1024 (1 MiB)? – Daniel F Dec 12 '13 at 22:59
  • How do you predefine and instantiate those stxxl::vectors? My guess is that you don't push_back enough values into the vectors for their caches to overflow, so the elements never need to be stored block-wise on your disks. Please insert many gigabytes of values and check whether your memory consumption exceeds your bound of 2 GB. – Daniel F Dec 12 '13 at 23:06
  • In my header file, I'm defining them as `stxxl::VECTOR_GENERATOR::result vector;` and in my C++ file, I'm instantiating them as `vector = stxxl::VECTOR_GENERATOR::result(&vectorFile);`. As it stands now, I am pushing about 1.5 to 2 GB of data into the vectors, but for my purposes this is the maximum amount of data that I'll need to store in them, so if the problem is that I'm not storing enough in my vectors, then maybe STXXL isn't the best library to use. – Andrewziac Dec 13 '13 at 12:50