
In scaling up the problem size I'm handing to a self-coded program, I started to bump into Linux's OOM killer. Neither Valgrind (when run on the CPU) nor cuda-memcheck (when run on the GPU) reports any memory leaks. Memory usage keeps growing while iterating through the inner loop, even though I explicitly clear the vectors holding the biggest chunk of data at the end of this loop. How can I make this memory hogging disappear?

Checks for memory leaks were performed and all reported leaks are fixed. Despite this, Out of Memory errors keep killing the program (via the OOM Killer). Manual monitoring shows memory utilisation increasing, even after explicitly clearing the vectors containing the data.

Key to the problem are three nested loops: the outer loop iterates over the sub-problems at hand, the middle loop over the Monte Carlo trials, and the inner loop runs a sequential process required inside each trial. The pseudo-code looks as follows:

std::vector<object*> sub_problems;

sub_problems.push_back(retrieved_subproblem_from_database);

for(std::size_t sub_problem_index = 0; sub_problem_index < sub_problems.size(); ++sub_problem_index){
  std::vector< std::vector<float> > mc_results(100000, std::vector<float>(5, 0.0));
  for(int mc_trial = 0; mc_trial < 100000; ++mc_trial){
    for(int sequential_process_index = 0; sequential_process_index < 5; ++sequential_process_index){
      mc_results[mc_trial][sequential_process_index] = specific_result;
    }
  }

  sub_problems[sub_problem_index]->storeResultsInObject(mc_results);
  // Do some other things
  sub_problems[sub_problem_index]->deleteMCResults();
}

deleteMCResults looks as follows:

bool deleteMCResults() {
  for (std::size_t i = 0; i < object_mc_results.size(); ++i){
    object_mc_results[i].clear();
    object_mc_results[i].shrink_to_fit();
  }
  object_mc_results.clear();
  object_mc_results.shrink_to_fit();
  return true;
}

How can I make memory consumption depend solely on the middle and inner loops instead of the outer loop? The second, third, fourth, and subsequent outer iterations could theoretically use exactly the same memory space/addresses as the first iteration.

lenik
j73951
  • `std::vector<object*> sub_problems;` - are these `new`'ed objects, i.e. with a heap allocation per `object`? Because that carries per-allocation overhead. The snippet doesn't show whether that's a problem; a few big objects are not a problem, but many tiny objects are. – MSalters Sep 01 '19 at 11:12
  • Note that `shrink_to_fit` is not required to do that. And there is no point in doing anything to the nested vectors since they get destroyed after the loop. – molbdnilo Sep 01 '19 at 11:20

3 Answers


Perhaps I'm reading your pseudocode too literally, but it looks like you have two mc_results variables: one declared inside the for loop and one that deleteMCResults is accessing.

In any case, I have two suggestions for how to debug this. First, rather than letting the OOM killer strike, which takes a long time, is unpredictable, and might kill something important, use ulimit -v to put a limit on process size. Set it to something reasonable like, say, 1000000 (about 1GB) and work on keeping your process under that.

Second, start deleting or commenting out everything except the parts of the program that allocate and deallocate memory. Either you will find your culprit or you will make a program small enough to post in its entirety.

Scott McPeak
  • Thanks for the suggestions. Just to clarify on the variables: the results are stored in the instance of the object (passed by ref, then a simple private_var = argument). Using the same name was a bit misleading; after all, I'm transferring the vector of vectors from the local scope to the object instance, where it's stored as a private var. After doing some additional analyses on it, I delete the results as described above. In reality, the vector of vectors can be up to 500 MB in size. – j73951 Sep 01 '19 at 11:07

deleteMCResults() can be written a lot simpler.

void deleteMCResults() {
  decltype(object_mc_results) empty;
  std::swap(object_mc_results, empty);
}

But in this case, I'm wondering if you really want to release the memory at all. As you say, the iterations could reuse the same memory, so perhaps you should replace deleteMCResults() with a returnMCResultsMemory() that keeps the storage. Then hoist the declaration of mc_results out of the loop, and just reset its elements to 0.0 after returnMCResultsMemory() returns.

MSalters

There is one thing that could easily be improved in the code you show. However, the information given is really not enough, and not precise enough, for a full analysis. Extracting a relevant example ([mcve]) and perhaps asking for a review on codereview.stackexchange.com might improve the outcome.

The simple thing that could be done is to replace the inner vector of five floats with an array of five floats. Each vector consists (in typical implementations) of three pointers: one to the beginning of the allocated storage, one to the end of the used part, and one to the end of the allocated capacity. The actual storage requires a separate allocation, which in turn incurs some overhead (and also performance overhead when accessing the data; keyword "locality of reference"). These three pointers require 24 octets on a common 64-bit machine. Compare that with five floats, which only require 20 octets. Even if those floats were padded to 24 octets, you would still benefit from eliding the separate allocation.

In order to try this out, just replace the inner vector with a std::array (https://en.cppreference.com/w/cpp/container/array). Odds are that you won't have to change much code: raw arrays, std::array, and std::vector have very similar interfaces.

Ulrich Eckhardt
  • Great suggestions! Will change them to arrays, good point. So the overall thing is that this vector of vectors, or vector of arrays, or array of arrays, can be of dimension 2mln * 60, filled with floats. Even including overhead, this should never pass 4 GB. ```free -m``` shows memory usage growing with every middle-loop iteration, exceeding 30 GB! Successful program termination releases all memory. This can't be solely explained by the STL choice, right? – j73951 Sep 01 '19 at 11:12
  • No, not really. As mentioned, there's simply insufficient info. Some suspicious things remain though, like e.g. the use of raw pointers, which are always dangerous. Also, if you already know where the leak happens and that it is correctly released on shutdown, that could allow you to make up a [mcve]. – Ulrich Eckhardt Sep 01 '19 at 11:18