cpp - large array pass by reference

Question

I have a very large array which has around 10000000 double entries (in the code below this number was set to NX=1000 but actually it will be 10000000 in realistic situations). I want to do some calculations with this array from which I get a new array. Then I do again some calculations with this new array and so on. I am looking for the most efficient way to do these manipulations with respect to RAM (I believe that this is the limiting factor here, since my arras are so large). I know that because of the large arrays I should try to pass the array as much as possible per reference. However I have problems with passing the arrays to other functions. Here is the code:

#include <iostream>
#include <array>


#define NX 1000

////////////////////////
//Function declaration//
////////////////////////
void time_propagation(std::array<double,NX>& X1,std::array<double,NX>& X2, double dt); 

std::array<double,NX> F(std::array<double,NX>& X);

///////////////////////
//operator overloading/
///////////////////////

// double* Array
std::array<double,NX> operator*(double d,std::array<double,NX> X){
  std::array<double,NX> res={};
  for(int i=0;i<NX;i++){
    res[i]=X[i]*d;
  } 
  return res;
}

//Array + Array
std::array<double,NX> operator+(std::array<double,NX> X1,std::array<double,NX> X2){
  std::array<double,NX> res={};
  for(int i=0;i<NX;i++){
    res[i]=X1[i]+X2[i];
  }
  return res;
}

/////////////
//main()/////
/////////////

int main () {
  std::array<double,NX> X0={};  
  std::array<double,NX> X1 = {};
  time_propagation(X0,X1,0.1);
  //Now I will write X0 to a file. Afterwards X0 is not needed anymore. Only X1. Do I have to manually delete X0?
  std::array<double,NX> X2 = {};
  time_propagation(X1,X2,0.1);
  //Now I will write X1 to a file. Afterwards X1 is not needed anymore. Only X2. Do I have to manually delete X1?
  //... I have to do this step very often
 
 
  return 0;
}

///////////////////////
//Function definitons//
///////////////////////

std::array<double,NX> F(std::array<double,NX>& X){
  std::array<double,NX> R={};
  double a=X[NX-2];
  double b=X[NX-1];
  R[NX-1]=-a;
  R[NX-1]=3.0;
  
  return R;
}

void time_propagation(std::array<double,NX>& X1,std::array<double,NX>& X2,double dt){
  
  std::array<double,NX> K1=F(X1);
  std::array<double,NX> K2=F(X1+dt/2.0*K1);
  std::array<double,NX> K3=F(X1+dt/2.0*K2);
  std::array<double,NX> K4=F(X1+dt*K3);      
  X2=X1+1.0/6.0*dt*(K1+2.0*K2+2.0*K3+K4); 
}

I know that my error is in the function "time_propagation". In the fifth last line I call the function "F" with the argument "X1+dt/2.0*K1". "X1" was passed by reference and "K1" was defined before. Do I first have to create a copy and the call the function on that copy? Something like that:

 void time_propagation(std::array<double,NX>& X1,std::array<double,NX>& X2,double dt){
      
  std::array<double,NX> argument={};
  for(int i=0;i<NX;i++){argument[i]=X1[i];}      

  std::array<double,NX> K1=F(argument);
  for(int i=0;i<NX;i++){argument[i]=X1[i]+dt/2.0*K1[i];}   
  std::array<double,NX> K2=F(argument);
  for(int i=0;i<NX;i++){argument[i]=X1[i]+dt/2.0*K2[i];}   
  std::array<double,NX> K3=F(argument);
  for(int i=0;i<NX;i++){argument[i]=X1[i]+dt/2.0*K3[i];}   
  std::array<double,NX> K4=F(argument);   
  for(int i=0;i<NX;i++){X2[i]=X1[i]+1.0/6.0*dt*(K1[i]+2.0*K2[i]+2.0*K3[i]+K4[i]); }      
}

If I use this new "time_propagation" function then I don't get any errors anymore, but my feeling is that this workaround is very bad and there is a nicer way of doing this? Second and most importantly even with this new version of "time_propagation()", I get a "Segmentation fault (core dumped)" if I increase NX to 10000000. I guess this happens because the code is very badly written? In general my laptop should be able to store 10000000 doubles in the RAM? I have 16 GB of RAM... and 10000000 doubles require only 0.2 GB of RAM!?

You are very likely experiencing a stack overflow. `std::array` is allocated on the stack - not the heap. Stack size varies by compiler and operating system (see [here](https://stackoverflow.com/questions/1825964/c-c-maximum-stack-size-of-program/1826072)). — Simon Kraemer, Jul 02 '21 at 20:47
I would also advise to use `constexpr size_t NX{1000};` over a postprocessor definition. — Simon Kraemer, Jul 02 '21 at 20:52
Thanks for this hint! I was using the postprocessor definition just out of laziness... — jojo123456, Jul 02 '21 at 20:53
*In general my laptop should be able to store 10000000 doubles in the RAM? I have 16 GB of RAM* -- That RAM goes to waste if the program is 32-bit. Only a 64-bit program can take advantage of all that memory. — PaulMcKenzie, Jul 02 '21 at 20:54
I was using std::array because then I don't have to care about where space is allocated!? — jojo123456, Jul 02 '21 at 20:54
Also `X1+dt/2.0*K1` does not work. What do you expect this statement to do? — Simon Kraemer, Jul 02 '21 at 20:54
@SimonKraemer yeah I know! That's why I wrote the operator overloading. But somehow this does not work. The error does not exist in the second alternative version of "time_propagation()" — jojo123456, Jul 02 '21 at 20:57
*I was using std::array because then I don't have to care about where space is allocated!* -- There is a huge difference between whether something is allocated on the stack or is heap allocated. Not knowing or not caring about the difference shouldn't happen. The stack is limited to usually 8MB or less. — PaulMcKenzie, Jul 02 '21 at 20:57
@PaulMcKenzie I know the differences between stack and heap. What do you recomment me to change in my program structure such that I don't encounter the segmentation fault? — jojo123456, Jul 02 '21 at 20:59
Use `std::vector`. It doesn't eat up the stack space **and** you don't have to recompile for different `N`s. — n. m. could be an AI, Jul 02 '21 at 21:02
Thanks a lot for the hints! I think this is very helpful! I will come back if the problem remains! — jojo123456, Jul 02 '21 at 21:02
Ok. I replaced all "std::array" with "std::vactor". However, this still gives me problems when I want to use the original definition of the function "time_propagation()" in the first code snippet in the original post. When I use the "alternative version" of the function in the second code snippet in the original post, I don't get any problems. My question is now would you have solved the problem like I did it? For me this looks not very elegant and there might also be a solution which gives a better performance (is faster and uses less RAM)!? — jojo123456, Jul 02 '21 at 22:00
P.S. I can also post the modified code, but in principle all I did was to replace "std::array" with "std::vactor". Let me know if I should update the original post!? — jojo123456, Jul 02 '21 at 22:02

prehistoricpenguin · Accepted Answer · 2021-07-04T01:55:54.930

According to your comment, you are very close to the answer. One thing you missed is that besides array is stack-based and vector is heap-based there is one more difference between array and vector: the vector needs to call its constructor(or resize) to enlarge size. So after switching to vector, we need a slight modify.

How to reduce memory usage?

Change the unnecessary copy arguments to reference.

How to manually recycle a vector?

Use a lambda to call swap or clear and shrink_to_fit

Bonus: To remove the unnecessary temporary object in operator * and operator +, see Expression_templates

A fixed code might like this:

#include <array>
#include <iostream>
#include <vector>

#define NX 1000

////////////////////////
// Function declaration//
////////////////////////
void time_propagation(std::vector<double>& X1, std::vector<double>& X2,
                      double dt);

std::vector<double> F(const std::vector<double>& X);

///////////////////////
// operator overloading/
///////////////////////

// double* Array
std::vector<double> operator*(double d, const std::vector<double>& X) {
  std::vector<double> res(NX);
  for (int i = 0; i < NX; i++) {
    res[i] = X[i] * d;
  }
  return res;
}

// Array + Array
std::vector<double> operator+(const std::vector<double>& X1,
                              const std::vector<double>& X2) {
  std::vector<double> res(NX);
  for (int i = 0; i < NX; i++) {
    res[i] = X1[i] + X2[i];
  }
  return res;
}

/////////////
// main()/////
/////////////

int main() {
  auto release_vec = [](auto& vec) {
    vec.clear();
    vec.shrink_to_fit();
  };
  std::vector<double> X0(NX);
  std::vector<double> X1(NX);
  time_propagation(X0, X1, 0.1);
  // Now I will write X0 to a file. Afterwards X0 is not needed anymore. Only
  // X1. Do I have to manually delete X0?
  release_vec(X0);
  std::vector<double> X2(NX);
  time_propagation(X1, X2, 0.1);
  // Now I will write X1 to a file. Afterwards X1 is not needed anymore. Only
  // X2. Do I have to manually delete X1?
  //... I have to do this step very often
  release_vec(X1);

  return 0;
}

///////////////////////
// Function definitons//
///////////////////////

std::vector<double> F(const std::vector<double>& X) {
  std::vector<double> R(NX);
  double a = X[NX - 2];
  double b = X[NX - 1];
  R[NX - 1] = -a;
  R[NX - 1] = 3.0;

  return R;
}

void time_propagation(std::vector<double>& X1, std::vector<double>& X2,
                      double dt) {
  std::vector<double> K1 = F(X1);
  std::vector<double> K2 = F(X1 + dt / 2.0 * K1);
  std::vector<double> K3 = F(X1 + dt / 2.0 * K2);
  std::vector<double> K4 = F(X1 + dt * K3);
  X2 = X1 + 1.0 / 6.0 * dt * (K1 + 2.0 * K2 + 2.0 * K3 + K4);
}

@jojo123456 There is one more enhancement direction: to remove the temporary object in `operator *` and `operator +` : see this: http://eigen.tuxfamily.org/index.php?title=Expression_templates https://en.wikipedia.org/wiki/Expression_templates — prehistoricpenguin, Jul 04 '21 at 01:54

cpp - large array pass by reference

1 Answers1