Nested for loops and Verlet lists optimization C++

Question

I'm doing a simulation of Molecular dynamics with C++. The main structure of my code is the following

particle Molecule[N];
collider Newton;
//initialization of the molecules
    for(t=0; t<=tmax; t+=dt){
    for(int i=0; I<N; i++)Newton.MovePosition(Molecule[i]);
    Newton.CalculateForces(Molecule);
    for(int i=0; I<N; i++)Newton.MoveVelocity(Molecule[i])
    for(int i=0; I<N; i++)Newton.MovePosition(Molecule[i]);
    .
    .
    .
}

Where the collider class is a friend to the particle class and knows where every particle is, what's its velocity, and so forth. The particles interact with a normal Lenard Jhones potential. In the compute forces I have something like this

for(int i=0; i<N; i++)
  for(int j=i+1;j<N; j++ )
    Molecule[i].F=......

I'm essentially computing the force that every other molecule does to the molecule i. The problem I have is that this program is highly inefficient. I have 2 nested for loops. The one that calculates the force and those in the integration algorithm. I want to optimize this code because with 25 particles and 10^5 iterations it takes a very long time to run and for the simulation, I want to do I must have at least 10^3 molecules and 10^5 iterations. I have tried the following:

I tried parallelizing the for loops with #omp parallel for and worked slower than the serial version.
I read about Verlet lists, basically, the particles don't interact with all particles but only with their neighbors. This means that the size of the force calculation nested loop becomes much smaller. The problem with this is that to do the list you have to iterate over all particles and somehow tell each particle how is its neighbour. and then iterate over that array of neighbors. I have been thinking about how to do this in an efficient way and I haven't managed to come up with an answer.
In an effort of reducing the computational cost of the Verlet list actualization I found something called skip lists, this makes searching for something in a list an O(ln(N)) operation instead of an O(N). But really I didn't understand very well how it works and don´t know how to integrated to the Verlet list problem.
In the past I did a spacial parallelization where each core makes the computations of a certain region in space. The problem with this is that is incompatible with the Verlet lists and I have read that Verlet lists are better.
since the operations of moving Position and velocity are something like V+=Fdtconstant I thought that some reduction algorithm would help but I don't know anything about reduction algorithms so that dint help.
Lastly I thought that maybe in the c++ 17 or 20 there could be something in the algorithm library that could help but I know little about algorithms so it's taking me a long time to read the documentation and identifying what it's useful. So far I have started adding the for auto with vectors instead of arrays. Also, I have a small library that allows me to define variables as 3D vectors and make the normal vector operations.

So I'm stuck here in a position where I don't know enough about programing to know what to do and where the solutions I found have the same problems that I'm having now. So, if someone can help me with advice on how to integrate all this stuff, or how to optimize my code, or if it knows a special data structure or algorithm that could help I really appreciate the help. the goal is to make the integration algorithm of a single time step at least O(N). Thanks.

Vectorizing is key for this. Also, Verlet lists too, recalculate every 10 iterations or so. Not sure how you have your classes structured, may help to use arrays of the different components of the molecule (improved access pattern). Also, keep in mind force of a on b equal negative force of b on a (if you don't already). — ChrisMM, May 15 '21 at 02:04
With Vectorizing, you mean using std::vector or something else? I have already the 3 Newton law implemented. The class molecule has a 3Dvector for the position, velocity, and force acting on it so I can add, multiply, get the norm, and all that with 1 operation. Do you know or have where can I see good documentation of Verlet list implementation? Thanks for your comment. — Carlos Andrés del Valle, May 15 '21 at 02:13
See https://stackoverflow.com/questions/1422149/what-is-vectorization ... I can try to dig up old papers I have on this, which show speedups. I have parallel versions using tasks, MPI, and both. The key is distributing the workload, and loading data in a way that is "processor/memory friendly" — ChrisMM, May 15 '21 at 02:37
That would help a lot. Thanks. Especially regarding the Verlet lists. — Carlos Andrés del Valle, May 15 '21 at 02:49
A short article here: https://zone.biblio.laurentian.ca/handle/10219/2703?locale=en unfortunately, the main one I have on Vectorization is printed out, and I cannot find a digital copy. — ChrisMM, May 15 '21 at 03:14
Note that OpenMP can help to vectorize your code easily (as well as compiler specific directives), but using object programming tends to prevent compilers vectorizing the code (resulting in slower codes). Be aware of the [AoS VS SoA performance issue](https://en.wikipedia.org/wiki/AoS_and_SoA). Using SoA is generally better performance-wise as it helps compilers vectorizing the code and often reduce the amount of cache-misses. — Jérôme Richard, May 15 '21 at 11:52
@ChrisMM I read the article and I felt quite lost at the time of implementing that algorithm. I wanted to ask, do you have some example code or know where can I see an example of something similar? Especially the part when he does the list and makes a buffer to linialize the array. And also the force calculation. — Carlos Andrés del Valle, May 15 '21 at 23:23
Unfortunately, I don't even have access to it anymore. From memory, it divided the area into a grid, with cells being _rcut_ plus a small offset wide (in all dimensions) [_rcut_ being the selected cut-off point]. In order to find a particle's neighbours, you would only have to look in the surrounding 26 cells. When doing this, you'd add in a buffer, so that if a particle is currently more than _rcut_ away, but comes into range, it would still be included in calculations. Recalculating neighbours, and which particles belong to which cells, was done every 10 iterations or so. — ChrisMM, May 16 '21 at 00:05

Nested for loops and Verlet lists optimization C++

0 Answers0