4

Currently I am trying to implement a 3D solver for Poisson's equation thanks to the FFTW3 library and with MPI in C. My problem consists on solving the problem in Fourier space and then perform a backward transform which allows me to get the potential. The problem can be summarized as following: Poisson's equation and Fourier transforms. In order to save memory and computational time I use the Hermitian property of Fourier transforms.

It is expected that when the algorithm is used for a test (whose exact result is known), the numerical potential should be equal to the exact one up to a constant A: enter image description here

Therefore when the solution is computed collectively by all processors in a given mesh region, I expect that the difference between the numerical value (provided by the code) and the exact potential should be the constant A (I do not really care about A value but it should be the same for all processors).

I used my code for the classic test of the uniform sphere in order to retrieve known results but so far all processors give a different result for the difference between both potentials. For example when I use 4 processors, the output is:

0.1757809879851610634915459741023369133472
0.1114339829619698657436899225103843491524
0.1114339829619699351326289615826681256294
0.1757809879851610634915459741023369133472

Where the output is: enter image description here

Since the difference between both potentials is not constant, this means that the potential is not well computed by the solver. Do you have any idea what I am doing wrong ? (Maybe a subtle detail in FFTW ?) At the end of the post you can find a minimal part of my code that prints (for each processor) the max error between the exact and numerical potential.

So far I have not been able to find out why I am not getting the expected result. I suspect that the error comes from the convention used for the wave vectors in the Fourier decomposition but I tried the two classical ones and it doesn't help... Below are these two conventions for the wave vector definition (for k_x): convention 1 and convention 2

Thank you for your help !

P-S:

  1. I checked that the norm after the two transforms is correct.
  2. My MPI/FFTW setup seems correct since I recover an initial function when I perform successive forward and backward transforms on a test function.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <fftw3-mpi.h>
#include <string.h> // for memset
#include <math.h>

#define MAX(i, j) (((i) > (j)) ? (i) : (j))

int main(int argc, char* argv[])
{
    // General variables
    ptrdiff_t i, j, k;
    double kx, ky, kz;
    double k_square;
    double error=0;

    // FFTW
    fftw_plan plan_r2c, plan_c2r;
    double *density_real, *potential_real;
    fftw_complex *potential_complex, *density_complex;
    double *potential_exact, *square_wave_vector;
    ptrdiff_t alloc_local, local_nx, local_x_start;


    // 3D meshgrid 
    const ptrdiff_t Nx = 100, Ny = 100, Nz = 100;
    const ptrdiff_t Nzh = Nz/2+1;
    double x, y, z;
    double xlen=1, ylen=1, zlen=1;
    double dx=xlen/(Nx-1), dy=ylen/(Ny-1), dz=zlen/(Nz-1);
    double xc=xlen/2, yc=ylen/2, zc=zlen/2;
    double xini=-xlen/2, yini=-ylen/2, zini=-zlen/2;
    double FFT_norm = 1.0/Nx/Ny/Nz;


    /* Initialise MPI */

    MPI_Init(&argc, &argv);


    /*Initialize FFTW */
    fftw_mpi_init();

    alloc_local = fftw_mpi_local_size_3d(Nx, Ny, Nzh, MPI_COMM_WORLD,
                                               &local_nx, &local_x_start);

    /* Here we made use of Hermitian property of Fourier transforms: Nz -> Nz/2+1 */

    /* Allocation of arrays for the transform */
    density_real       = fftw_alloc_real(2*alloc_local);
    potential_real     = fftw_alloc_real(2*alloc_local);
    potential_exact    = fftw_alloc_real(2*alloc_local);

    density_complex    = fftw_alloc_complex(alloc_local);
    potential_complex  = fftw_alloc_complex(alloc_local);

    /* create plan for in-place forward DFT */
    plan_r2c = fftw_mpi_plan_dft_r2c_3d(Nx, Ny, Nz,
                                        density_real, density_complex,
                                        MPI_COMM_WORLD, FFTW_ESTIMATE);

    plan_c2r = fftw_mpi_plan_dft_c2r_3d(Nx, Ny, Nz,
                                        potential_complex, potential_real,
                                        MPI_COMM_WORLD, FFTW_ESTIMATE);




    /* Density sphere test + exact potential solution */
    double r=0, r0=0.2;
    double rho0=1;
    for (i = 0; i < local_nx; ++i)
    {
        for (j = 0; j < Ny; ++j)
        {
                for (k = 0; k < Nz; ++k)
                {
                        x = xini + (local_x_start + i)*dx;
                        y = yini + j*dy;
                        z = zini + k*dz;
                        r = sqrt(pow(x,2)+pow(y,2)+pow(z,2));

                        /* Uniform density sphere */
                        if (r<r0) {
                                density_real[k+2*Nzh*(i*Ny + j)] = rho0;
                                potential_exact[k+2*Nzh*(i*Ny + j)] =
                                                       -2*M_PI*rho0*(pow(r0,2)-pow(r,2)/3.0);
                        } else {
                                density_real[k+2*Nzh*(i*Ny + j)] = 0;
                                potential_exact[k+2*Nzh*(i*Ny + j)] =
                                                       -4*M_PI*rho0*pow(r0,3)/3.0/r;
                        }
                }
        }
    }


    /* Execute forward tranform */
    fftw_execute(plan_r2c);


    /* Complex potential */
    for (i=0; i < local_nx; i++)
    {
        if (2*(local_x_start+i)<Nx){
                kx = 2*M_PI*(local_x_start + i)/xlen;
        } else {
                kx = 2*M_PI*(Nx - local_x_start - i)/xlen;
        }
        for (j=0;j<Ny;j++)
        {
                if (2*j<Ny){
                        ky = 2*M_PI*j/ylen;
                } else {
                        ky = 2*M_PI*(Ny - j)/ylen;
                }
                for (k=0;k<Nzh;k++)
                {
                    kz = 2*M_PI*k/zlen;
                    k_square = pow(kx,2)+pow(ky,2)+pow(kz,2);

                    if (k_square != 0.0)
                    {
                        potential_complex[k+Nzh*(j+Ny*i)][0] = -4*M_PI*
                                      density_complex[k+Nzh*(j+Ny*i)][0] / k_square;
                        potential_complex[k+Nzh*(j+Ny*i)][1] = -4*M_PI*
                                          density_complex[k+Nzh*(j+Ny*i)][1] / k_square;
                    }
                    else
                    {
                        // Phi(k=0)=0 => Potential average in the whole simulation cell
                        // is equal to 0
                        potential_complex[k+Nzh*(j+Ny*i)][0] = 0.0;
                        potential_complex[k+Nzh*(j+Ny*i)][1] = 0.0;
                    }
                }
        }
    }



    /* Execute backward tranform and normalise potential */
    fftw_execute(plan_c2r);



    for (i = 0; i < local_nx; ++i)
    {
        for (j = 0; j < Ny; ++j)
        {
                for (k = 0; k < Nz; ++k)
                {
                        potential_real[k+2*Nzh*(i*Ny + j)] = FFT_norm * potential_real[k+2*Nzh*(i*Ny + j)];
                }
        }
     }


    /* Check error  */
    for (i = 0; i < local_nx; ++i)
    {
        for (j = 0; j < Ny; ++j)
        {
                for (k = 0; k < Nz; ++k){
                        error = MAX(error, fabs(potential_real[k+2*Nzh*(i*Ny + j)]-potential_exact[k+2*Nzh*(i*Ny + j)]));
                }
        }
    }


    /* Each processor print it's maximal error */
    printf("%.40lf\n", error);

    // Tell MPI to shut down.
    MPI_Finalize();
    return EXIT_SUCCESS;
}
steven
  • 41
  • 3
  • 1
    I suggest you take out the message passing MPI calls and run it on a single node to see if you get a constant error. It may be that in message passing, floating point values are being converted to and from strings which is inexact and the MPI might work differently each run causing the differences, assuming the code is otherwise correct. To pass floating point values exactly, check out my answer here https://stackoverflow.com/questions/76270995/in-c-how-do-you-print-a-float-double-as-a-string-and-read-it-back-as-the-same-f/76272340#76272340 – Simon Goater May 24 '23 at 12:24
  • 1
    In fact, since floating point arithmetic is not strictly associative or distributive, even passing exact values might not fix it. I don't know how MPI works to say if it is exact or not, but if the calculations aren't done in the same order each time you can get differences. – Simon Goater May 24 '23 at 12:31
  • 1
    "In order to save memory and computational time" That argument only works for the sequential code. Often you can get a much more parallel code by wasting a couple of cycles. After all, what is a factor of 2 in work if you have thousand processors? Not to mention that the "computation time" saving is only "operation count saving" and may actually have worse cache behavior. – Victor Eijkhout May 25 '23 at 16:00
  • 1
    What are the 4 numbers you list? I note that they are pairwise equal to about one part in 10^15, which is double precision roundoff, so you may run into what @SimonGoater observed about fp arithmetic not being associative. – Victor Eijkhout May 25 '23 at 16:03
  • I edited my question since I think it was not really clear that I was printing the constant A in each processor. – steven Jun 07 '23 at 08:46
  • Thank you for your answer @SimonGoater. I doubt the pissue comes from float point arithmetic from two reasons. First, the code gives expected results when I simply do a successive forward and backward transformation and second I do not get different numbers at each run (but only between different processors). I will however deepen your suggestion and keep you informed. – steven Jun 07 '23 at 08:51
  • @VictorEijkhout Thank you for the advices. The answer to your question is in the new edit. – steven Jun 07 '23 at 08:52

1 Answers1

0

The solution to this question comes rather from mathematics than programming.

Indeed, when Poisson's equation in a periodic box one should subtract to the source term the mean value of the source term itself. This leads to a modified source term whose mean value in the whole periodic box vanishes. If this last condition is not satisfied the equation is not consistent with the periodic boundary conditions (for more details see: https://ammar-hakim.org/sj/je/je1/je1-periodic-poisson.html).

Since in my initial post the source term is only positive (a sphere of positive density embedded in vacuum) the mathematical problem was not consistent.

P-S: for those who need it, the code is correct. It is only needed, to subtract the mean density over the box to density_real.

steven
  • 41
  • 3