Numerical instability FFTW <> Matlab

Question

I am trying to numerically solve the Swift-Hohenberg equation http://en.wikipedia.org/wiki/Swift%E2%80%93Hohenberg_equation using a pseudo-spectral scheme, where the linear terms are treated implicitly in Fourier space, while the nonlinearity is evaluated in real space. A simple Euler scheme is used for the time integration.
My problem is that the Matlab code I have come up with works perfectly, while the C++ code, which relies on FFTW for the Fourier transforms, becomes unstable and diverges after a couple thousand time steps. I have tracked it down to the way the nonlinear term is treated (see the comments in the C++ code). If I use only the real part of Phi, the instability occurs. However, Phi should only have a negligible imaginary part due to numerical rounding errors, and Matlab is doing something similar, keeping Phi purely real. The Matlab code also runs fine under Octave. The initial condition can be something like
R=0.02*(rand(256,256)-0.5);
in Matlab (small amplitude fluctuations).

TLDR;

Why do these pieces of code do different things? Specifically, how can I make the C++ code work the same way the Matlab version does?

Edit 1:

For completeness, I have added the code using the R2C/C2R functions provided by FFTW. See http://fftw.org/fftw3_doc/Multi_002dDimensional-DFTs-of-Real-Data.html for details (I hope I got the data layout right). This code always shows the instability after about 3100 time steps. If I reduce dt to e.g. 0.01, it occurs 10 times later.

C++ code using complex DFTs

#include <iostream>
#include <fstream>
#include <cmath>
#include <fftw3.h>

int main() {

const int N=256, nSteps=10000;
const double k=2.0*M_PI/N, dt=0.1, eps=0.25;

double *Buf=(double*)fftw_malloc(N*N*sizeof(double));
double *D0=(double*)fftw_malloc(N*N*sizeof(double));

// complex arrays
fftw_complex *Phi=(fftw_complex*)fftw_malloc(N*N*sizeof(fftw_complex));
fftw_complex *PhiF=(fftw_complex*)fftw_malloc(N*N*sizeof(fftw_complex));
fftw_complex *NPhiF=(fftw_complex*)fftw_malloc(N*N*sizeof(fftw_complex));

// plans for Fourier transforms
fftw_plan phiPlan=fftw_plan_dft_2d(N, N, Phi, PhiF, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_plan nPhiPlan=fftw_plan_dft_2d(N, N, NPhiF, NPhiF, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_plan phiInvPlan=fftw_plan_dft_2d(N, N, Phi, Phi, FFTW_BACKWARD, FFTW_ESTIMATE);

std::ifstream fin("R.dat", std::ios::in | std::ios::binary); // read initial condition
fin.read(reinterpret_cast<char*>(Buf), N*N*sizeof(double));
fin.close();
for(int i=0; i<N*N; i++) {
    Phi[i][0]=Buf[i];   //initial condition
    Phi[i][1]=0.0;  //no imaginary part
}

fftw_execute(phiPlan);  //PhiF contains FT of initial condition

for(int j=0; j<N; j++) {
    for(int i=0; i<N; i++) {

        double kx=(i-(i/(N-N/2)*N))*k;
        double ky=(j-(j/(N-N/2)*N))*k;
        double k2=kx*kx+ky*ky;

        D0[j*N+i]=1.0/((1.0 - dt*(eps-1.0 + 2.0*k2 - k2*k2)));  // array of prefactors
    }
}   

const double norm=1.0/(N*N);

for(int n=0; n<=nSteps; n++) {

    if(n%100==0) {
        std::cout<<"n = "<<n<<'\n';
    }

    for(int j=0; j<N*N; j++) {
        // nonlinear term Phi^3
        //NPhiF[j][0]=Phi[j][0]*Phi[j][0]*Phi[j][0]; // unstable
        //NPhiF[j][1]=0.0;
            NPhiF[j][0]=Phi[j][0]*Phi[j][0]*Phi[j][0] - 3.0*Phi[j][0]*Phi[j][1]*Phi[j][1];
            NPhiF[j][1]=-Phi[j][1]*Phi[j][1]*Phi[j][1] + 3.0*Phi[j][0]*Phi[j][0]*Phi[j][1];
    }

    fftw_execute(nPhiPlan); // NPhiF contains FT of Phi^3

    for(int j=0; j<N*N; j++) {
        PhiF[j][0]=(PhiF[j][0] - dt*NPhiF[j][0])*D0[j]; // update
        PhiF[j][1]=(PhiF[j][1] - dt*NPhiF[j][1])*D0[j];

        Phi[j][0]=PhiF[j][0]*norm; // FFTW does not normalize
        Phi[j][1]=PhiF[j][1]*norm;
    }

    fftw_execute(phiInvPlan); // Phi contains the updated Phi in real space
}

for(int i=0; i<N*N; i++) {
    Buf[i]=Phi[i][0];   // saving the real part of Phi
}
std::ofstream fout("Phi.dat", std::ios::trunc | std::ios::binary);
fout.write(reinterpret_cast<char*>(Buf), N*N*sizeof(double));
fout.close();

for(int i=0; i<N*N; i++) {
    Buf[i]=Phi[i][1];   // saving the imag part of Phi
}
fout.open("PhiImag.dat", std::ios::trunc | std::ios::binary);
fout.write(reinterpret_cast<char*>(Buf), N*N*sizeof(double));
fout.close();


fftw_free(D0);
fftw_free(Buf);

fftw_free(Phi);
fftw_free(PhiF);
fftw_free(NPhiF);

fftw_destroy_plan(phiPlan);
fftw_destroy_plan(phiInvPlan);
fftw_destroy_plan(nPhiPlan);

return EXIT_SUCCESS;
}

C++ code using R2C/C2R

#include <iostream>
#include <fstream>
#include <cmath>
#include <fftw3.h>

int main() {

const int N=256, nSteps=3100;
const int w=N/2+1;
const double k=2.0*M_PI/N, dt=0.1, eps=0.25;

double *Buf=(double*)fftw_malloc(N*N*sizeof(double));
double *D0=(double*)fftw_malloc(N*w*sizeof(double));

fftw_complex *Phi=(fftw_complex*)fftw_malloc(N*w*sizeof(fftw_complex));
fftw_complex *PhiF=(fftw_complex*)fftw_malloc(N*w*sizeof(fftw_complex));
fftw_complex *NPhi=(fftw_complex*)fftw_malloc(N*w*sizeof(fftw_complex));

fftw_plan phiPlan=fftw_plan_dft_r2c_2d(N, N, (double*)PhiF, PhiF, FFTW_ESTIMATE);
fftw_plan nPhiPlan=fftw_plan_dft_r2c_2d(N, N, (double*)NPhi, NPhi, FFTW_ESTIMATE);
fftw_plan phiInvPlan=fftw_plan_dft_c2r_2d(N, N, Phi, (double*)Phi, FFTW_ESTIMATE);

std::ifstream fin("R.dat", std::ios::in | std::ios::binary);
fin.read(reinterpret_cast<char*>(Buf), N*N*sizeof(double));
fin.close();
for(int j=0; j<N; j++) {
    for(int i=0; i<N; i++) {
        ((double*)PhiF)[j*2*w+i]=Buf[j*N+i];
        ((double*)Phi)[j*2*w+i]=Buf[j*N+i];
    }
}

fftw_execute(phiPlan); //PhiF contains FT of IC

for(int j=0; j<N; j++) {
    for(int i=0; i<w; i++) {

        double kx=(i-(i/(N-N/2)*N))*k;
        double ky=(j-(j/(N-N/2)*N))*k;
        double k2=kx*kx+ky*ky;

        D0[j*w+i]=1.0/(1.0 - dt*(eps-1.0 + 2.0*k2 - k2*k2));
    }
}

const double norm=1.0/(N*N);

//begin first Euler step
for(int n=0; n<=nSteps; n++) {

    if(n%100==0) {
        std::cout<<"n = "<<n<<'\n';
    }

    for(int j=0; j<N; j++) {
        for(int i=0; i<N; i++) {
            ((double*)NPhi)[j*2*w+i]=((double*)Phi)[j*2*w+i]  *((double*)Phi)[j*2*w+i]  * ((double*)Phi)[j*2*w+i];
        }
    }

    fftw_execute(nPhiPlan); // NPhi contains FT of Phi^3

    for(int j=0; j<N*w; j++) {
        PhiF[j][0]=(PhiF[j][0] - dt*NPhi[j][0])*D0[j];
        PhiF[j][1]=(PhiF[j][1] - dt*NPhi[j][1])*D0[j];
    }

    for(int j=0; j<N*w; j++) {
        Phi[j][0]=PhiF[j][0]*norm;
        Phi[j][1]=PhiF[j][1]*norm;
    }

    fftw_execute(phiInvPlan);

}

for(int j=0; j<N; j++) {
    for(int i=0; i<N; i++) {
        Buf[j*N+i]=((double*)Phi)[j*2*w+i];
    }
}

std::ofstream fout("Phi.dat", std::ios::trunc | std::ios::binary);
fout.write(reinterpret_cast<char*>(Buf), N*N*sizeof(double));
fout.close();

fftw_destroy_plan(phiPlan);
fftw_destroy_plan(phiInvPlan);
fftw_destroy_plan(nPhiPlan);

fftw_free(D0);
fftw_free(Buf);
fftw_free(Phi);
fftw_free(PhiF);
fftw_free(NPhi);
}

Matlab code

function Phi=SwiHoEuler(Phi, nSteps)
epsi=0.25;
dt=0.1;

[nR nC]=size(Phi);
if mod(nR, 2)==0
    kR=[0:nR/2-1 -nR/2:-1]*2*pi/nR;
else
    kR=[0:nR/2 -floor(nR/2):-1]*2*pi/nR;
end
Ky=repmat(kR.', 1, nC);

if mod(nC, 2)==0
    kC=[0:nC/2-1 -nC/2:-1]*2*pi/nC;
else
    kC=[0:nC/2 -floor(nC/2):-1]*2*pi/nC;
end
Kx=repmat(kC, nR, 1); % frequencies
K2=Kx.^2+Ky.^2; % used for Laplacian in Fourier space
D0=1.0./(1.0-dt*(epsi-1.0+2.0*K2-K2.*K2)); % linear factors combined

PhiF=fft2(Phi);

for n=0:nSteps
    NPhiF=fft2(Phi.^3); % nonlinear term, evaluated in real space
    if mod(n, 100)==0
        fprintf('n = %i\n', n);
    end
    PhiF=(PhiF - dt*NPhiF).*D0; % update

    Phi=ifft2(PhiF); % inverse transform
end
return

MATLAB uses FFTW for FFT. That mean, that you might looking at wrong direction. — Luka Rahne, Jul 27 '11 at 20:16
Are you still debugging this? I'm interested if you have found a solution. I'm surprised matlab gave you a different answer than c++, considering the backend just uses fftw. — Marm0t, Aug 01 '11 at 12:41
I'm still on it, but it's on low priority at the moment... I have learned that Octave works much like the C++ code using complex DFTs, while Matlab does something which causes the instability to be avoided. — wurstl, Aug 03 '11 at 09:56
No, I never solved it, just avoided the issue. IIRC, I even contacted one of the authors of FFTW about it... Good luck with your problem! — wurstl, Nov 29 '18 at 20:44
@wurstl Oops, by mistake I deleted my previous comment! Anyhow, I finally solved my problem, and I came exactly to your same conclusion! Neglecting the imaginary part coming from the rounding errors of the IFT was causing the instability, so `ftw_plan_dft_c2r_2d` cannot be used in this case. Here is my answer, if you are interested: https://stackoverflow.com/questions/53518451/why-is-fft-of-ab-different-from-ffta-fftb — Tropilio, Dec 10 '18 at 23:28
@wurstl Hey, I opened an issue about this on the github page of FFTW3: [https://github.com/FFTW/fftw3/issues/159](https://github.com/FFTW/fftw3/issues/159). If you want to join the discussion and share your thoughts about this problem, you're welcome. — Tropilio, Dec 14 '18 at 11:01
After reading the attached thread, I don't think FFTW or its authors are to blame here. The issue persists even in c2c mode (where it can be mitigated, as you pointed out), and also if you use a different library entirely. Of course, the appeal of using r2c/c2r is that it would be the most efficient way to implement a pseudo-spectral scheme, if it wasn't for the instability. I think some careful numerical analysis is required to explain why almost-but-not-quite zero values prevent the algorithm from blowing up. — wurstl, Dec 20 '18 at 23:22
Without the nonlinear term, the Fourier transform turns a PDE into an ODE for each individual mode. However, the nonlinearity introduces a coupling between modes, since it can be written as a sort of convolution in Fourier space. Maybe this is related to the numerical problems. — wurstl, Dec 20 '18 at 23:31

score 1 · Answer 1 · edited Jan 15 '17 at 18:16

1

Look at these lines:

for ...
  double kx=(i-(i/(N-N/2)*N))*k;
  double ky=(j-(j/(N-N/2)*N))*k;
  double k2=kx*kx+ky*ky;
...

I have to admit that I did not look into the algos but "i/(N-N/2)" consists of integers and I suspect that your kx, ky, and k2 are calculated as expected. You may try something like this which avoids potential integer rounding errors:

for ...
  double kx=( double(i) -( double(i)/(0.5*double(N*N)))*k; // where in our case: (N-N/2)*N) = 0.5*N*N
  ...
...

edited Jan 15 '17 at 18:16

Scott Stensland

26,870
12
93
104

answered Jan 23 '12 at 14:42

boto

455
2
13

I guess the integer arithmetic is indeed the problem here, +1. Note that simply writing `2.0` in the denominator has the same effect; to me, that seems more readable. Also, In C++, we have `static_cast(i)`, with a bunch of advantages over `double(i)`. – Christopher Creutzig Jan 27 '12 at 12:41
I just checked again... The C++ and the Matlab code yield exactly the same frequencies kx and ky. The C++ part is just me trying to be clever. There might also be a better choice for the frequencies with respect to aliasing, but the main problem is insensitive to this issue. – wurstl Feb 07 '12 at 16:28

Marm0t · Answer 2 · 2011-07-27T20:16:39.067

0

EDIT The below is not correct OP had it right.

A pointer to the real part is stored in [0], the imaginary stored in [1] (i.e. NPhi[1][j] is what you should be referencing - at least according to Their site). So that is probably one problem. The fixed lines (I believe) should be the following:

NPhiF[0][j]=Phi[0][j]*Phi[0][j]*Phi[0][j] - 3.0*Phi[0][j]*Phi[1][j]*Phi[1][j];
NPhiF[1][j]=-Phi[1][j]*Phi[1][j]*Phi[1][j] + 3.0*Phi[0][j]*Phi[0][j]*Phi[1][j];

That will fix one part - you will have to fix the rest.

edited Jul 27 '11 at 20:16

answered Jul 27 '11 at 19:34

Marm0t

921
9
18

1

No, the use of fftw_complex is correct as it is. NPhiF etc. are arrays of fftw_complex, and fftw_complex is an array of two doubles. The two lines you quoted nevertheless contain the problem: I have to treat Phi as if it was a complex number, when it should be purely real, except for roundoff error. – wurstl Jul 27 '11 at 19:48
Ah - you are correct about the the data array. If you include before you include fftw3.h you can use regular multiplication for your power function. Have you tried using the fftw_plan_dft_r2c_2d plan and then taking the inverse with fftw_plan_dft_c2r_2d? – Marm0t Jul 27 '11 at 20:16
Actually, I have... Using the R2C/C2R type functions is of course more efficient when dealing with real data. However, I found this to be unstable, therefore I went back to the usual complex DFTs to find out what's going on. I could share the R2C code as well, if it's of interest... – wurstl Jul 28 '11 at 10:02