I am using armadillo library. The part of my program that is too slow and I need to speed it up is the following
for(int q(0); q < Nk*Nk; q++){
for(int k(0); k < Nk*Nk; k++){
int kq = (k+q) % (Nk*Nk);
cx_mat Gx = ((Eigveck.slice(k)).t())*(Vxk.slice(k)-Vxk.slice(kq))*Eigveck.slice(kq);
cx_mat Gy = ((Eigveck.slice(k)).t())*(Vyk.slice(k)-Vyk.slice(kq))*Eigveck.slice(kq);
vec ek = Eigvalk.col(k);
vec ekq = Eigvalk.col(kq);
for(int i(0); i < Ltot; i++){
for(int j(0); j < Ltot; j++){
chi = chi + (abs(Gx(i,j))*abs(Gx(i,j))+abs(Gy(i,j))*abs(Gy(i,j)))*(1.0/(1.0+exp(ekq(j)/T))-1.0/(1.0+exp(ek(i)/T)))*((ekq(j)-ek(i))/((ekq(j)-ek(i))*(ekq(j)-ek(i))+eta*eta))/(Nk*Nk);
}
}
}
double qx = (G1(0)*floor(q/Nk)/Nk+G2(0)*(q % Nk)/Nk);
double qy = (G1(1)*floor(q/Nk)/Nk+G2(1)*(q % Nk)/Nk);
lindhard << qx << " " << qy << " " << -chi << " " << endl;
}
Before this part, I define a huge matrix, Eigvalk and huge cubes, Eigveck, Vxk, Vyk.
Now, the call to their values is extremely slow in the for loop, it takes ages. The cubes contain eigenvectors and other quantities of a given problem. The thing is that for Nk=10 (very small Nk to test the code), it takes 0.1 seconds to compute Nk*Nk=100 times 47 eigenvectors and it takes 4.5 seconds to perform the loop shown that uses them. I have checked that the part that costs time is the call
cx_mat Gx = .....
I have also tried to define vector or huge cx_mat (by vectorising matrices) instead of cx_cube, but nothing changes.
Is there a better way solve this?