0

I am trying to optimise some code which runs unreasonably slowly for what is required. The top answer here describes the method I am trying (although I am not 100% sure I am implementing it correctly).

Only a few lines show up repeatedly on the top of the call stack as I pause the program randomly, however I do not know how I could increase the codes performance given these lines.

The essential function of the code is updating a lattice of points repeatedly using the values of the points surrounding a given point. The relevant code for the first line that comes up:

The class definition:

template<typename T> class lattice{
    private:
        const unsigned int N; //size
        std::vector<std::vector<T>> lattice_points =
            std::vector<std::vector<T>>(N,std::vector<T>(N)); //array of points
    protected:
        static double mod(double, double) ;
    public:
        lattice(unsigned int);
        lattice(const lattice&);
        lattice& operator=(const lattice&);
        ~lattice() {};

        T point(int, int) const;
        void set(int, int, T);
        unsigned int size() const;
};

These lines show up quite often:

template <typename T>    
T lattice<T>::point(int x, int y) const {    
    return (*this).lattice_points[x % N][y % N]; //mod for periodic boundaries    
};    

template <typename T>    
void lattice<T>::set(int x, int y, T val) {    
    this->lattice_points[x % N][y % N] = val; //mod for periodic boundaries    
};  

They are used here:

angle_lattice update_lattice(const angle_lattice& lat, const parameters& par, double dt) {                    
    std::random_device rd;                                                                                    
    std::mt19937 gen(rd());                                                                                   
    std::uniform_real_distribution<> dis(-0.5,0.5);                                                           

    double sqrtdt = sqrt(dt);                                                                                 
    angle_lattice new_lat(lat.size());                                                                        
    int N = lat.size();                                                                                       
    for(int i=0; i < N; i++) {                                                                                
        for(int j=0; j < N; j++) {                                                                            
            double val = lat.point(i,j)+                                                                      
                      dt*(-par.Dx*( sin_points(lat, i, j, i+1, j) + sin_points(lat, i, j, i-1, j) )           
                         -par.Dy*( sin_points(lat, i, j, i, j+1) + sin_points(lat, i, j, i, j-1) )            
                         -par.Lx/2*( cos_points(lat, i, j, i+1, j) + cos_points(lat, i, j, i-1, j) -2)        
                         -par.Ly/2*( cos_points(lat, i, j, i, j+1) + cos_points(lat, i, j, i, j-1) -2))          
                         +sqrtdt*2*M_PI*par.Cl*dis(gen);                                                      
            new_lat.set(i,j,val);                                                                             
        }                                                                                                     
    }                                                                                                         
    return new_lat;                                                                                           
};                                                                                                            

double sin_points(const angle_lattice& lat, int i1, int j1, int i2, int j2) {                                 
    return sin(lat.point(i1, j1) - lat.point(i2, j2));                                                        
};                                                                                                            

double cos_points(const angle_lattice& lat, int i1, int j1, int i2, int j2) {                                 
    return cos(lat.point(i1, j1) - lat.point(i2, j2));                                                        
};

here angle_lattice is just a lattice where the template parameter is a angle. The set function is overloaded so that the angle is mod 2pi. The only other two functions that appear in the call stack are cos_points and sin_points , as well as generating the random number, but I assume the latter cannot be helped.

Is there anything that can be done? Help would be appreciated.

Edit: I changed the code following some of the suggestions and now the cosine and sine calculation are the highest. I am not sure what

user110503
  • 115
  • 4
  • 1
    OT: that `sin_points(lat, i, j, i-1, j)` might acess the vectos out of bounds for i==0. –  Dec 12 '17 at 11:25
  • 1
    Looks complicated somehow... Just some tips: `sqrt()`, `sin()`, and `cos()` are functions computed based on polynoms. They are not known for speed. Hence, a lot of sites can be found which describes replacements (mostly with better speed paying with less accuracy). A general trick: instead of `sqrt()`-ing something for comparison, multiply the compared with itself. Multiplication is much cheaper than `sqrt()` and the result is usually the same (if nothing overflows). – Scheff's Cat Dec 12 '17 at 11:26
  • 4
    You did enables the optimizer? Instead of useing vector of vector use one with size N*N. –  Dec 12 '17 at 11:26
  • @manni66 Hmm yes. I used the modulus differently before. i+1 is fine but i-1 is not. I will change it back. – user110503 Dec 12 '17 at 11:32
  • 1
    Since C++11 it is guaranteed that division rounds toward zero, before it was implementation defined. If division rounds towards zero, -1 % N == -1 for N > 1. http://coliru.stacked-crooked.com/a/16f1f1a034f792bb – PaulR Dec 12 '17 at 11:32
  • 3
    I don't see update latice called anywhere. Still, use a faster rng like PCG. Also, do not store points as vectors of vectors. Use one flat vector. Make N constant and a power of two. – starmole Dec 12 '17 at 11:32
  • @starmole Sorry that is a different part of the code. I did not know that vector of vectors would be that much of a problem. I will change it. Would using arrays make any difference? – user110503 Dec 12 '17 at 11:38
  • @manni66 Sorry I did not. What do you mean? – user110503 Dec 12 '17 at 11:39
  • 1
    arrays vs vector will make no difference, if the vectors are allocated once, and set up, since there they work effectively same as array. If you keep inserting/deleting elements, or introducing copying vectors (without exploiting move strategy), then array may be better, because they will rather force you to fix those operations, than to have all the hassle. But overall I would start on algorithm level first, it's not clear what your code is doing (without reading through it properly). Some summary, what algorithm you are using, and what should be result from what input? – Ped7g Dec 12 '17 at 11:43
  • 1
    And it looks like the lattice is NxN, can you use for N only powers of two? (like 32, 256, 1024, etc...) that would make the modulus of x/y much quicker. If not, you may reconsider doing even branch, if you know your coordinate may fell off the range only by +-N at most (i.e. `xx = x; if (xx < 0) xx += N; else if (N <= xx) xx -= N;` may be better than modulus for arbitrary `N` ... but it will be much slower than modulus with power of two) – Ped7g Dec 12 '17 at 11:47
  • @Ped7g I can stick to powers of 2 yes. However I would still have to add N to the result if it is negative as x % N will be negative for negative x. – user110503 Dec 12 '17 at 11:52
  • well, technically `mod 2^i` for indices can be done by `& (2^i-1)` ... i.e. `x mod 32 == x&31` ... negative values work that way too, no branching needed. – Ped7g Dec 12 '17 at 12:19
  • 1
    If `N` is truly `const` you could use `std::array` which should be faster than `std::vector`. Otherwise a single *vector* rather than a *vector of vectors* should give some performance benefits. – Galik Dec 12 '17 at 12:52

0 Answers0