Optimize small 3d vector struct for performance

Question

I am fairly new to C++ and have the following, let's call it, issue. In my computer simulation, I am working quite a lot with vectors. I built myself a small struct that represents a vector, and would like to learn how to make the operations (like for example the normalize() function more efficient. Also, is there any benefit in using C++'s valarray? It seems to have some useful methods pre-implemented.

I almost exclusively use the normalize() function and addition/substraction/multiplication of vectors. Since all my vectors have only three elements, I am hesitant of including 3rd party libraries in my project.

This is my struct:

struct vector_t {
    int _i, _j, _k;
    vector_t(int i, int j, int k) {
        _i = i;
        _j = j;
        _k = k;
    }
    vector_t() {}
    inline int getI() { 
        return _i; 
    }
    inline int getJ() { 
        return _j; 
    }
    inline int getK() { 
        return _k; 
    }
    inline void setI(int val) { 
        _i = val; 
    }
    inline void setJ(int val) { 
        _j = val; 
    }
    inline void setK(int val) { 
        _k = val; 
    }
    void normalize() {
        float length = sqrt(_i*_i + _k*_k + _j*_j);
        _i /= length;
        _j /= length;
        _k /= length;
    }
};

And my questions:

How can I speed up the normalize() function or is this already the most efficient way?
What would be a more C++-onic way to implement such a struct/class while keeping memory and computer time usages low?
Should I prefer a valarray over my own type?

Why do you think third party libraries are unsuitable for objects with three members? ;) And do you know that your current implementation is *not fast enough*? — jalf, Oct 11 '13 at 09:35
normalize is going to be "pretty bad" if your members are all ints. — Michael Anderson, Oct 11 '13 at 09:37
Use a fast reciprocal square root function to calculate `1 / length` and then multiply each element by this factor. As well as being a faster function than `sqrt` this also trades 3 expensive division operations for 3 relatively cheap multiplies. — Paul R, Oct 11 '13 at 09:39
Thank you for the comments. For portability and such a small usage scope, I prefer to implement the stuff on my own. And utilizing blas/lapac stuff seems to be somewhat heavy weight for my application. @MichaelAnderson: Could you elaborate that (with a link perhaps)? — janoliver, Oct 11 '13 at 09:39
@janoliver simple, let's say your vector is 1,1,1. Normalizing will make it 0.33,0.33,0.33 (since normalize means length = 1). As you default to int your vector is then converted to a zero vector. — mrvux, Oct 11 '13 at 09:41
catflier: Oh, of course. I actually use floats instead of ints, but changed it for the post here without thinking. Sorry for that. @PaulR: Thank you for the tipp! — janoliver, Oct 11 '13 at 09:43
@janoliver: no problem - I've expanded the above comment into an answer now, including a link to a suitable `rsqrt` implementation. — Paul R, Oct 11 '13 at 09:47
Outside of pure programming, turning compiler options can also be important (O2,Ot), align your struct to 16 bytes will also help if you want to use SSE. for sse you can either normalize per vector or write a routine that normalizes 4 vectors at a time. — mrvux, Oct 11 '13 at 10:03
If you want real speed, I'd recommend Eigen: http://eigen.tuxfamily.org/index.php?title=Main_Page There should be any extra dependencies on that. — dornhege, Oct 11 '13 at 10:12
If you are worried about the absolute optimal way to do numeric computations, you should be reading the CPU manufacturer's manual and write carefully crafted, intrinsics-based (or just pure assembly) code based on that. Or use [a library where the authors did just that](http://eigen.tuxfamily.org/index.php?title=Main_Page). — DanielKO, Oct 11 '13 at 10:22

score 5 · Accepted Answer · answered Oct 11 '13 at 09:46

5

Use a fast reciprocal square root function to calculate 1 / length and then multiply each element by this factor. As well as being a faster function than sqrt this also trades 3 expensive division operations for 3 relatively cheap multiplies:

struct vector_t {
    float _i, _j, _k;

    // ...

    void normalize() {
        float r_length = Q_rsqrt(_i*_i + _k*_k + _j*_j);
        _i *= r_length;
        _j *= r_length;
        _k *= r_length;
    }
};

Note: you might want to think about how you're going to handle the pathological case where i == j == k == 0.

answered Oct 11 '13 at 09:46

Paul R

208,748
37
389
560

Thank you, I changed my code and implemented this function: http://stackoverflow.com/questions/17789928/fast-inverse-square-root-algorithm-in-modern-c – janoliver Oct 11 '13 at 09:54
Yes, that's essentially the same routine that I linked to above. Note that if you can assume x86-only then there are even faster alternatives. – Paul R Oct 11 '13 at 10:02
It bugs me that people are still pushing this as a fast solution, when SSE completely crushes it. Yes it does better than x87 FPU but for performance you want SSE. http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/ – r_ahlskog Oct 11 '13 at 13:23
@r_ahlskog: this big difference is that the fast `rsqrt` is portable whereas SSE is x86-only. – Paul R Oct 11 '13 at 13:27
@PaulR, true, it requires SSE, on ARM you find NEON which offers VRSQRTE and VRSQRTS, PowerPC has AltiVec, which I am not familiar with but I imagine it has similar functionality. – r_ahlskog Oct 11 '13 at 14:01

score 2 · Answer 2 · answered Oct 11 '13 at 09:39

2

That won't make a big difference but your constructor should look like this :

vector_t(int i, int j, int k)
: _i(i), _j(j), _k(k)
{
}

For optimization you should look into SSE : http://www.cortstratton.org/articles/OptimizingForSSE.php

answered Oct 11 '13 at 09:39

Co_42

1,169
1
10
19

good point, thank you. Yet, I think the compiler is smart enough to eliminate any speed differences. – janoliver Oct 11 '13 at 09:41

Optimize small 3d vector struct for performance

2 Answers2