
I always use double for calculations, but double offers far more accuracy than I need (or than makes sense, given that most of the calculations I do are approximations to begin with).

But since the processor is already 64-bit, I do not expect that using a type with fewer bits will be of any benefit.

Am I right or wrong? How would I optimize for speed? (I understand that smaller types are more memory efficient.)

Here is the test:

#include <cmath>
#include <ctime>
#include <cstdio>

template<typename T>
void creatematrix(int m, int n, T **&M){
    // allocate one contiguous m*n block, plus an array of row pointers into it
    M = new T*[m];
    T *M_data = new T[m*n];

    for(int i = 0; i < m; ++i)
    {
        M[i] = M_data + i * n;
    }
}

int main(){
    clock_t start, end;
    double diffs;
    const int N = 4096;
    const int rep = 8;

    float **m1, **m2;
    creatematrix(N,N,m1); creatematrix(N,N,m2);

    // fill with non-negative values so sqrt() gets valid input
    for(int i = 0; i < N; i++)
        for(int j = 0; j < N; j++)
            m1[i][j] = m2[i][j] = 1.0f;

    start=clock();
    for(int k = 0;k<rep;k++){
        for(int i = 0;i<N;i++){
            for(int j =0;j<N;j++)
                m1[i][j]=sqrt(m1[i][j]*m2[i][j]+0.1586);
        }
    }
    end = clock();
    diffs = (end - start)/(double)CLOCKS_PER_SEC;
    printf("time = %lf\n",diffs);


    delete[] m1[0];
    delete[] m1;

    delete[] m2[0];
    delete[] m2;

    getchar();
    return 0;
}

There was no time difference between double and float; however, when the square root is not used, float is twice as fast.

Paul R
Gappy Hilmore
  • Operations like `/` and `sqrt` are typically much slower for `double` than for `float` - check the throughput/latency for the operations that you are interested in on your target CPU. – Paul R Sep 01 '15 at 17:50
  • It matters when you use a lot of them: cache utilization is better, and vectorized code can perform better as well. Tinker with the math model only when you know what you're doing; it is very rarely an arbitrary choice. – Hans Passant Sep 01 '15 at 18:00
  • @PaulR I updated my post and did a test. Double and float had almost no time difference. – Gappy Hilmore Sep 04 '15 at 16:19
  • I just found out that the float version of sqrt is sqrtf; using it caused a 20% decrease in time. – Gappy Hilmore Sep 04 '15 at 16:24
  • If you put a division operation in the loop you may see a bigger difference, e.g. change `+0.1586` to `/0.1586` (or `/0.1586f` for the `float` version). – Paul R Sep 04 '15 at 16:26

1 Answer


There are a few ways float operations can be faster than double:

  • Faster I/O: you have only half the bits to move between disk/memory/cache/registers
  • Faster division and square root: these are typically the only arithmetic operations that are slower in double. As an example, on a Haswell a DIVSS (float division) takes 7 clock cycles, whereas a DIVSD (double division) takes 8-14 (source: Agner Fog's instruction tables).
  • If you can take advantage of SIMD instructions, then you can process twice as many elements per instruction (i.e. in a 128-bit SSE register, you can operate on 4 floats, but only 2 doubles).
  • Special functions (log, sin) can use lower-degree polynomials: e.g. the openlibm implementation of log uses a degree 7 polynomial, whereas logf only needs degree 4.
  • If you need higher intermediate precision, you can simply promote float to double, whereas for a double you need either software double-double arithmetic or the slower long double.

Note that these points hold on 32-bit architectures as well: unlike with integers, there's nothing particularly special about having the size of the format match your architecture's word size, i.e. on most machines doubles are just as "native" as floats.

Simon Byrne