5

For example, I want to create a square root table using array SQRT[i] to optimize a game, but I don't know if there is a performance difference between the following initialization when accessing the value of SQRT[i]:

  1. Hardcode array

    int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255}
    
  2. Generate value at run time

    int SQRT[65536];
    int main(){
        for(int i=0;i<65536;i++){
            SQRT[i]=sqrt(i);
        }
        //other code
        return 0;
    }
    

Some example of accessing them:

    if(SQRT[a*a+b*b]>something)
    ...

I'm not clear if the program stores or accesses a hard-code array in a different way, also I don't know if the compiler would optimize the hard-code array to speed up the access time, is there performance differences between them when accessing the array?

ggrr
  • 7,737
  • 5
  • 31
  • 53
  • 3
    Access time will be same, but initialization/filling of array will be drastically different. – Nawaz Aug 13 '15 at 03:42
  • 2
    This is really going to be very compiler and implementation dependant. If you are interested in optimisations performed by a specific compiler of architecture please add this – Vality Aug 13 '15 at 03:42
  • 3
    Aside: consider making it an array of `uint8_t` instead, so that the total size of the array is smaller. Also consider avoiding square roots; usually `a*a + b*b > something*something` is better (although in this case, that's not an identical test to what you've written). –  Aug 13 '15 at 03:47
  • Yes, I'm completely agree with @Nawaz. The access time would be same but since the second code has used **sqrt()** which will run 65536 time, may cost much time then first code. – surajs1n Aug 13 '15 at 06:16
  • You are on your own, but without a code generator the risk of having one digit error on 65536 values is **high** - and IMHO not allowable. And only think of maintenance problem if you later need to change the rounding algorithm... It can be considered for 256 values or less but I would not imagine typing 65536 value! – Serge Ballesta Aug 13 '15 at 06:59
  • @SergeBallesta: obviously you'd generate the line of code with something like a shell or perl loop. The question is between compile-time hard-coded, or generating at run-time. (Preferably with a much more efficient method than `sqrt()` 2^16 times, see my answer for an example.) – Peter Cordes Aug 14 '15 at 01:25
  • @Vality: this is actually not platform dependent. Exceptions: you're talking about a platform where read-only data goes in ROM that's slower than RAM. Or you're talking about getting the compiler to know that the array doesn't alias with any other pointers, so it can be smarter about repeated use of the same element, and re-ordering reads of the array with writes through other pointers. – Peter Cordes Aug 14 '15 at 01:38

3 Answers3

4

First, you should do the hard-coded array right:

static const int SQRT[]={0,1,1,1,2,2,2,2,2,3,3,.......255,255,255};

(also using uint8_t instead of int is probably better, so as to reduce the size of the array and making it more cache-friendly)

Doing this has one big advantage over the alternative: the compiler can easily check that the contents of the array cannot be changed.

Without this, the compiler has to be paranoid -- every function call might change the contents of SQRT, and every pointer could potentially be pointing into SQRT and thus any write through an int* or char* might be modifying the array. If the compiler cannot prove this doesn't happen, then that puts limits on the kinds of optimizations it can do, which in some cases could show a performance impact.

Another potential advantage is the ability to resolve things involving constants at compile-time.

If needed, you may be able to help the compiler figure things out with clever use of __restrict__.

In modern C++ you can get the best of both worlds; it should be possible (and in a reasonable way) to write code that would run at compile time to initialize SQRT as a constexpr. That would best be asked a new question, though.

Nawaz
  • 353,942
  • 115
  • 666
  • 851
  • Why uint8_t instead of char? char is 1 byte. – Bobby Sacamano Aug 13 '15 at 16:08
  • 2
    @Bobby: ... and `char` could very well be an 8-bit *signed* type, meaning it has a maximum of `127`. That aside, it's a good habit to use the types that explicitly indicate your bit width requirements when you have them. –  Aug 13 '15 at 16:53
  • @BobbySacamano Also, since the elements of the array are numbers, not characters, it would be semantically incorrect. This is the same reason why C++17 introduced `std::byte`: it's time to stop using `char` for numbers and handling data bytes. – plasmacel Jun 06 '17 at 03:03
1

As people said in comments:

if(SQRT[a*a+b*b]>something)

is a horrible example use-case. If that's all you need SQRT for, just square something.

As long as you can tell the compiler that SQRT doesn't alias anything, then a run-time loop will make your executable smaller, and only add a tiny amount of CPU overhead during startup. Definitely use uint8_t, not int. Loading a 32bit temporary from an 8bit memory location is no slower than from a zero-padded 32b memory location. (The extra movsx instruction on x86, instead of using a memory operand, will more than pay for itself in reduced cache pollution. RISC machines typically don't allow memory operands anyway, so you always need an instruction to load the value into a register.)

Also, sqrt is 10-21 cycle latency on Sandybridge. If you don't need it often, the int->double, sqrt, double->int chain is not much worse than an L2 cache hit. And better than going to L3 or main memory. If you need a lot of sqrt, then sure, make a LUT. The throughput will be much better, even if you're bouncing around in the table and causing L1 misses.

You could optimize the initialization by squaring instead of sqrting, with something like

uint8_t sqrt_lookup[65536];
void init_sqrt (void)
{
    int idx = 0;
    for (int i=0 ; i < 256 ; i++) {
        // TODO: check that there isn't an off-by-one here
        int iplus1_sqr = (i+1)*(i+1);
        memset(sqrt_lookup+idx, i, iplus1_sqr-idx);
        idx = iplus1_sqr;
    }
}

You can still get the benefits of having sqrt_lookup being const (compiler knows it can't alias). Either use restrict, or lie to the compiler, so users of the table see a const array, but you actually write to it.

This might involve lying to the compiler, by having it declared extern const in most places, but not in the file that initializes it. You'd have to make sure this actually works, and doesn't create code referring to two different symbols. If you just cast away the const in the function that initializes it, you might get a segfault if the compiler placed it in rodata (or read-only bss memory if it's uninitialized, if that's possible on some platforms?)

Maybe we can avoid lying to the compiler, with:

uint8_t restrict private_sqrt_table[65536];  // not sure about this use of restrict, maybe that will do it?
const uint8_t *const sqrt_lookup = private_sqrt_table;

Actually, that's just a const pointer to const data, not a guarantee that what it's pointing to can't be changed by another reference.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
0

The access time will be the same. When you hardcode an array, the C library routines called before the main will initialize it (in an embedded system the start up code copies the Read write data, i.e. hardcoded from ROM to RAM address where the array is located, if the array is constant, then it is accessed directly from ROM).

If the for loop is used to initialize, then there is an overhead of calling the Sqrt function.

Wandering Fool
  • 2,170
  • 3
  • 18
  • 48
Prashanth
  • 11
  • 1
  • 1
    It is not true that a cpu can always access ROM directly for data access. Modern systems with nand flash will typically not use direct access! And also remember that there are systems like AVR which uses harvard architecture. In this cases content must be copied to ram before using it. Maybe complete at startup (common) or during access like in AVR architecure with gcc. Simply this feature is directly related to hardware, os and used compilers. – Klaus Aug 13 '15 at 10:01
  • static/global arrays aren't initialized by code at all. The literal data is in the executable, mapped into memory. – Peter Cordes Aug 14 '15 at 00:37