1

I have these two definitions:

uint8_t *idx[0x100];
uint8_t  raw[0x1000];

Is there any other way than to loop over every element of idx to point them all to raw[0]?

for (i=0; i<sizeof(raw); i++)
    idx[i] = &raw[0];

There must be a faster way than ↑ that. Is there an equivalent to memset for pointers?

nebuch
  • 6,475
  • 4
  • 20
  • 39
  • 3
    There is no other way. – haccks Mar 05 '15 at 16:15
  • 4
    Also, you'll overrun the buffer this way - `sizeof()` gives you the size in *bytes* of the array, not the array count. You want `for ( i = 0; i < sizeof(idx) / sizeof(idx[0]); i++)` (note also that it's `sizeof(idx)`, not `sizeof(raw);` – Paul Roub Mar 05 '15 at 16:17
  • 2
    If you allocate `raw` statically, you can make the initialization at compile time. – Vorac Mar 05 '15 at 16:17
  • Related: http://stackoverflow.com/q/4066522/694576 – alk Mar 05 '15 at 16:27

3 Answers3

4

The simple, straightforward loop is probably the best way (note that there's an error in your current loop as others pointed out).

The advantage is that those kind of loops are very easy to optimize, it's such a common case that compilers have gotten very good at it, and your compiler will use vector instructions and other optimizations as needed to keep it very fast without needing to hand-optimize yourself. And of course at the same time it is more readable, more maintainable, than optimizing it by hand.

Of course if there's a special case, for example if you want to fill it with null pointers, or if you know what the content will be at compile time, then there are some slightly more efficient ways to do that, but in the general case making it easy for your compiler to optimize your code is the simplest way to get good performance.

tux3
  • 7,171
  • 6
  • 39
  • 51
1

We only see a fragment of code, if you are initializing a global array of pointers to point to a global array of uint8_t, there is a faster way: write an explicit initializer. The initalization is done at compile time and takes virtually no time at execution time.

If the array is automatic, I'm afraid there is no faster way to do this. If your compiler is clever and instructed to use optimizations (-O2, -O3, etc.) it will probably unroll the loop and generate pretty efficient code. Look at the assembly output to verify this. If it does not, you can unroll the loop yourself:

Assuming the array size is a multiple of 4:

for (i = 0; i < sizeof(idx) / sizeof(*idx); i += 4)
    idx[i] = idx[i+1] = idx[i+2] = idx[i+3] = &raw[0];

Note that you should be careful with the sizeof operator: in addition to using the wrong array for the size computation, your code makes 2 implicit assumptions:

  • The array element is a char
  • idx is an array, not a pointer to an array.

It is advisable to use sizeof(idx) / sizeof(*idx) to compute the number of elements of the array: this expression works for all array element types, but idx still needs to be an array type. Defining a macro:

#define countof(a)  (sizeof(a) / sizeof(*(a)))

Makes it more convenient, but hides the problem if a is a pointer.

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • 1
    There might be even more efficient ways to do this, taking into account the behaviour of the CU cache: interleaving accesses to elements of `idx` in a very specific way may yield faster code. This is hypothetical, CPU specific and very cumbersome to write, leave such low level optimization to the compiler. – chqrlie Mar 05 '15 at 16:31
1

from performance engineering's perspective, there is indeed a way to make it faster than

for (i=0; i<sizeof(raw); i++)
    idx[i] = &raw[0];

if you make a comparison after turning off the optimizer in compiler. but the difference could be very minor.

let's do it:

uint8_t *idx[0x100];
uint8_t  raw[0x1000];

#define lengthof(arr)    (sizeof(arr) / sizeof(*arr))

uint8_t *start = idx;
int length = lengthof(idx);
uint8_t *end = idx + (length & ~1);
for (; start < end;)
{
    *start++ = raw;
    *start++ = raw;
}
if (length & 1)
    *start++ = raw;

this is faster majorly because of two reasons:

  1. direct operate on pointers. if you do idx[i], in assembly, (idx + i * sizeof *idx) will be perform each time, while *start has already had the answer in hand.
  2. duplicate operation in each iteration. in this way, the code will have less branching while maintaining the locality. gcc -O2 mostly likely will do the trick for you.
Jason Hu
  • 6,239
  • 1
  • 20
  • 41
  • If you really wanted to hand-optimize that thing to death, there's plently more work to do just with instrisics. First go with the vector instructions (128 bit MMX is a good start), use instructions to prefetch the next iteration's memory explicitly, decrement instead of incrementing (because `dec` sets the zero flag on various architectures), enforce alignment, etc etc – tux3 Mar 05 '15 at 16:40