You can handle multiple bytes at a time and unroll the loop:
int dataisnull(const void *data, size_t length) {
    /* assuming data was returned by malloc, thus is properly aligned */
    size_t i = 0, n = length / sizeof(size_t);
    const size_t *pw = data;
    const unsigned char *pb = data;
    size_t val;

#define UNROLL_FACTOR  8
#if UNROLL_FACTOR == 8
    /* unrolled loop: OR 8 words together, test once per iteration */
    size_t n1 = n - n % UNROLL_FACTOR;
    for (; i < n1; i += UNROLL_FACTOR) {
        val = pw[i + 0] | pw[i + 1] | pw[i + 2] | pw[i + 3] |
              pw[i + 4] | pw[i + 5] | pw[i + 6] | pw[i + 7];
        if (val)
            return 0;
    }
#endif
    /* accumulate the remaining full words */
    val = 0;
    for (; i < n; i++) {
        val |= pw[i];
    }
    /* accumulate the trailing bytes that do not fill a whole word */
    for (i = n * sizeof(size_t); i < length; i++) {
        val |= pb[i];
    }
    return val == 0;
}
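For instance, a quick smoke test might look like this (a minimal sketch; the buffer size and contents are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    size_t len = 1000;
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return 1;
    memset(buf, 0, len);
    printf("%d\n", dataisnull(buf, len));   /* prints 1: all bytes are zero */
    buf[len - 1] = 1;
    printf("%d\n", dataisnull(buf, len));   /* prints 0: a non-zero byte was found */
    free(buf);
    return 0;
}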
Depending on your specific problem, it might be more efficient to detect non-zero values early or late:
- If the all-zero case is the most common, you should accumulate all the bits into the val accumulator and test only at the end.
- If the all-zero case is rare, you should check for non-zero values more often, as in the sketch after this list.
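Here is a minimal sketch of that second approach, testing every word as it is read (same alignment assumptions as above; the name dataisnull_early is just illustrative):

int dataisnull_early(const void *data, size_t length) {
    const size_t *pw = data;
    const unsigned char *pb = data;
    size_t n = length / sizeof(size_t);
    size_t i;

    /* bail out on the first non-zero word */
    for (i = 0; i < n; i++) {
        if (pw[i])
            return 0;
    }
    /* check the trailing bytes individually */
    for (i = n * sizeof(size_t); i < length; i++) {
        if (pb[i])
            return 0;
    }
    return 1;
}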
The unrolled version above is a compromise that tests for non-zero values every 32 or 64 bytes, depending on the size of size_t.
Depending on your compiler and processor, you might get better performance by unrolling more or less. You could also use the intrinsic functions available for your particular architecture to take advantage of vector types, but that would be less portable.
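For example, on x86 targets with SSE2 you could OR 16 bytes at a time (a non-portable sketch; the function name is illustrative and the trailing bytes are handled with a scalar loop for simplicity):

#include <emmintrin.h>  /* SSE2 intrinsics, x86 only */

int dataisnull_sse2(const void *data, size_t length) {
    const unsigned char *p = data;
    __m128i acc = _mm_setzero_si128();
    size_t i, n = length - length % 16;
    unsigned char tail = 0;
    int vec_zero;

    /* accumulate 16 bytes per iteration; unaligned loads avoid
       any alignment assumption */
    for (i = 0; i < n; i += 16)
        acc = _mm_or_si128(acc, _mm_loadu_si128((const __m128i *)(p + i)));

    /* acc is all zero iff every one of its bytes compares equal to 0 */
    vec_zero = _mm_movemask_epi8(_mm_cmpeq_epi8(acc, _mm_setzero_si128())) == 0xFFFF;

    /* scalar loop for the trailing bytes */
    for (; i < length; i++)
        tail |= p[i];
    return vec_zero && tail == 0;
}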
Note that the code does not verify proper alignment for the data pointer:
- it cannot be done portably;
- it assumes the data was allocated via malloc or similar, hence properly aligned for any type (a portable alternative is sketched after this list).
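If that assumption is a concern, you can read each word with memcpy, which has no alignment requirement; modern compilers typically optimize the fixed-size copy into a single load (a portable sketch; the function name is illustrative):

#include <string.h>

int dataisnull_portable(const void *data, size_t length) {
    const unsigned char *p = data;
    size_t i, w, val = 0;
    size_t n = length - length % sizeof(size_t);

    /* memcpy into a local word: no alignment or aliasing assumptions */
    for (i = 0; i < n; i += sizeof(size_t)) {
        memcpy(&w, p + i, sizeof w);
        val |= w;
    }
    /* accumulate the trailing bytes */
    for (; i < length; i++)
        val |= p[i];
    return val == 0;
}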
As always, benchmark different solutions to see if this makes a real difference. This function might not be a bottleneck at all, and writing a complex function to optimize a rare case is counterproductive: it makes the code less readable, more likely to contain bugs, and much less maintainable. For example, the assumption on data alignment may not hold if you change the memory allocation scheme or use static arrays, in which case the function may invoke undefined behavior.