Fast tricks to determine array elements are non-negative in C?

Question

I am writing a function

int are_non_negatives(int* start, int n) {
  ...
}

This function returns 1 if all the next n integers in the array start are non-negative. It returns 0 otherwise.

My question is if there exist tricks that do this as fast as possible (other than looping and checking every single position)?

Dirty/non-portable tricks are fine. I want to know about those as well. Thanks!

Loop and check. There's no magic bullet here. If these are sorted then you can always just test the lowest value is > 0, or >= 0 depending on your definition of "non-negative". — tadman, Feb 10 '21 at 22:56
Does "consecutive" mean the numbers are in order with no gaps, or do you really mean "all given integers"? — tadman, Feb 10 '21 at 22:58
What `start` is pointing to? An array? Some special structure to this array? — Eugene Sh., Feb 10 '21 at 22:58
How about recursion with "hidden" checks? Not that it is better for performance or something: `int are_non_negatives(int* start, int n) { return (n == 0) || ((start[n-1] > 0) && are_non_negatives(start, n - 1)); }` — Eugene Sh., Feb 10 '21 at 23:09
Can you give some usage examples and explain the parameters better? — tadman, Feb 10 '21 at 23:10
Anyway, what is the nature of the "tricks" being asked for? To improve what criteria? — Eugene Sh., Feb 10 '21 at 23:14
On its face it sounds like you can just `return start[0] >= 0;`, since if the first number is nonnegative then all the rest must be too, but since that seems too easy I assume I am misunderstanding the setup. — Nate Eldredge, Feb 10 '21 at 23:16
@EugeneSh. Ideally maybe there are some ways to do it concurrently or use as few instructions as possible. — apen, Feb 10 '21 at 23:17
No, C does not have native concurrency, but you definitely can use something like MPI to parallelize it. — Eugene Sh., Feb 10 '21 at 23:18
I wonder if it is an XY-Problem. How this function is going to be used? — Eugene Sh., Feb 10 '21 at 23:25
You can OR the `int` elements and test at the end whether the sign bit is on. (This will count −0 as negative in sign-and-magnitude implementations.) Sequences of ORs may be faster than comparisons in some processors. You can also parallelize by unrolling the loop. Then you can look to extensions such as SIMD. — Eric Postpischil, Feb 10 '21 at 23:26
@EricPostpischil ORing them will eliminate the possibility for the loop to terminate early - making it less performant in an average case. — Eugene Sh., Feb 10 '21 at 23:28
@apen I don't see what is gain even if you parallelize. If you use MPI you need to send first parts of array to other processes. — Lazar Đorđević, Feb 10 '21 at 23:31
The function is used in a loop that goes through a large array of integers. In the array, `n` consecutive non-negative elements represent a special pattern, and the job is to detect those patterns. `n` depends on the position in the array and extra data; that is to say, generally you cannot predict `n`. — apen, Feb 10 '21 at 23:31
@EugeneSh.: You cannot know what the average is. No probability distribution is specified. — Eric Postpischil, Feb 10 '21 at 23:31
@EricPostpischil Indeed. I pointed out the possible downside of this approach the OP should be aware of — Eugene Sh., Feb 10 '21 at 23:33
@apen How this array is constructed? Can't you add some meta-data at a time of the construction? — Eugene Sh., Feb 10 '21 at 23:35
@EugeneSh. I have thought about metadata, but I am afraid maintaining the metadata comes at a great cost since the array mutates very often in the larger context. — apen, Feb 10 '21 at 23:40
What about just a counter of the number of negative elements in the array? And whenever you add or remove an element, you check if the element was negative and increment/decrement the counter accordingly. You really think that would be too expensive? — Nate Eldredge, Feb 10 '21 at 23:52
@NateEldredge The thing is `start` is not necessarily the beginning of the array, and `n` is generally far smaller than the length of the array. So I don't think the solution you propose applies. — apen, Feb 10 '21 at 23:55
Do you want it optimize for the worst case or for the average case? In a case of average performance, please specify the distribution non-negative elements. This distribution is crucial for selection of the optimal algorithm — tstanisl, Feb 11 '21 at 15:24
@tstanisl You can assume `n` is pretty small (1 <= n <= 10). There is no other solid assumption that can be made for `n`. So I am not sure speaking of distribution makes a lot of sense here. — apen, Feb 11 '21 at 19:08
"Fast tricks" in the context of programming to me means "I want to write hard-to-understand-and-maintain 'clever' code because I think I can outsmart my optimizing compiler that is built on literally decades of effort from multiple experts." If you haven't profiled your application and found that ***this code*** is an actual performance bottleneck, you're most likely going to slow down something that doesn't matter anyway - unless your "clever" code makes things super slow. Don't spend time doing this unless you have actual evidence from actual profiling and testing that it's necessary. — Andrew Henle, Feb 11 '21 at 21:25

Pascal Getreuer · Answer 1 · 2021-02-10T23:52:37.990

7

One slightly "dirty/non-portable" trick you could leverage for the worst case where all elements need to be checked: In 2's complement int representation, the highest bit is set if and only if the value is negative. So, you could bitwise OR them all and check the highest bit. This could be done using vector instructions to do batches of them at a time, e.g. 8 elements at a time supposing 32-bit int and 256-bit AVX instructions.

edited Feb 10 '21 at 23:52

answered Feb 10 '21 at 23:30

Pascal Getreuer

2,906
1
5
14

Great answer. A (simplified) code snippet would be nice. – apen Feb 11 '21 at 18:46

score 4 · Accepted Answer · answered Feb 11 '21 at 10:07

Trying to improve performance is a recurring subject for C programmers. Benchmarking alternatives is necessary to test if optimization is useful and worth the effort. This is a naive implementation for reference:

int are_non_negatives_naive(const int *start, int n) {
    while (n --> 0) {
        if (*start++ >= 0)
            return 0;
    }
    return 1;
}

On current architectures (two's complement) you can combine blocks of entries and test the sign bit much less often. If the arrays are small, you can combine all elements and use a single test:

int are_non_negatives_full(const int *start, int n) {
    int combined = 0;
    while (n --> 0) {
        combined |= *start++;
    }
    return combined >= 0;
}

If the arrays sizes are varied, you can combine all elements in chunks and test every chunk, allowing for early bail out if a negative value is present:

int are_non_negatives_chunks(const int *start, int n) {
    int combined;
    for (; n >= 8; n -= 8, start += 8) {
        combined = (start[0] | start[1] | start[2] | start[3] |
                    start[4] | start[5] | start[6] | start[7]);
        if (combined < 0)
            return 0;
    }
    combined = 0;
    while (n --> 0) {
        combined |= *start++;
    }
    return combined >= 0;
}

Further performance improvements can be achieved by vectorizing the above, either using portable constructions such using 64-bit types or more specific hardware specific intrinsics. The effort might not be necessary is the compiler can vectorize the above algorithms on its own: clang uses SIMD instructions as can be seen with Godbolt Compiler Explorer.

Jack Lilhammers · Answer 3 · 2021-02-13T12:30:51.213

You could check k elements at a time by casting to the largest integer available and crafting the right mask, eg: if you have an array of char, or int8_t, you could check 8 elements at a time.
The mask is crafted to be all zeroes, except for the sign bits.
This way if one of those chars is negative the bitwise AND of the mask and the cast iterator will be > 0

int8_t _i8[8] = { INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN };
uint64_t mask = *(uint64_t *)_i8;

uint64_t *it = (uint64_t *)a;
size_t i = 0;

if (it[i] & mask)
    return FALSE;

Of course you should take care of the remainders of n % k, which can be done with a simple switch like this:

size_t i = 0;

switch (n % 8) {
    case 0: if (a[i++] < 0) return FALSE;
    case 7: if (a[i++] < 0) return FALSE;
    case 6: if (a[i++] < 0) return FALSE;
    case 5: if (a[i++] < 0) return FALSE;
    case 4: if (a[i++] < 0) return FALSE;
    case 3: if (a[i++] < 0) return FALSE;
    case 2: if (a[i++] < 0) return FALSE;
    case 1: if (a[i++] < 0) return FALSE;
}

Here I've made a proof of concept that compares the naive approach, the unrolled approach using the Duff's device, and an unrolled and masked approach that uses both the Duff's device and the masking.
It would require a bit more testing, but seems promising nonetheless.

You can run it here https://onlinegdb.com/SyliIUrSZO

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <limits.h>
#include <math.h>
#include <time.h>

#define CLOCKS_PER_MSEC (CLOCKS_PER_SEC/1000)

#define TRUE (0 == 0)
#define FALSE (!TRUE)

// 8 bit integers
int are_non_negatives_naive_i8(int8_t *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        if (a[i] < 0)
            return FALSE;

    return TRUE;
}
int are_non_negatives_unrolled_i8(int8_t *a, size_t n)
{
    size_t M = (n + 7) / 8;
    size_t i = 0;

    switch (n % 8) {
        case 0: do { if (a[i++] < 0) return FALSE;
        case 7:      if (a[i++] < 0) return FALSE;
        case 6:      if (a[i++] < 0) return FALSE;
        case 5:      if (a[i++] < 0) return FALSE;
        case 4:      if (a[i++] < 0) return FALSE;
        case 3:      if (a[i++] < 0) return FALSE;
        case 2:      if (a[i++] < 0) return FALSE;
        case 1:      if (a[i++] < 0) return FALSE;
                    } while (--M > 0);
    }
}
int are_non_negatives_unrolled_masked_i8(int8_t *a, size_t n)
{
    int8_t _i8[8] = { INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN, INT8_MIN };
    uint64_t mask = *(uint64_t *)_i8;

    uint64_t *it = (uint64_t *)a;

    size_t i = 0;

    switch (n % 8) {
        case 0: if (a[i++] < 0) return FALSE;
        case 7: if (a[i++] < 0) return FALSE;
        case 6: if (a[i++] < 0) return FALSE;
        case 5: if (a[i++] < 0) return FALSE;
        case 4: if (a[i++] < 0) return FALSE;
        case 3: if (a[i++] < 0) return FALSE;
        case 2: if (a[i++] < 0) return FALSE;
        case 1: if (a[i++] < 0) return FALSE;
    }

    size_t N = n / 8;
    // for (i = 0; i < N; ++i)
    //     if (it[i] & mask)
    //         return FALSE;

    size_t M = (N + 7) / 8;
    switch (N % 8) {
        case 0: do { if (it[i++] & mask) return FALSE;
        case 7:      if (it[i++] & mask) return FALSE;
        case 6:      if (it[i++] & mask) return FALSE;
        case 5:      if (it[i++] & mask) return FALSE;
        case 4:      if (it[i++] & mask) return FALSE;
        case 3:      if (it[i++] & mask) return FALSE;
        case 2:      if (it[i++] & mask) return FALSE;
        case 1:      if (it[i++] & mask) return FALSE;
                    } while (--M > 0);
    }

    return TRUE;
}
//----------------------------------------------------------------

// 16 bit integers 
int are_non_negatives_naive_i16(int16_t *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        if (a[i] < 0)
            return FALSE;

    return TRUE;
}
int are_non_negatives_unrolled_i16(int16_t *a, size_t n)
{
    size_t M = (n + 7) / 8;
    size_t i = 0;

    switch (n % 8) {
        case 0: do { if (a[i++] < 0) return FALSE;
        case 7:      if (a[i++] < 0) return FALSE;
        case 6:      if (a[i++] < 0) return FALSE;
        case 5:      if (a[i++] < 0) return FALSE;
        case 4:      if (a[i++] < 0) return FALSE;
        case 3:      if (a[i++] < 0) return FALSE;
        case 2:      if (a[i++] < 0) return FALSE;
        case 1:      if (a[i++] < 0) return FALSE;
                    } while (--M > 0);
    }
}
int are_non_negatives_unrolled_masked_i16(int16_t *a, size_t n)
{
    int16_t _i16[4] = { INT16_MIN, INT16_MIN, INT16_MIN, INT16_MIN };
    uint64_t mask = *(uint64_t *)_i16;

    uint64_t *it = (uint64_t *)a;

    size_t i = 0;

    switch (n % 8) {
        case 0: if (a[i++] < 0) return FALSE;
        case 7: if (a[i++] < 0) return FALSE;
        case 6: if (a[i++] < 0) return FALSE;
        case 5: if (a[i++] < 0) return FALSE;
        case 4: if (a[i++] < 0) return FALSE;
        case 3: if (a[i++] < 0) return FALSE;
        case 2: if (a[i++] < 0) return FALSE;
        case 1: if (a[i++] < 0) return FALSE;
    }

    size_t N = n / 4;
    // for (i = 0; i < N; ++i)
    //     if (it[i] & mask)
    //         return FALSE;

    size_t M = (N + 7) / 8;
    switch (N % 8) {
        case 0: do { if (it[i++] & mask) return FALSE;
        case 7:      if (it[i++] & mask) return FALSE;
        case 6:      if (it[i++] & mask) return FALSE;
        case 5:      if (it[i++] & mask) return FALSE;
        case 4:      if (it[i++] & mask) return FALSE;
        case 3:      if (it[i++] & mask) return FALSE;
        case 2:      if (it[i++] & mask) return FALSE;
        case 1:      if (it[i++] & mask) return FALSE;
                    } while (--M > 0);
    }

    return TRUE;
}
//----------------------------------------------------------------

// 32 bit integers 
int are_non_negatives_naive_i32(int32_t *a, size_t n)
{
    size_t i;

    for (i = 0; i < n; ++i)
        if (a[i] < 0)
            return FALSE;

    return TRUE;
}
int are_non_negatives_unrolled_i32(int32_t *a, size_t n)
{
    size_t M = (n + 7) / 8;
    size_t i = 0;

    switch (n % 8) {
        case 0: do { if (a[i++] < 0) return FALSE;
        case 7:      if (a[i++] < 0) return FALSE;
        case 6:      if (a[i++] < 0) return FALSE;
        case 5:      if (a[i++] < 0) return FALSE;
        case 4:      if (a[i++] < 0) return FALSE;
        case 3:      if (a[i++] < 0) return FALSE;
        case 2:      if (a[i++] < 0) return FALSE;
        case 1:      if (a[i++] < 0) return FALSE;
                    } while (--M > 0);
    }
}
int are_non_negatives_unrolled_masked_i32(int32_t *a, size_t n)
{
    int32_t _i32[2] = { INT32_MIN, INT32_MIN };
    uint64_t mask = *(uint64_t *)_i32;

    uint64_t *it = (uint64_t *)a;

    size_t i = 0;

    switch (n % 8) {
        case 0: if (a[i++] < 0) return FALSE;
        case 7: if (a[i++] < 0) return FALSE;
        case 6: if (a[i++] < 0) return FALSE;
        case 5: if (a[i++] < 0) return FALSE;
        case 4: if (a[i++] < 0) return FALSE;
        case 3: if (a[i++] < 0) return FALSE;
        case 2: if (a[i++] < 0) return FALSE;
        case 1: if (a[i++] < 0) return FALSE;
    }

    size_t N = n / 2;
    // for (i = 0; i < N; ++i)
    //     if (it[i] & mask)
    //         return FALSE;

    size_t M = (N + 7) / 8;
    switch (N % 8) {
        case 0: do { if (it[i++] & mask) return FALSE;
        case 7:      if (it[i++] & mask) return FALSE;
        case 6:      if (it[i++] & mask) return FALSE;
        case 5:      if (it[i++] & mask) return FALSE;
        case 4:      if (it[i++] & mask) return FALSE;
        case 3:      if (it[i++] & mask) return FALSE;
        case 2:      if (it[i++] & mask) return FALSE;
        case 1:      if (it[i++] & mask) return FALSE;
                    } while (--M > 0);
    }

    return TRUE;
}
//----------------------------------------------------------------

void fill_array(void *array, size_t type, size_t n, int contains_negatives)
{
    size_t i;

    switch (type) {
        case 1: {
            int8_t *a = array;

            for (i = 0; i < n; ++i)
                a[i] = rand() % INT8_MAX;
            break;
        }
        case 2: {
            int16_t *a = array;

            for (i = 0; i < n; ++i)
                a[i] = rand() % INT16_MAX;
            break;
        }
        case 4: {
            int32_t *a = array;

            for (i = 0; i < n; ++i)
                a[i] = rand() % INT32_MAX;
            break;
        }
        case 8: {
            int64_t *a = array;

            for (i = 0; i < n; ++i)
                a[i] = rand();
            break;
        }
    }

    if (contains_negatives) {
        int how_many = rand() % n;
        
        for (int i; i < how_many; ++i) {
            int j = rand() % n;

            switch (type) {
                case 1: {
                    int8_t *a = array;
                    a[j] *= -1;
                    break;
                }
                case 2: {
                    int16_t *a = array;
                    a[j] *= -1;
                    break;
                }
                case 4: {
                    int32_t *a = array;
                    a[j] *= -1;
                    break;
                }
                case 8: {
                    int64_t *a = array;
                    a[j] *= -1;
                    break;
                }
            }
        }
    }
}

int main()
{
    time_t t;

    clock_t clock_start;
    clock_t clock_end;
    clock_t clock_naive = 0;
    clock_t clock_unrolled = 0;
    clock_t clock_unrolled_masked = 0;

    /* Intializes random number generator */
    srand((unsigned long) time(&t));

    printf("8 bit integers\n");
    printf("----------------------------------------------------------------\n");
    int8_t *array_i8;

    for (int i = 3; i < 7; ++i) {
        size_t size = pow(10, i);
        array_i8 = malloc(size * sizeof(int8_t));

        printf("------------------------------------------------\n");
        printf("%lu elements\n", size);
        for (int has_neg = 0; has_neg < 2; ++has_neg) {
            printf("--------------------------------\n");
            printf("Has negatives: %s\n", has_neg ? "yes": "no");
            fill_array(array_i8, sizeof(int8_t), size, has_neg);

            int neg_naive;
            int neg_unrolled;
            int neg_unrolled_masked;

            clock_start = clock();
            neg_naive = are_non_negatives_naive_i8(array_i8, size);
            clock_end = clock();
            clock_naive = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled = are_non_negatives_unrolled_i8(array_i8, size);
            clock_end = clock();
            clock_unrolled = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled_masked = are_non_negatives_unrolled_masked_i8(array_i8, size);
            clock_end = clock();
            clock_unrolled_masked = clock_end - clock_start;
            
            if (!has_neg) {
                if (!neg_naive)
                    printf("Error: neg_naive\n");
                if (!neg_unrolled)
                    printf("Error: neg_unrolled\n");
                if (!neg_unrolled_masked)
                    printf("Error: neg_unrolled_masked\n");
            }

            printf("          clock_naive: %lu\n", clock_naive);
            printf("       clock_unrolled: %lu\n", clock_unrolled);
            printf("clock_unrolled_masked: %lu\n", clock_unrolled_masked);
            printf("--------------------------------\n");
        }
        printf("------------------------------------------------\n");

        free (array_i8);
        array_i8 = NULL;
    }
    printf("----------------------------------------------------------------\n");

    printf("16 bit integers\n");
    printf("----------------------------------------------------------------\n");
    int16_t *array_i16;

    for (int i = 3; i < 7; ++i) {
        size_t size = pow(10, i);
        array_i16 = malloc(size * sizeof(int16_t));

        printf("------------------------------------------------\n");
        printf("%lu elements\n", size);
        for (int has_neg = 0; has_neg < 2; ++has_neg) {
            printf("--------------------------------\n");
            printf("Has negatives: %s\n", has_neg ? "yes": "no");
            fill_array(array_i16, sizeof(int16_t), size, has_neg);

            int neg_naive;
            int neg_unrolled;
            int neg_unrolled_masked;

            clock_start = clock();
            neg_naive = are_non_negatives_naive_i16(array_i16, size);
            clock_end = clock();
            clock_naive = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled = are_non_negatives_unrolled_i16(array_i16, size);
            clock_end = clock();
            clock_unrolled = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled_masked = are_non_negatives_unrolled_masked_i16(array_i16, size);
            clock_end = clock();
            clock_unrolled_masked = clock_end - clock_start;
            
            if (!has_neg) {
                if (!neg_naive)
                    printf("Error: neg_naive\n");
                if (!neg_unrolled)
                    printf("Error: neg_unrolled\n");
                if (!neg_unrolled_masked)
                    printf("Error: neg_unrolled_masked\n");
            }

            printf("          clock_naive: %lu\n", clock_naive);
            printf("       clock_unrolled: %lu\n", clock_unrolled);
            printf("clock_unrolled_masked: %lu\n", clock_unrolled_masked);
            printf("--------------------------------\n");
        }
        printf("------------------------------------------------\n");

        free (array_i16);
        array_i16 = NULL;
    }
    printf("----------------------------------------------------------------\n");

    printf("32 bit integers\n");
    printf("----------------------------------------------------------------\n");
    int32_t *array_i32;

    for (int i = 3; i < 7; ++i) {
        size_t size = pow(10, i);
        array_i32 = malloc(size * sizeof(int32_t));

        printf("------------------------------------------------\n");
        printf("%lu elements\n", size);
        for (int has_neg = 0; has_neg < 2; ++has_neg) {
            printf("--------------------------------\n");
            printf("Has negatives: %s\n", has_neg ? "yes": "no");
            fill_array(array_i32, sizeof(int32_t), size, has_neg);

            int neg_naive;
            int neg_unrolled;
            int neg_unrolled_masked;

            clock_start = clock();
            neg_naive = are_non_negatives_naive_i32(array_i32, size);
            clock_end = clock();
            clock_naive = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled = are_non_negatives_unrolled_i32(array_i32, size);
            clock_end = clock();
            clock_unrolled = clock_end - clock_start;

            clock_start = clock();
            neg_unrolled_masked = are_non_negatives_unrolled_masked_i32(array_i32, size);
            clock_end = clock();
            clock_unrolled_masked = clock_end - clock_start;
            
            if (!has_neg) {
                if (!neg_naive)
                    printf("Error: neg_naive\n");
                if (!neg_unrolled)
                    printf("Error: neg_unrolled\n");
                if (!neg_unrolled_masked)
                    printf("Error: neg_unrolled_masked\n");
            }

            printf("          clock_naive: %lu\n", clock_naive);
            printf("       clock_unrolled: %lu\n", clock_unrolled);
            printf("clock_unrolled_masked: %lu\n", clock_unrolled_masked);
            printf("--------------------------------\n");
        }
        printf("------------------------------------------------\n");

        free (array_i32);
        array_i32 = NULL;
    }
    printf("----------------------------------------------------------------\n");
}

Any suggestions or fixes are more than welcome

Note that this solution is not standard and depends on the alignment of the data types.
Albeit it'll usually work, you should check it with something like this

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

int main()
{
    printf(
        "alignof(int8_t ): %lu\n"
        "alignof(int16_t): %lu\n"
        "alignof(int32_t): %lu\n"
        "alignof(int64_t): %lu\n",
        alignof(int8_t),
        alignof(int16_t),
        alignof(int32_t),
        alignof(int64_t)
    );

    return 0;
}

Both `*(uint64_t *)_i8` and `uint64_t *it = (uint64_t *)a;` are [strict aliasing violations](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) and therefore undefined behavior. They also risk undefined behavior by violating [C11 6.3.2.3p7](https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7): "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined." — Andrew Henle, Feb 11 '21 at 21:14

score 0 · Answer 4 · answered Feb 10 '21 at 23:34

I suspect that this won’t work for you based on the context of your question, but in case your code allows for you to capture array operations (initialization, insertion, modification, deletion), you can coherently maintain a separate list of all elements with negative values. If you have the freedom to do that, it would just be a space-speed tradeoff.

Fast tricks to determine array elements are non-negative in C?

4 Answers4