35

I have a block of memory with elements of fixed size, say 100 bytes, put into it one after another, all with the same fixed length, so memory looks like this

<element1(100 bytes)><element2(100 bytes)><element3(100 bytes)>...

In some situations I need to determine whether all bytes of a certain element are set to the 0-byte because that has a special meaning (I didn't say it was a good idea, but that is the situation I am in).

The question is, how do I do that efficiently. Further: is there a simple function to do it. For setting bytes to zero I can used memset or bzero, but I don't know of any function for checking for zero.

At the moment I am using a loop for the check

char *elementStart = memoryBlock + elementNr*fixedElementSize;
bool special = true;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
  special &= (*(elementStart+curByteNr)) == 0;
}

Of course, I could loop with a bigger offset and check several bytes at once with a mword or some other suited bigger type. And I guess that would be rather efficient, but I would like to know whether there is a function to take that burden from me.

Suggested functions:

  • !memcmp (compareBlock, myBlock, fixedElementSize)
Ansgar Lampe
  • 630
  • 1
  • 6
  • 10
  • related to http://stackoverflow.com/questions/1493936/faster-means-of-checking-for-an-empty-buffer-in-c – Jeff Atwood Aug 07 '11 at 09:39
  • Are most test cases (buffers) zero? Or are they rarely all zero? If the latter, depending on your target hardware (and how it deals with branch mispredicts), it may be more efficient to also add -special- to the for() termination condition so you're not reading the *entire* block of memory – kornman00 Nov 26 '13 at 07:03
  • http://stackoverflow.com/questions/2589736/fast-way-to-check-if-an-array-of-chars-is-zero – phuclv Dec 01 '14 at 04:54

10 Answers10

42

You could perhaps actually use memcmp without having to allocate a zero-valued array, like this:

static int memvcmp(void *memory, unsigned char val, unsigned int size)
{
    unsigned char *mm = (unsigned char*)memory;
    return (*mm == val) && memcmp(mm, mm + 1, size - 1) == 0;
}

The standard for memcmp does not say anything about overlapping memory regions.

mihaif
  • 492
  • 4
  • 4
  • 4
    Function call overhead and calling `memcmp` with multiple pointers offset by single byte remove any benefit from using `memcmp`, simple for loop would be hundret times better and more readable. – kwesolowski Jan 02 '16 at 10:15
  • 5
    This is brilliant. In practice it may be true that single-byte causes problems, as @kwesolowski 's alludes to. That *might* be resolved by first checking first, say, 7 bytes and then calling memcmp(mm, mm+8, size-8). Or maybe not-- I could believe some or most implementations of memcmp start by checking for overlap and, if so, use a slow simple method. – Don Hatch Dec 09 '16 at 10:58
  • 1
    @DonHatch if you assume shifting 8 bytes would get you a correctly aligned memory offset you already know how to do the check efficiently without calling memcmp – pqnet Oct 28 '18 at 16:53
  • @pqnet: **(a)** shifting by 8 bytes may not be aligned, but the point is that `mm` and `mm+8` are misaligned by the same amount (assuming an 8-byte block size), therefore `memcmp` would have the opportunity to do aligned fetches after the first few bytes. If youjshift by one byte only, on the other hand, you guarantee that at least one of the pointers will not be aligned. **(b)** there is a huge leap between knowing that the memory is aligned and writing a highly optimized memcmp for all platforms. That frequently involves manual use of platform specific vectorized instructions/intrinsics. – Yakov Galka Dec 27 '19 at 19:56
  • 1
    @DonHatch: actually `memcmp` doesn't care if there's an overlap or not, since it doesn't write to that memory. :) – Yakov Galka Dec 27 '19 at 20:03
  • 1
    @ybungalobill The reason I thought the implementation might care is that it might be based on some SIMD intrinsics or maybe a simple machine instruction, and thus it would have to care about the preconditions of such instructions. I'm not all that familiar with what instructions like that exist in the real world, but I wouldn't be astonished to find that one or more of them require non-overlapping inputs. – Don Hatch Dec 30 '19 at 18:56
  • this may not work if `size == 0` – Zhang Boyang Oct 03 '21 at 04:14
26

The obvious portable, high efficiency method is:

char testblock [fixedElementSize];
memset (testblock, 0, sizeof testblock);

if (!memcmp (testblock, memoryBlock + elementNr*fixedElementSize, fixedElementSize)
   // block is all zero
else  // a byte is non-zero

The library function memcmp() in most implementations will use the largest, most efficient unit size it can for the majority of comparisons.

For more efficiency, don't set testblock at runtime:

static const char testblock [100];

By definition, static variables are automatically initialized to zero unless there is an initializer.

wallyk
  • 56,922
  • 16
  • 83
  • 148
  • 1
    with memcmp we have another function to do the work, even with the overhead of creating a compare block. Thanks. – Ansgar Lampe Aug 04 '11 at 08:48
  • 2
    The overhead of creating a compare block is, as wallyk mentioned, offset by the fact that memcmp is very efficient as decent implementations compare memory in the byte order of the machine and the comparisons are block aligned (wouldn't be noticed except for "very large" amounts of memory). std::upper_bound does not have this property but is "more c++" as it is a proper template vector iterator. – charstar Aug 04 '11 at 09:34
  • 2
    As mentioned by Benjamin Lindley, std::upper doesn't even have the property of searching in a non-sorted range. So memcmp wins. – Ansgar Lampe Aug 04 '11 at 09:52
  • 2
    memcmp makes comparison `unsigned char` by `unsigned char` because its requirement for its return value is _The memcmp() function returns an integer greater than, equal to, or less than zero, accordingly as the object pointed to by s1 is greater than, equal to, or less than the object pointed to by s2_. A better solution would be to write a code optimized to use the biggest word size available and fills the gaps at the limits – lalebarde Jun 09 '18 at 10:15
  • 1
    @lalebarde, `memcmp` implementations normally compare word by word (probably 4-32 bytes at a time) until they find a word that differs. They may iterate bytewise at that point, but they don't have to, since you can get the same answer by comparing the final words as big-endian unsigned integers (with the help of `bswap` or `movbe` on x86/x64). – benrg Feb 03 '19 at 17:51
8

I can't believe no one posted this yet... a solution that actually looks like C++ and isn't UB for breaking aliasing rules:

#include <algorithm> // std::all_of
#include <cstddef>   // std::size_t

// You might only need this
bool
memory_is_all_zeroes(unsigned char const* const begin,
                     std::size_t          const bytes)
{
    return std::all_of( begin, begin + bytes,
                        [](unsigned char const byte) { return byte == 0; } );
}

// but here's this as a bonus
template<typename T_Element, std::size_t T_count>
bool
array_is_all_zeroes( T_Element const (& array)[T_count] )
{
    auto const begin = reinterpret_cast<unsigned char const*>(array);
    auto const bytes = T_count * sizeof(T_Element);

    return memory_is_all_zeroes(begin, bytes);
}

int
main()
{
    int const blah[1000]{0};

    return !array_is_all_zeroes(blah);
}

This might not satisfy some people's assumptions about efficiency (which are just that, assumptions, until profiled), but I think being valid and idiomatic code are much in its favour.

underscore_d
  • 6,309
  • 3
  • 38
  • 64
4

AFAIK there is no automatically function to check memory.

You could use | to speed up the for-loop, no need for "=="

char *elementStart = memoryBlock + elementNr*fixedElementSize;
char special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; ++curByteNr )
{
  special |= (*(elementStart+curByteNr));
}

and also can you use long for even more speed

char *elementStart = memoryBlock + elementNr*fixedElementSize;
long special = 0;
for ( size_t curByteNr=0; curByteNr<fixedElementSize; curByteNr += sizeof(long) )
{
   special |= *(long*)(elementStart+curByteNr);
}

WARNING: the above code is not tested. Please test it first so that the sizeof and casting operator works

Lucian
  • 3,407
  • 2
  • 21
  • 18
3

I have tested some solutions proposed here and checked memcmp source code which is not optimized for the OP needs since it has an additional requirement to perform sorting, leading it to compare unsigned char one by one.

In the following, I propose an optimized function check_memory_zeroed which performs most of the check on the biggest aligned int available, making it portable, and I compare it with the other solutions proposed in this thread. Time measurement is performed and results printed.

It shows that the proposed solution is near twice better than wallyk's obvious portable high efficiency method and does not need to create an additional array, and six times better than char by char comparison or mihaif's shifted array which saves RAM compared to wallyk's one.

I have also tested my solution without aligning the words check_memory_zeroed_bigestint_not_aligned and surprisingly, it performs even better. If someone has an explanation, he is welcome.

Here is the code with functional and performance tests on a 1Gb table (the proposed optimized function is the fisrt one : check_memory_zeroed):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include <time.h>

#define BIG_TAB_SIZE 1000000000

typedef intmax_t biggestint;

int check_memory_zeroed (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    int bis = sizeof(biggestint);
    char* pc = (char*) ptr;
    biggestint* pbi0 = (biggestint*) pc;
    if ((size_t) pc % bis) /* is aligned ? */
        pbi0 = (biggestint*) (pc + (bis - ((size_t) pc % bis))); /* minimal pointer larger than ptr but aligned */
    assert ((size_t) pbi0 % bis == 0); /* check that pbi0 is aligned */
    for (char* p = pc; p < (char*) pbi0; p++)
        if(*p) return 0; /* check beginning of non aligned array */
    biggestint* pbi = pbi0;
    biggestint* pbiUpper = ((biggestint*) (pc + size)) - 1;
    for (;pbi <= pbiUpper; pbi++)
        if(*pbi) return 0; /* check with the biggest int available most of the array : its aligned part */
    for (char* p = (char*) pbi; p < pc + size; p++)
        if(*p) return 0; /* check end of non aligned array */
    return 1;
}

int check_memory_zeroed_bigestint_not_aligned (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    biggestint* pbi = (biggestint*) ptr;
    biggestint* pbiUpper = ((biggestint*) (((char*) ptr) + size)) - 1;
    for (;pbi <= pbiUpper; pbi++)
        if(*pbi) return 0; /* check with the biggest int available most of the array, but without aligning it */
    for (char* p = (char*) pbi; p < ((char*) ptr) + size; p++)
        if(*p) return 0; /* check end of non aligned array */
    return 1;
}

int check_memory_zeroed_by_char (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    for (char* p = (char*) ptr; p < ((char*) ptr) + size; p++)
        if(*p) return 0;
    return 1;
}

/* variant of wallyk solution */
int check_memory_zeroed_by_memcmp_and_testblock (void* ptr, size_t size)
{
    void* testblock = malloc(size);
    if (ptr == NULL || testblock == NULL) return -1;
    memset (testblock, 0, sizeof(testblock));
    int res = ! memcmp (testblock, ptr, size);
    free (testblock);
    return res;
}

/* variant of mihaif solution */
int check_memory_zeroed_by_memcmp_with_shifted_array (void* ptr, size_t size)
{
    if (ptr == NULL) return -1;
    char* pc = (char*) ptr;
    return (*pc) || memcmp(pc, pc + 1, size - 1);
}

int test() {
    /* check_memory_zeroed (void* ptr, size_t size) */
    char tab[16];
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            for (int k = 0; k < 16; k++) tab[k] = (k >= i && k < 16 - j) ? 0 : 100 + k;
            assert(check_memory_zeroed(tab + i, 16 - j - i));
            if (i > 0) assert(tab[i-1] == 100 + i - 1);
            if (j > 0) assert(tab[16 - j] == 100 + 16 - j);
            for (int k = i; k < 16 - j; k++) {
                tab[k] = 200+k;
                assert(check_memory_zeroed(tab + i, 16 - j - i) == 0);
                tab[k] = 0;
            }
        }
    char* bigtab = malloc(BIG_TAB_SIZE);
    clock_t t = clock();
    printf ("Comparison of different solutions execution time for checking an array has all its values null\n");
    assert(check_memory_zeroed(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed optimized : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_bigestint_not_aligned(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_bigestint_not_aligned : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_char(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_char : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_memcmp_and_testblock(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_memcmp_and_testblock by wallyk : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
    assert(check_memory_zeroed_by_memcmp_with_shifted_array(bigtab, BIG_TAB_SIZE) != -1);
    t = clock() - t;
    printf ("check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : %f seconds\n",((float)t)/CLOCKS_PER_SEC);
    free (bigtab);

    return 0;
}

int main(void) {
    printf("Size of intmax_t = %lu\n", sizeof(intmax_t));
    test();
    return 0;
}

And the results for comparison of different solutions execution time for checking an array has all its values null:

  • Size of intmax_t = 8
  • check_memory_zeroed optimized : 0.331238 seconds
  • check_memory_zeroed_bigestint_not_aligned : 0.260504 seconds
  • check_memory_zeroed_by_char : 1.958392 seconds
  • check_memory_zeroed_by_memcmp_and_testblock by wallyk : 0.503189 seconds
  • check_memory_zeroed_by_memcmp_with_shifted_array by mihaif : 2.012257 seconds
lalebarde
  • 1,684
  • 1
  • 21
  • 36
2

It is not possible to check all 100 bytes at the same time. So, you (or any utility functions) have to iterate through the data in any case. But, besides having a step size bigger than 1 byte, you could do some more optimizations: For example, you could break as soon as you find a non-zero value. Well, the time complexity would still be O(n), I know.

Flinsch
  • 4,296
  • 1
  • 20
  • 29
0

The following will iterate through the memory of a structure. Only disadvantage is that it does a bytewise check.

#include <iostream>

struct Data { int i; bool b; };

template<typename T>
bool IsAllZero(T const& data)
{
    auto pStart = reinterpret_cast<const char*>(&data);
    for (auto pData = pStart; pData < pStart+sizeof(T); ++pData)
    {
        if (*pData)
            return false;
    }

    return true;
}

int main()
{
    Data data1;// = {0}; // will most probably have some content
    Data data2 = {0}; // all zeroes

    std::cout << "data1: " << IsAllZero(data1) << "\ndata2: " << IsEmptyStruct(data2);

    return 0;
};
fmuecke
  • 8,708
  • 1
  • 20
  • 25
0

I can't recall a standard library function which could do this for you. If you are not sure this causes any performance issues I'd just use the loop, maybe replace char* with int* as already suggested.

If you do have to optimize you could unroll the loop:

bool allZeroes(char* buffer)
{
    int* p = (int*)buffer;   // you better make sure your block starts on int boundary
    int acc = *p;
    acc |= *++p;
    acc |= *++p;
    ...
    acc |= *++p;    // as many times as needed

    return acc == 0;
}

You may need to add special handling for the end of buffer if it's size is not a multiple of sizeof(int), but it could be more efficient to allocate a slightly larger block with some padding bytes set to 0.

If your blocks are large you could treat them as a sequence of smaller blocks and loop over them, using the code above for each small block.

I would be curious to know how this solution compares with std::upper_bound(begin,end,0) and memcmp.

EDIT

Did a quick check how a home-grown implementation compares with memcmp, used VS2010 for that.

In short:

1) in debug mode home-grown can be twice as fast as memcmp

2) in release with full optimization memcmp has an edge on the blocks which start with non-0s. As the length of the zero-filled preamble increases it starts losing, then somehow magically gets almost as fast as homegrown, about only 10% slower.

So depending on your data patterns and need/desire to optimize you could get some extra performance from rolling out your own method, but memcmp is a rather reasonable solution.

Will put the code and results on github in case you could use them.

MaximG
  • 104
  • 7
  • Thanks :) I decided to use memcmp since I mainly wanted a rather fast solution that is still readable in code. Thank you for your effords anyway. Would not have thought that manual loop unrolling still does any good on the modern compilers. – Ansgar Lampe Aug 06 '11 at 22:15
  • 1
    No problem. I think this has more to do with the compiler than with the processor, however, this optimization is for a specific case, memcmp has to be generic. Check out [related questions](http://stackoverflow.com/questions/1872156/how-can-i-know-where-the-segment-of-memory-is-all-zero) on the topic - there is an even more elegant solution which also happens to be faster according to my tests. – MaximG Aug 07 '11 at 20:46
  • 2
    another one that violates aliasing rules :( – underscore_d Oct 26 '17 at 20:20
-1

Well if you just want to decide whether a single element is all 0s you can create a 100byte element with all 1s. Now when you want to check whether an element is all 0s just binary AND (&) the content of the element and the element you created(all 1s). now if the result of binary AND is zero the element you checked had all 0s otherwise it was not all 0s

the creation of a 100 byte element with all 1s seems costly but if you have a large number of elements to check then its actually better

you can create the 100 byte element with all 1s as void *elem; elem=malloc(100); now set all bits to 1(use ~(elem&0))

lovesh
  • 5,235
  • 9
  • 62
  • 93
  • I am not sure how could compare with this. Even if I created the 100 byte element I would still have to check the result of the binary ANDed element (probably created by bytewise AND) for zero. Binary AND my element to the 1-element would return my element again so where is the gain? But you are right, I could create an all-0-element and do a memcmp. – Ansgar Lampe Aug 04 '11 at 08:45
  • 2
    This is word salad. Why on Earth would someone iterate their memory, `AND`ing it with 1s, when they can just iterate it and compare it to 0? Come to think of it, why do they need 100 bytes of the same thing, when they're all the same, so they could just use 1 byte? Which should then be 0, not all 1s, so that the code says what it means, not the opposite for no reason. The only conceivable rationale for this post is if you're an entrant to the IOCCC. What is a "100 byte element", anyway? And do you really think that `&` operator can achieve anything good on a `void*`? – underscore_d Oct 26 '17 at 20:22
-1

What about using long int and binary or operator.

unsigned long long int *start, *current, *end, value = 0;
// set start,end
for(current = start; current!=end; current++) {
value |= *current;
}
bool AllZeros = !value;
kravemir
  • 10,636
  • 17
  • 64
  • 111
  • 1
    because using a `long int*` to read memory containing anything except `long int`s would violate aliasing rules, making this a non-solution – underscore_d Oct 26 '17 at 19:47