0

i'm looking for an algorithm to check if all the bytes in a stream or buffer are equal. exist any algorithm to test that?

I know wich this problem can be solved by walking across the stream and comparing each element with the first element, but I am looking for a better solution. the stream can have thousand of elements.

Salvador
  • 16,132
  • 33
  • 143
  • 245
  • 4
    The natural algorithm is to compare each byte with the first byte. Two lines of code. How simpler or faster can it get? – Nikita Rybak Oct 06 '10 at 04:57
  • dupe of http://stackoverflow.com/questions/3869526/how-to-determine-if-all-characters-in-a-string-are-equal? – rafl Oct 06 '10 at 04:59
  • 1
    If you are looking to compare A to B, Nikita nailed it. If you are looking to compare A to B, C, and D, etc. then message hashes will help reduce the number of direct byte-by byte comparisons you need to make. – msw Oct 06 '10 at 05:00
  • @rafl the another question is specific to delphi, this question is to get ideas and implement myself the algorithm in delphi. – Salvador Oct 06 '10 at 05:02

2 Answers2

2

Every byte must be visited and checked so the opportunities for optimistation seem to be limited. I can think of two possibilities:

Do you know anything about the likelyhood of variability? For example is there a reason to suppose that differences are more likely at one end of the buffer or the other. You could examine some sample data input statistically and see whether there's any benefit in starting the comparison at one end or the other.

Another possibility: can you work in ints or longs? In C you could play pointer tricks to treat 4 adjacent bytes as an int, then do int comparisons rather than byte comparisons. It's not obvious that this must be quicker than 4 x as many byte comparisons, but just possbly it might be.

This is one of the few occasions where even a touch of hand-assembling might yield some benefits.

djna
  • 54,992
  • 14
  • 74
  • 117
0

The best you can do is to do a linear comparison. However, you can use some architecture-specific tricks to greatly speed up the comparison by comparing word by word instead of byte by byte. The naive but portable approach might look like this:

bool all_bytes_equal(const char* buffer, size_t len) {
    char c = buffer[0];
    for(size_t i = 1; i < len; ++i)
        if(buffer[i] != c)
            return false;
    return true;
}

But if your architecture supports rotated shifts, you could easily speed this up by running the comparisons 4 bytes at a time on a 32-bit architecture, or 8 bytes at a time on a 64-bit architecture:

// For x86-64 architectures with ROTL/ROTR instruction support
bool all_bytes_equal(const char* buffer, size_t len) {
    const uint64_t* word_buf = (const uint64_t*) buffer;
    for(size_t i = 0; i < len / 8; ++i)
        if(word_buf[i] != _rotl64(word_buf[i], 8))
            return false;
    return true;
}

If your architecture supports SSE/AVX, you could speed this up even further by vectorizing the rotate shifts (below assumes AVX512, but you can replace __m512 with __m256 or __m128, and replace the _mm512_* family of functions with _mm256_* or _mm_*, and replace the loop increment accordingly):

bool all_bytes_equal(const char* buffer, size_t len) {
    for(size_t i = 0; i < len / 512; i += 512) {
        __m512 word = _mm512_load_epi64(buffer + i);
        if(_mm512_testn_epi64_mask(word, _mm512_rol_epi64(word, 8)))
            return false;
    }
    return true;
}

The bottom two examples make a few assumptions about the alignment of the buffer, but you can easily add code to them to deal with unaligned bytes at the beginning/end of the buffer.