
I'm trying to implement `strlen` using AVX2 SIMD intrinsics, but the call to `_mm256_cmpeq_epi8` sometimes crashes with SIGSEGV (signal 11).

It works about 50% of the time. The call is inside a loop, but when it fails, it always fails on the first iteration.

Here is the code:

```c
#include <immintrin.h>
#include <stdbool.h>
#include <stddef.h>

size_t simd_strlen(const char *s) {
    unsigned int i = 0;
    const __m256i *p;
    __m256i mask, zero;

    p = (__m256i *) s;
    zero = _mm256_setzero_si256();

    while (true) {
        // mask is all zeros unless a \0 byte (all bits 0) appears; cmpeq sets that byte to 0xFF
        mask = _mm256_cmpeq_epi8(*p, zero);

        // if mask is all zeros, mask AND mask == 0, so testz returns 1;
        // only when mask has at least one set bit does it return 0, which means \0 occurred
        if (!_mm256_testz_si256(mask, mask)) {
            break;
        }
        ++i;
        ++p;
    }

    int count = i * 32;
    i = 0;
    char *p_2 = (char *) p;
    // add the rest
    while (p_2[++i]) {
    }
    return count + i;
}
```
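As Peter Cordes notes in the comments, dereferencing `*p` implies an aligned 32-byte load (at least with optimization disabled), so whether the first iteration faults depends only on the address of `s`. Because `p` advances in 32-byte steps, its alignment never changes after that first load, which fits the observation that failures only ever happen on the first iteration. If the strings come from an allocator that only guarantees 16-byte alignment (common for `malloc` on x86-64; this is an assumption about the asker's setup), roughly half of them would happen to be 32-byte aligned, which would also fit the "about 50% of the time" symptom. A minimal sketch of the alignment check, using a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is suitable for an aligned 32-byte load,
   which is what dereferencing a const __m256i * is allowed to compile to.
   A 32-byte aligned address has its low 5 bits clear. */
static bool is_32_byte_aligned(const void *p) {
    return ((uintptr_t)p & 31u) == 0;
}
```

If alignment is the cause, calling `is_32_byte_aligned(s)` at the top of the question's `simd_strlen` should return false on exactly the calls that crash.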
  • `p = (__m256i *) s;` is almost certainly [a strict aliasing violation](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) along with being an alignment violation of [**6.3.2.3 Pointers**, paragraph 7](https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7): "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined." You can't safely take a `char` array and treat it as something else. – Andrew Henle May 20 '20 at 16:58
  • @AndrewHenle I thought about that, but I can't seem to find a way to convert `char*` to `__m256i*`. – Nazar Pasternak May 20 '20 at 17:00
  • You could use an unaligned load, I believe. – WBuck May 20 '20 at 17:02
  • Use `_mm256_loadu_si256` to get the unaligned input data into an `__m256i` and then use that for the comparison. – Paul R May 20 '20 at 17:02
  • Thank you, that worked. – Nazar Pasternak May 20 '20 at 17:13
  • @AndrewHenle: Intel intrinsics allow vector pointers to alias anything else, just like `char*` in ISO C. This is part of the set of behaviour that compilers have to define to fully support Intel intrinsics. GCC/clang do that by defining them as `__attribute__((vector_size(32), may_alias))`. [Is \`reinterpret\_cast\`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?](//stackoverflow.com/q/52112605). The problem here is that dereferencing a `__m256i*` implies an aligned load. With optimization disabled, it won't get folded into a memory src operand. – Peter Cordes May 20 '20 at 17:40
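Following Paul R's suggestion, here is a minimal sketch of the loop rewritten around `_mm256_loadu_si256`. It is an illustration under assumptions, not a drop-in implementation: it assumes GCC or Clang (for `__attribute__((target("avx2")))`, `__builtin_cpu_supports` and `__builtin_ctz`), the helper name `simd_strlen_avx2` and the plain `strlen` fallback are hypothetical additions, and it replaces the question's `_mm256_testz_si256` plus scalar tail loop with `_mm256_movemask_epi8`, whose lowest set bit gives the index of the first `'\0'` in the chunk. That also sidesteps the tail loop's `p_2[++i]`, which never examines `p_2[0]`.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper. The GCC/Clang target attribute compiles just this
   function for AVX2, so the file builds without -mavx2; it must only be
   called on CPUs that support AVX2. */
__attribute__((target("avx2")))
static size_t simd_strlen_avx2(const char *s) {
    const __m256i zero = _mm256_setzero_si256();
    size_t offset = 0;
    for (;;) {
        /* Unaligned load: s + offset needs no particular alignment. */
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(s + offset));
        /* 0xFF in each byte lane that equals '\0', 0x00 elsewhere. */
        __m256i eq = _mm256_cmpeq_epi8(chunk, zero);
        /* Bit k of mask is the top bit of byte lane k. */
        unsigned mask = (unsigned)_mm256_movemask_epi8(eq);
        if (mask != 0) {
            /* Lowest set bit = position of the first '\0' in this chunk. */
            return offset + (size_t)__builtin_ctz(mask);
        }
        offset += 32;
    }
}
#endif

/* Use the AVX2 path when the CPU supports it; otherwise (assumed fallback)
   defer to the C library's strlen. */
size_t simd_strlen(const char *s) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2"))
        return simd_strlen_avx2(s);
#endif
    return strlen(s);
}
```

This version still reads up to 31 bytes past the terminator. That is harmless while those bytes are mapped, but an unaligned load that straddles into an unmapped page can still fault. Implementations that must never fault typically align the pointer down to a 32-byte boundary first, because an aligned 32-byte load can never cross a page boundary, and then ignore the mask bits for bytes before `s`.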
