I need to count the number of spaces in a string using AVX2 SIMD intrinsics.

Here is my code:

std::size_t simd256_count_of_spaces(std::string& text) noexcept
{
  std::size_t spaces = 0;

  for (std::uint64_t i = 0; i < text.length(); i += 32)
  {
    __m256i __32 =
    _mm256_set_epi8(
      text[i     ], text[i +  1],
      text[i +  2], text[i +  3],
      text[i +  4], text[i +  5],
      text[i +  6], text[i +  7],
      text[i +  8], text[i +  9],
      text[i + 10], text[i + 11],
      text[i + 12], text[i + 13],
      text[i + 14], text[i + 15],
      text[i + 16], text[i + 17],
      text[i + 18], text[i + 19],
      text[i + 20], text[i + 21],
      text[i + 22], text[i + 23],
      text[i + 24], text[i + 25],
      text[i + 26], text[i + 27],
      text[i + 28], text[i + 29],
      text[i + 30], text[i + 31]
    );

    __m256i __cmp_mask =
    _mm256_set_epi8(
      32, 32, 32, 32, 32, 32, 32, 32,
      32, 32, 32, 32, 32, 32, 32, 32,
      32, 32, 32, 32, 32, 32, 32, 32,
      32, 32, 32, 32, 32, 32, 32, 32
    );

    __m256i __cmp_result = _mm256_cmpeq_epi8(__32, __cmp_mask);

   // ...
  }

}

The comparison gives me an output vector like this:

255 255 0   0   0   0   255 0
255 0   0   255 255 0   255 0
255 255 255 0   0   255 255 0
255 255 0   0   0   0   255 0

After that, I can count the 255s (or the 0s) this way:

    std::uint8_t* cmp_res = reinterpret_cast<std::uint8_t*>(&__cmp_result);
    for (int i = 0; i < 32; i++)
    {
      if (cmp_res[i] == 255) spaces++;
    }

Is it possible to do the same thing (count the 255s or 0s) without an additional loop?

UPDATE

This code solved my problem:

std::size_t spaces = 0;

const __m256i
__cmp = _mm256_set1_epi8(32);

__m256i __eq = _mm256_cmpeq_epi8(__32, __cmp);

spaces += _popcnt32(_mm256_movemask_epi8(__eq));
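
For completeness, here is a minimal self-contained sketch of the same idea, combined with the `_mm256_loadu_si256` load suggested in the comments. It rests on some assumptions that are not part of the original code: GCC or Clang on x86-64 (for the `target` attribute, `__builtin_popcount`, and `__builtin_cpu_supports`), and the function names are hypothetical. The tail that does not fill a whole 32-byte vector is counted with a plain scalar loop, so nothing is read past the end of the string.

```cpp
#include <cstddef>
#include <immintrin.h>
#include <string>

// Scalar reference; also used for the tail shorter than 32 bytes.
static std::size_t scalar_count_of_spaces(const char* p, std::size_t n) noexcept
{
  std::size_t spaces = 0;
  for (std::size_t i = 0; i < n; ++i)
    spaces += (p[i] == ' ');
  return spaces;
}

// Assumption: GCC/Clang. The target attribute enables AVX2 for this
// function only, so no -mavx2 flag is needed on the command line.
__attribute__((target("avx2")))
static std::size_t avx2_count_of_spaces(const char* p, std::size_t n) noexcept
{
  const __m256i space = _mm256_set1_epi8(' ');
  std::size_t spaces = 0;
  std::size_t i = 0;

  // Process only full 32-byte blocks.
  for (; i + 32 <= n; i += 32)
  {
    const __m256i chunk =
      _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p + i));
    const __m256i eq = _mm256_cmpeq_epi8(chunk, space);
    // One mask bit per byte: set where the byte equals ' '.
    const unsigned mask = static_cast<unsigned>(_mm256_movemask_epi8(eq));
    spaces += static_cast<std::size_t>(__builtin_popcount(mask));
  }

  // Remaining 0..31 bytes.
  return spaces + scalar_count_of_spaces(p + i, n - i);
}

// Hypothetical entry point: falls back to the scalar loop without AVX2.
std::size_t count_of_spaces(const std::string& text) noexcept
{
  if (__builtin_cpu_supports("avx2"))
    return avx2_count_of_spaces(text.data(), text.size());
  return scalar_count_of_spaces(text.data(), text.size());
}
```

Taking the string by `const` reference and returning the count are deliberate differences from the function in the question, which never returns `spaces`.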

385i
  • Does this answer your question? [How to count character occurrences using SIMD](https://stackoverflow.com/questions/54541129/how-to-count-character-occurrences-using-simd) – harold Jan 11 '21 at 13:45
  • By the way, the way you loaded `text` into a vector is an anti-pattern, use the actual vector loads for that (eg `_mm256_loadu_si256`) – harold Jan 11 '21 at 13:47

1 Answer

Just use a for loop; the compiler will optimize it for you.

exoze
  • Prove it. I tried it (using the loop shown in the question) and [it didn't happen](https://godbolt.org/z/5vcd6j) – harold Jan 11 '21 at 13:51
  • A simple for loop with -O3 and -march=native on the latest Ryzen CPU compiles into AVX2 (I checked this with the -s compiler flag), so it is pointless to take over the compiler's optimization job. – exoze Jan 11 '21 at 14:19
  • If [this](https://godbolt.org/z/j1qqaY) is what you mean, then yes there was autovectorization, but *bad* autovectorization: GCC doesn't use any good techniques here, it's just brute force naive vectorization – harold Jan 11 '21 at 15:09
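
As an illustration of what a better technique can look like, here is a sketch of one commonly used approach. It is not taken from this thread's code: the match results are accumulated in 32 byte-wide counters and only occasionally summed with `_mm256_sad_epu8`, instead of extracting a mask on every iteration. It assumes GCC or Clang on x86-64, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <immintrin.h>
#include <string>

// Assumption: GCC/Clang (target attribute, __builtin_cpu_supports).
__attribute__((target("avx2")))
static std::size_t avx2_count_spaces_sad(const char* p, std::size_t n) noexcept
{
  const __m256i space = _mm256_set1_epi8(' ');
  const __m256i zero = _mm256_setzero_si256();
  __m256i total = zero;  // four 64-bit partial sums
  std::size_t i = 0;

  while (i + 32 <= n)
  {
    __m256i bytes = zero;  // 32 independent 8-bit counters
    // At most 255 blocks per round, so no 8-bit counter can overflow.
    for (int k = 0; k < 255 && i + 32 <= n; ++k, i += 32)
    {
      const __m256i chunk =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p + i));
      // cmpeq yields 0xFF (-1) per matching byte; subtracting -1 adds 1.
      bytes = _mm256_sub_epi8(bytes, _mm256_cmpeq_epi8(chunk, space));
    }
    // Sum each group of 8 byte counters into a 64-bit lane.
    total = _mm256_add_epi64(total, _mm256_sad_epu8(bytes, zero));
  }

  std::uint64_t lanes[4];
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), total);
  std::size_t spaces =
    static_cast<std::size_t>(lanes[0] + lanes[1] + lanes[2] + lanes[3]);

  // Scalar tail for the remaining 0..31 bytes.
  for (; i < n; ++i)
    spaces += (p[i] == ' ');
  return spaces;
}

// Hypothetical entry point with a scalar fallback for CPUs without AVX2.
std::size_t count_spaces_sad(const std::string& text) noexcept
{
  if (__builtin_cpu_supports("avx2"))
    return avx2_count_spaces_sad(text.data(), text.size());
  std::size_t spaces = 0;
  for (char c : text)
    spaces += (c == ' ');
  return spaces;
}
```

Whether this beats the movemask and popcount version depends on the CPU and the input size, so it is worth measuring both.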