
This is my string, which I loaded into a `__m256i`:

static __attribute__((aligned(32))) char data[33] = "  Mozilla/5.0 (Windows NT 10.0; ";

__m256i vec_data = _mm256_load_si256((const __m256i *) data);

There are 2 spaces (0x20) at data[0] and data[1]. I want to use a shuffle to delete these spaces, so I did this:

vec_data = _mm256_shuffle_epi8(vec_data, _mm256_set_epi8(-1,-1,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2));

What I expected is:

Mozilla/5.0 (Windows NT 10.0;

But what is actually stored in vec_data is this:

Mozilla/5.0 (W  dows NT 10.0; 

The whitespace is deleted from the beginning of the string, but what happened to the 2 characters in the middle? "Windows" changed to "W  dows"!

Peter Cordes
HelloMachine
  • Recall that `pshufb` shuffles within lanes. Your shuffle mask however tries to shuffle across lanes which won't yield the desired result. Unfortunately, there is no nice solution to this. You have to manually carry the two characters across the lane boundary somehow. – fuz Sep 13 '20 at 12:20
  • As an example, see [compose_avx](https://github.com/fuzxxl/24puzzle/blob/master/transposition.c#L111) for how an arbitrary across-lane shuffle could be synthesised using the AVX2 instruction set. For your case, there's likely a simpler solution. – fuz Sep 13 '20 at 12:30
  • @fuz: No nice solution... until AVX512VBMI for `vpermb`. ([Where is VPERMB in AVX2?](https://stackoverflow.com/q/37980209)) Requires Ice Lake. But yeah, for this you might be able to do separate overlapping stores of the low and high halves, with a shuffle control chosen appropriately. Or in this specific case just an unaligned 32-byte load that starts 2 bytes later. – Peter Cordes Sep 13 '20 at 12:31
  • Almost a duplicate of [Emulating shifts on 32 bytes with AVX](https://stackoverflow.com/q/25248766). (But unaligned load is a better choice here.) – Peter Cordes Sep 13 '20 at 12:36
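
The comments can be made concrete with a minimal sketch. It is not a posted answer: the names `shuffle_epi8_model`, `drop_first_2_bytes`, and `drop_first_2_bytes_buf` are hypothetical, and GCC or Clang on x86 is assumed for the `target("avx2")` attribute. The scalar model mimics how `_mm256_shuffle_epi8` indexes within each 16-byte lane, which is why control values 16 and 17 in the low lane wrap around to data[0] and data[1] (the two spaces) instead of fetching `i` and `n`. The AVX2 helper then carries bytes across the lane boundary with `_mm256_permute2x128_si256` followed by `_mm256_alignr_epi8`, in the style of the linked "Emulating shifts on 32 bytes with AVX" question.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of _mm256_shuffle_epi8 (vpshufb): each 16-byte lane is
   shuffled independently, and only the low 4 bits of a control byte
   select a source byte, always from the SAME lane as the destination. */
static void shuffle_epi8_model(const uint8_t src[32], const int8_t ctl[32],
                               uint8_t dst[32])
{
    for (int i = 0; i < 32; i++) {
        int lane_base = i & 16;          /* 0 for the low lane, 16 for the high lane */
        if (ctl[i] < 0)
            dst[i] = 0;                  /* high bit set: the byte is zeroed */
        else
            dst[i] = src[lane_base + (ctl[i] & 15)];
    }
}

/* Hypothetical helper: shift all 32 bytes down by 2 positions,
   zero-filling the top 2 bytes, without leaving registers. */
static __attribute__((target("avx2"))) __m256i drop_first_2_bytes(__m256i v)
{
    /* 0x81: result low lane = high lane of v, result high lane = zero */
    __m256i hi = _mm256_permute2x128_si256(v, v, 0x81);
    /* Per lane: concatenate (hi lane : v lane), shift right by 2 bytes,
       keep the low 16 bytes. The low lane thereby receives the first
       2 bytes of v's high lane; the high lane receives zeros. */
    return _mm256_alignr_epi8(hi, v, 2);
}

/* Hypothetical memory-to-memory wrapper so the helper is easy to call
   from code that is not itself compiled for AVX2. */
static __attribute__((target("avx2"))) void
drop_first_2_bytes_buf(const char in[32], char out[32])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)in);
    _mm256_storeu_si256((__m256i *)out, drop_first_2_bytes(v));
}
```

If the data is still in memory, the unaligned load suggested above, `_mm256_loadu_si256((const __m256i *)(data + 2))`, avoids the shuffle entirely. Note that it reads data[2] through data[33], so the buffer would need at least 34 bytes rather than 33.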
