I need to port a Xorshift algorithm from scalar to vector code (an SSE/SIMD version built with `-march=nocona`).
I'm using the `uint32_t` version of the algorithm (taken directly from Wikipedia):
```c
#include <stdint.h>

struct xorshift32_state {
    uint32_t a;
};

/* The state word must be initialized to non-zero */
uint32_t xorshift32(struct xorshift32_state *state)
{
    /* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
    uint32_t x = state->a;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return state->a = x;
}
```
The main problems are:
- it uses `uint32_t`, so (by the standard) it wraps around automatically;
- due to SSE3 "limits", I would like to stay with `__m128i`, which I believe is signed and gives me all the operations I need;
- signed overflow is undefined behaviour in the C++ standard.
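A small sketch of the shift semantics behind these concerns, assuming SSE2 (which `-march=nocona` enables). In C, `x << 13` on a `uint32_t` just discards the bits shifted out, and XOR has no carries, so the scalar code does no wrapping arithmetic at all. With SSE2 intrinsics, signed versus unsigned behaviour is chosen by the intrinsic rather than by `__m128i`: `_mm_srli_epi32` is a logical shift (zeros shifted in, like `>>` on `uint32_t`), while `_mm_srai_epi32` is an arithmetic shift that copies the sign bit. The helper names `lane0`, `shr17_logical`, `shr17_arith` and `shl13` are hypothetical, used only for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: read lane 0 of a vector back as uint32_t. */
static uint32_t lane0(__m128i v)
{
    return (uint32_t)_mm_cvtsi128_si32(v);
}

/* Hypothetical: logical right shift by 17 in every 32-bit lane.
   Zeros are shifted in, matching uint32_t >> 17.
   The (int) conversion of large values is implementation-defined in C;
   GCC and Clang keep the bit pattern. */
uint32_t shr17_logical(uint32_t x)
{
    return lane0(_mm_srli_epi32(_mm_set1_epi32((int)x), 17));
}

/* Hypothetical: arithmetic right shift by 17. The sign bit is copied in,
   which is NOT what xorshift32 needs. */
uint32_t shr17_arith(uint32_t x)
{
    return lane0(_mm_srai_epi32(_mm_set1_epi32((int)x), 17));
}

/* Hypothetical: left shift by 13. Bits shifted out of each 32-bit lane
   are simply discarded; there is no overflow to worry about. */
uint32_t shl13(uint32_t x)
{
    return lane0(_mm_slli_epi32(_mm_set1_epi32((int)x), 13));
}
```

For `0x80000000`, the logical shift gives `0x00004000` (the same as the scalar `>>`), while the arithmetic shift gives `0xFFFFC000`.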
How would you handle this port using SIMD? Should I work with epu32 values and subtract half of the uint32 maximum (and then add it back)?
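One possible direction, sketched under the assumption that four independent generators (one per 32-bit lane) are acceptable: SSE2 already has every operation the algorithm uses (`_mm_slli_epi32`, `_mm_srli_epi32`, `_mm_xor_si128`), and since there is no addition or comparison, no epu32 bias trick is involved. The names `xorshift32x4_state`, `xorshift32x4`, `xorshift32x4_seed` and `xorshift32x4_store` are hypothetical; the scalar function is the one from the question, kept as a reference.

```c
#include <emmintrin.h> /* SSE2: _mm_slli_epi32, _mm_srli_epi32, _mm_xor_si128 */
#include <stdint.h>

/* Scalar reference, as in the question. */
struct xorshift32_state {
    uint32_t a;
};

uint32_t xorshift32(struct xorshift32_state *state)
{
    uint32_t x = state->a;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return state->a = x;
}

/* Hypothetical 4-lane state: four independent xorshift32 generators.
   Every lane must be seeded with a non-zero value. */
struct xorshift32x4_state {
    __m128i a;
};

/* One step of all four generators at once. The intrinsics work on raw
   32-bit lanes: left shifts discard the bits shifted out and
   _mm_srli_epi32 shifts in zeros, exactly like the uint32_t code. */
__m128i xorshift32x4(struct xorshift32x4_state *state)
{
    __m128i x = state->a;
    x = _mm_xor_si128(x, _mm_slli_epi32(x, 13));
    x = _mm_xor_si128(x, _mm_srli_epi32(x, 17));
    x = _mm_xor_si128(x, _mm_slli_epi32(x, 5));
    return state->a = x;
}

/* Hypothetical helper: load four uint32_t seeds; lane i gets seed[i]. */
void xorshift32x4_seed(struct xorshift32x4_state *state, const uint32_t seed[4])
{
    state->a = _mm_loadu_si128((const __m128i *)seed);
}

/* Hypothetical helper: store the four lanes back as uint32_t, lane i to out[i]. */
void xorshift32x4_store(__m128i v, uint32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, v);
}
```

With these helpers, each lane reproduces the scalar sequence for its own seed; for example, a lane seeded with `1` first returns `270369`, just like `xorshift32` does. Note that this yields four separate streams, not the single scalar stream four values at a time.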