My vectorized xorshift+ is not very random

Question

I have the following code (the xorshift128+ code from Wikipedia modified to use vector types):

#include <immintrin.h>
#include <climits>

__v8si rand_si() {
    static auto s0 = __v4du{4, 8, 15, 16},
        s1 = __v4du{23, 34, 42, 69};
    auto x = s0, y = s1;
    s0 = y;
    x ^= x << 23;
    s1 = x ^ y ^ (x >> 17) ^ (y >> 26);
    return (__v8si)(s1 + y);
}

#include <iostream>
#include <iomanip>
void foo() {
    //Shuffle a bit. The result is much worse without this.
    rand_si(); rand_si(); rand_si(); rand_si();
    auto val = rand_si();

    for (auto it = reinterpret_cast<int*>(&val);
         it != reinterpret_cast<int*>(&val + 1);
         ++it)
        std::cout << std::hex << std::setfill('0') << std::setw(8) << *it << ' ';
    std::cout << '\n';
}

which outputs

09e2a657 000b8020 1504cc3b 00110040 1360ff2b 00150078 2a9998b7 00228080

Every other number is very small, and none have the leading bit set. On the other hand, using xorshift* instead:

__v8si rand_si() {
    static auto x = __v4du{4, 8, 15, 16};
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * (__v4du)_mm256_set1_epi64x(0x2545F4914F6CDD1D);
}

I get the much better output

0889632e a938b990 1e8b2f79 832e26bd 11280868 2a22d676 275ca4b8 10954ef9

But according to Wikipedia, xorshift+ is a good PRNG, and produces better pseudo-randomness than xorshift*. So, do I have a bug in my RNG code, or am I using it wrong?

You're building with `clang`, right? [gcc says](https://godbolt.org/g/qLUFuz) *:11:17: note: -flax-vector-conversions to permit conversions between vectors with differing element types or numbers of subparts* for `return s1 + y;` — Peter Cordes, Jul 03 '17 at 05:26
Why are you using `__v8si` and `__v4du` instead of the standard Intel-defined types? — Cody Gray - on strike, Jul 03 '17 at 09:59
BTW, the shift-counts in the Wikipedia version you used aren't the same as the shift-counts in the current version of the algorithm on its creator's web site ([archive.org link since the site is down atm](http://web.archive.org/web/20170208011728/http://xoroshiro.di.unimi.it/xorshift128plus.c)). The current version is thought to be slightly better. See also [an AVX2 implementation using Intel intrinsics](https://stackoverflow.com/a/35817827/224132) (much harder to read than yours using GNU native vectors, though.) — Peter Cordes, Jul 03 '17 at 13:58
BTW, I had good results from the same vectorized xorshift128+ for generating 1GB of ASCII random decimal digits, space-separated. ([30x faster with AVX2 on Haswell than a scalar version that avoided biases more carefully](https://unix.stackexchange.com/questions/323845/whats-the-fastest-way-to-generate-a-1-gb-text-file-containing-random-digits/324520#324520)). — Peter Cordes, Jul 03 '17 at 14:01

geza · Accepted Answer · 2017-07-02T14:34:15.190

I think you should not judge a random generator by looking at 8 numbers it generated. Furthermore, generators usually needs good seeding (your seeding can be considered bad - your seeds start with almost all bits zeros. Calling rand_si() just a few times is not enough for the bits to "spread").

So I recommend you to use proper seeding (for example, a simple solution is to call rand_si() even more times).

xorshift* look like behaving better because of the final multiplication, so it doesn't have easily spotted bad behavior because of inadequate seeding.

A tip: Compare the numbers your code generates with the original implementation. This way you can be sure that your implementation is correct.

score 2 · Answer 2 · answered Jul 03 '17 at 08:51

2

geza's answer was exactly right, the seeding was the culprit. It worked much better to seed it using a standard 64-bit PRNG:

void seed(uint64_t s) {
    std::mt19937_64 e(s);
    s0 = __v4du{e(), e(), e(), e()};
    s1 = __v4du{e(), e(), e(), e()};
}

answered Jul 03 '17 at 08:51

Dan

12,409
3
50
87

Just some nitpicking here: the order of `e()` calls is unspecified in s0 and s1 (the compiler is free to decide to order of `e()` calls for the constructor). So, if you need reproducible result among different compilers/compiler versions, you need to put the results of `e()` into variables first. – geza Jul 03 '17 at 09:05
1

@geza Isn't the order [defined in a braced-init-list](https://stackoverflow.com/q/14060264/603688) though? – Dan Jul 03 '17 at 09:20
oops, you're absolutely right! I'm sorry for the bad information. The standard makes an exception for this case. Interesting... – geza Jul 03 '17 at 09:27

score 1 · Answer 3 · answered Mar 29 '19 at 06:58

1

Both guys above are mistaken. The xorshift+ generator works fine even when the initial base (seed) is 1, 2, 3, ... and other simplest values. Generator fails on a zero valued seed only. Check your 64-bit variables representation and correctly work of the binary operators.

answered Mar 29 '19 at 06:58

Zuborg

11
1

Yes, but starting with "bad" seeds means you need more iterations to mix the bits around before the PRNG results actually get decently random. The OP *was* only looking at the first few results, and geza's answer did point this out. – Peter Cordes Mar 29 '19 at 07:32

My vectorized xorshift+ is not very random

3 Answers3