I want to generate a large amount (10 TB) of seemingly random, but predictable numbers. The generation speed should exceed that of fast SSDs, so I want 3000 MB/s to 4000 MB/s.
After the file has been written, the numbers will be read back and generated a second time, so that the two can be compared. The program as a whole is meant to test disks.
At the moment I'm thinking of hashes. The input to each hash is just an 8-byte number (ulong), which gives me the predictability. So the binary file looks like this:
<32 bytes of SHA256(0)> <32 bytes of SHA256(1)> ...
I don't think I can use a seeded random number generator, because I can't tell it to generate the nth number directly. But I can tell the SHA256 algorithm to calculate SHA256(n).
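To make the layout concrete, here is a minimal sketch of how the block with index n, and the expected byte at any file offset, could be derived. The names PatternSketch, BlockAt and ByteAt are made up for illustration. It assumes .NET 5 or later (for the static SHA256.HashData method) and the same little-endian byte order as in my test below.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

// Hypothetical helper, not part of any library.
static class PatternSketch
{
    public const int BlockSize = 32; // SHA-256 digest length in bytes

    // Returns block n of the file: SHA256 over n encoded as 8 little-endian bytes.
    public static byte[] BlockAt(ulong n)
    {
        Span<byte> input = stackalloc byte[8];
        BinaryPrimitives.WriteUInt64LittleEndian(input, n);
        return SHA256.HashData(input); // static one-shot API (assumes .NET 5+)
    }

    // Returns the byte expected at an absolute file offset,
    // without generating anything that comes before it.
    public static byte ByteAt(ulong fileOffset)
    {
        byte[] block = BlockAt(fileOffset / BlockSize);
        return block[(int)(fileOffset % BlockSize)];
    }
}
```

This random access is the property I need: any region of the file can be verified on its own, for example PatternSketch.ByteAt(163) is byte 3 of block 5.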
I made a test with 128 MB of data using the SHA256 algorithm like this:
using System.Security.Cryptography;
using System.Threading.Tasks;

Parallel.For(0, 128 * 1024 * 1024 / 32, // 128 MB / length of the hash
    a => {
        // A new SHA256 instance is created for every single 32-byte block
        var sha = SHA256.Create();
        sha.Initialize();
        // Encode the index as 8 bytes, least significant byte first
        var ba = new byte[8];
        ba[0] = (byte)((long)a >> 0 & 0xFF);
        ba[1] = (byte)((long)a >> 8 & 0xFF);
        ba[2] = (byte)((long)a >> 16 & 0xFF);
        ba[3] = (byte)((long)a >> 24 & 0xFF);
        ba[4] = (byte)((long)a >> 32 & 0xFF);
        ba[5] = (byte)((long)a >> 40 & 0xFF);
        ba[6] = (byte)((long)a >> 48 & 0xFF);
        ba[7] = (byte)((long)a >> 56 & 0xFF);
        var hash = sha.ComputeHash(ba);
        // TODO: aggregate the byte[]s, stream to file
    }
);
Written like that, the throughput is only 95 MB/s on my 8-core Ryzen 7 2700X running at 4.08 GHz.
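For scale, here is a rough calculation of what that means for the full 10 TB, assuming decimal units (10 TB = 10^7 MB) and a constant rate:

```latex
\begin{align*}
t_{95}   &= \frac{10^{7}\ \text{MB}}{95\ \text{MB/s}} \approx 105{,}263\ \text{s} \approx 29.2\ \text{h} \\
t_{4000} &= \frac{10^{7}\ \text{MB}}{4000\ \text{MB/s}} = 2500\ \text{s} \approx 41.7\ \text{min} \\
\text{required speedup} &= \frac{4000}{95} \approx 42
\end{align*}
```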
Any chance of speeding this up to 4000 MB/s?