The straightforward way to do your XOR-masking is by bytes:
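#include <stddef.h>
#include <stdint.h>

// XOR every byte of the buffer in place with the repeating 8-byte key.
void encrypt(uint8_t* in, size_t len, const uint8_t key[8])
{
    for (size_t i = 0; i < len; i++) {
        in[i] ^= key[i % 8];
    }
}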
Note: here the key is an array of 8 bytes, not a 64-bit number. This code is straightforward: no tricks needed, easy to debug. Measure its performance, and be done with it if the performance is good enough.
Some (most?) compilers optimize such simple code by vectorizing it when optimization is enabled. That is, all the details (casting to uint64_t and such) are handled by the compiler. However, if you try to be "clever" in your code, you may inadvertently prevent the compiler from doing the optimization. So try to write simple code.
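P.S. You should probably also use the restrict keyword. It is standard in C (since C99) but not in C++, where compilers offer it only as an extension such as __restrict. It may be required for best performance, because it tells the compiler that in and key do not overlap. I have no experience with using it, so I didn't add it to my example.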
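As an illustration of the restrict suggestion, here is a minimal sketch, assuming a compiler that accepts the `__restrict` extension (GCC, Clang and MSVC do). The name `encrypt_restrict` is made up for this example. Whether the qualifier actually changes the generated code is something to measure, not assume.

```cpp
#include <stddef.h>
#include <stdint.h>

// __restrict is a compiler extension, not standard C++.
// It promises that `in` and `key` never point into the same memory,
// so the caller must make sure the two buffers do not overlap.
void encrypt_restrict(uint8_t* __restrict in, size_t len,
                      const uint8_t* __restrict key)
{
    for (size_t i = 0; i < len; i++) {
        in[i] ^= key[i % 8];
    }
}
```

It is called exactly like the plain version, e.g. `encrypt_restrict(buffer, length, key_bytes)` with an 8-byte `key_bytes` array.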
If you have a bad compiler, cannot enable the vectorization option, or just want to play around, you can use this version with casting:
void encrypt(uint8_t* in, size_t len, uint64_t key)
{
    uint64_t* in64 = reinterpret_cast<uint64_t*>(in);
    for (size_t i = 0; i < len / 8; i++) {
        in64[i] ^= key;
    }
}
It has some limitations:
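- Requires the length to be divisible by 8 (otherwise the trailing len % 8 bytes are silently left unmasked)
- Requires the processor to support unaligned access unless the buffer happens to be 8-byte aligned (x86 handles this for ordinary loads and stores). As far as the C++ standard is concerned, accessing a uint8_t buffer through a uint64_t* is undefined behavior anyway, so this relies on what your compiler happens to do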
- Compiler may refuse to vectorize this one, leading to worse performance
- As noted by Hurkyl, the order of the 8 bytes in the mask is not clear (on x86, little-endian, the least significant byte will mask the first byte of the input array)
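Several of these limitations can be sidestepped by going through `memcpy` instead of a pointer cast. The following is a sketch, not a measured replacement. The name `encrypt_blocks` is invented here, and it assumes the compiler turns the fixed-size `memcpy` calls into plain loads and stores, which you should verify for your compiler. Because the 64-bit mask is built from the key bytes with `memcpy`, key[0] masks the first input byte on both little-endian and big-endian machines, and the leftover bytes are handled separately, so the length need not be divisible by 8.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

void encrypt_blocks(uint8_t* in, size_t len, const uint8_t key[8])
{
    // Build the mask from the key bytes in memory order.
    uint64_t mask;
    memcpy(&mask, key, sizeof mask);

    size_t i = 0;
    // Whole 8-byte blocks; memcpy has no alignment requirement.
    for (; len - i >= 8; i += 8) {
        uint64_t block;
        memcpy(&block, in + i, sizeof block);
        block ^= mask;
        memcpy(in + i, &block, sizeof block);
    }
    // Remaining 0..7 bytes; i is a multiple of 8 here, so the key restarts at key[0].
    for (size_t k = 0; i < len; i++, k++) {
        in[i] ^= key[k];
    }
}
```

This should produce the same bytes as the simple byte-wise loop, so the two can be compared against each other in a test before deciding whether the extra complexity is worth it.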