Option1 should be avoided at all costs! The problem here is that the input to the method (plaintext) is a reference, so the string exists outside the scope of the method. The compiler therefore cannot prove that nothing else changes the string while the loop runs, and so it cannot tell whether certain optimisations are safe (this is not always the case, but it is here).
https://godbolt.org/z/EBtVp7
I've implemented a dumb method here (it just adds 12 to each char). You'll notice the ASM for the first version looks 'nice': it's very simple and very small, awesome. However, if you toggle the 1 to 0 and compare against the second method, you'll see an explosion in the amount of ASM generated. It isn't all that bad when you look closer, though.
Taking a look at the first code snippet, we can see this in the first line of the inner loop:
mov rcx, qword ptr [rdi]
This kinda sucks. It's re-reading the string's 'begin' pointer on each iteration. The loop writes through a char*, and a char store is allowed to alias anything, including the string object's own pointer and length members, so the compiler has to assume each store may have changed them.
If, however, you look at the second method, it has generated some unrolled loops using the vpaddb instruction on YMM registers. This means it processes 32 chars at a time, unlike the first method, which can only process 1 char at a time.
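For reference, the two shapes being compared look roughly like the sketch below. It assumes option1 is an index loop that goes through the referenced string for length() and operator[] on each iteration, and that option2 is a range-based for loop over char&. The names CipherIndexed and CipherRange are made up for illustration and are not taken from the Godbolt link.

```cpp
#include <cstddef>
#include <string>

// Assumed shape of option1: length() and operator[] both go through
// the std::string object on every iteration.
void CipherIndexed(std::string &plaintext) {
    for (std::size_t i = 0; i < plaintext.length(); i++) {
        plaintext[i] += 12;
    }
}

// Assumed shape of option2: begin() and end() are evaluated once,
// before the loop starts, and held in local iterators.
void CipherRange(std::string &plaintext) {
    for (char &c : plaintext) {
        c += 12;
    }
}
```

Both functions produce the same string; only the generated code differs.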
If you wanted to start making option1 approach the performance of option2, you'd need to do something grim like:
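    #include <cstddef>
    #include <string>

    // Cache the data pointer and length in locals so the loop does not
    // have to re-read them from the string on each iteration.
    void Cipher(std::string &plaintext, int key) {
        if (!plaintext.empty()) {
            char* ptr = &plaintext[0];
            // std::size_t avoids narrowing the result of length() to int.
            for (std::size_t i = 0, length = plaintext.length(); i < length; i++) {
                ptr[i] += 12;  // fixed shift for the demo; key is unused
            }
        }
    }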
This horrible change now means the compiler can see that the ptr and length variables do not change within the function scope, and so it is now able to vectorise the code. (Options 2 and 3 are still more efficient though!)
Option3 will not allocate a char on each iteration (it'll either load a char into a general-purpose register, or a set of chars into a YMM register), so the difference in performance in this case is moot. Use option2 if you want to modify the string; use option3 if the string is read-only.
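As a rough illustration of the read-only case, here is a minimal sketch that assumes option3 is a range-based for loop taking each char by value. The Checksum function is hypothetical and only there to give the loop something to do.

```cpp
#include <string>

// Assumed shape of option3: each char is copied into c, so writing to c
// would only change the copy, never the string itself.
unsigned Checksum(const std::string &text) {
    unsigned sum = 0;
    for (char c : text) {
        // Go via unsigned char so chars above 127 do not add negative values.
        sum += static_cast<unsigned char>(c);
    }
    return sum;
}
```

Because the string is taken by const reference and the loop variable is a copy, nothing in the loop body can write back to the string.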
An older alternative that achieves the same thing is std::for_each, but range-based for loops are now generally preferred.
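For comparison, a std::for_each version of the same dumb add-12 method might look like the sketch below. The function name CipherForEach is made up for illustration.

```cpp
#include <algorithm>
#include <string>

// Same effect as a range-based for loop over char&, written with
// std::for_each and a lambda that takes each char by reference.
void CipherForEach(std::string &plaintext) {
    std::for_each(plaintext.begin(), plaintext.end(),
                  [](char &c) { c += 12; });
}
```

The iterators are evaluated once up front, just as in the range-based loop; the range-based form simply says the same thing with less ceremony.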