I have a simple test program that loads an xmm register with the movdqu instruction accessing data across a page boundary (OS = Linux).
If the following page is mapped, this works just fine. If it's not mapped then I get a SIGSEGV, which is probably expected.
However this diminishes the usefulness of the unaligned loads quite a bit. Additionally SSE4.2 instructions (like pcmpistri) which allow for unaligned memory references appear to exhibit this behavior as well.
That's all fine -- except there's many an implementation of strcmp using pcmpistri that I've found that don't seem to address this issue at all -- and I've been able to contrive trivial testcases that will cause these implementations to fail, while the byte-at-a-time trivial strcmp implementation will work just fine with the same data layout.
One more note -- it appears the the GNU C library implementation for 64-bit Linux has a __strcmp_sse42 variant that appears to use the pcmpistri instruction in a more safe manner. The implementation of this strcmp is fairly complex, but it appears to be carefully trying to avoid the page boundary issue. I'm not sure if that's due to the issue I describe above, or whether it's just a side-effect of trying to get better performance by aligning the data.
Anyway the question I have is primarily -- where can I find out more about this issue? I've typed in "movdqu crossing page boundary" and every variant of that I can think of to Google, but haven't come across anything particularly useful. If anyone can point me to further info on this it would be greatly appreciated.