glibc now uses SSE 4.2 to optimize strncmp:
- https://github.com/lattera/glibc/blob/master/sysdeps/x86_64/multiarch/strcmp-sse42.S
- https://www.strchr.com/strcmp_and_strlen_using_sse_4.2
This can be seen in a debugger:
0xf7f20218 <__strncmp_sse4_2+40> movdqu xmm2, xmmword ptr [edx]
0xf7f2021c <__strncmp_sse4_2+44> mov ecx, eax
► 0xf7f2021e <__strncmp_sse4_2+46> and ecx, 0xfff
0xf7f20224 <__strncmp_sse4_2+52> cmp ecx, 0xff0
0xf7f2022a <__strncmp_sse4_2+58> ja __strncmp_sse4_2+125 <__strncmp_sse4_2+125>
I'm not steeped in the SSE 4.2 string instructions, but my understanding is that they allow comparing up to 16 bytes at a time. The movdqu xmm2, xmmword ptr [edx] instruction loads 16 bytes from one of the strings.
My question is: if a short string (say, 3 bytes) sits at the end of a page, with its NUL terminator still inside the page but some of the remaining 13 bytes of the 16-byte load outside it, couldn't that cause a segfault, since we are now trying to load from beyond the page we have access to?
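To make the concern concrete, here is a minimal sketch of the condition under which an unaligned 16-byte load would spill into the next page. It assumes 4 KiB pages, and the names (load_crosses_page, SKETCH_PAGE_SIZE, SKETCH_VEC_SIZE) are mine, not glibc's. The and ecx, 0xfff / cmp ecx, 0xff0 / ja sequence in the disassembly above looks to me like a test of this kind, but I have not confirmed that against the glibc source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86 Linux systems. */
#define SKETCH_PAGE_SIZE 4096u
/* Width in bytes of one movdqu load. */
#define SKETCH_VEC_SIZE  16u

/* Hypothetical helper (not a glibc function): returns true if a
 * 16-byte load starting at addr would touch the following page. */
static bool load_crosses_page(uintptr_t addr)
{
    /* Offset of addr within its page; compare: and ecx, 0xfff */
    uintptr_t offset_in_page = addr & (SKETCH_PAGE_SIZE - 1);

    /* Offsets 0xff1..0xfff leave fewer than 16 bytes before the page
     * ends; compare: cmp ecx, 0xff0 followed by ja */
    return offset_in_page > SKETCH_PAGE_SIZE - SKETCH_VEC_SIZE;
}
```

For the address in my trace, load_crosses_page(0x7ffeff48) is false (the page offset is 0xf48), so the first 16-byte load stays inside the page. An address ending in 0xffd, as in my 3-byte example, would return true.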
This question came up while working on an emulator, which trapped an unconstrained access (that is, a read of memory that my application never wrote to):
strncmp(0x8064dd8, 0x7ffeff48, 0x4)
WARNING Filling memory at 0x7ffeff60 with 4 unconstrained bytes referenced from 0x818ba90 (strncmp+0x0 in libc.so.6 (0x8ba90))
This is perplexing, because:
- Why is strncmp, with a maximum of 4 bytes and starting at 0x7ffeff48, reading values from 0x7ffeff60?
That is, it doesn't seem to be a bug in the caller, but rather unexpected behavior by strncmp. Debugging strncmp led me to the SSE 4.2 implementation, which partially explains why it reads beyond the limit set by n: it simply uses SSE 4.2 to load many bytes at once, even if it doesn't need them.
Questions:
- Is this correct? Does strncmp_sse4_2 read more than n bytes?
- Even if it does: reading 16 bytes at a time should stop at 0x7ffeff58. Why does it read up to 0x7ffeff60?
- If so, how does this not potentially cause a page fault?
- If so, how do we distinguish an acceptable read of uninitialized data from cases that indicate bugs? E.g., how would Valgrind avoid reporting this as an uninitialized read?
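The address arithmetic behind the second question can be sketched as follows. The helper names are hypothetical and exist only for illustration. The code checks the numbers from the emulator trace and makes no claim about what glibc actually does.

```c
#include <stdint.h>

/* Width in bytes of one SSE load. */
enum { SKETCH_VEC_BYTES = 16 };

/* First address past a single 16-byte load that starts at start. */
static uint32_t first_load_end(uint32_t start)
{
    return start + SKETCH_VEC_BYTES;
}

/* Distance in bytes from the string start to a reported address. */
static uint32_t distance_from_start(uint32_t start, uint32_t reported)
{
    return reported - start;
}

/* Round an address down to the nearest 16-byte boundary. */
static uint32_t align_down_16(uint32_t addr)
{
    return addr & ~(uint32_t)(SKETCH_VEC_BYTES - 1);
}
```

With the values from the trace, first_load_end(0x7ffeff48) gives 0x7ffeff58, while the reported address 0x7ffeff60 lies 24 bytes past the start of the string. One observation, for what it is worth: 0x7ffeff60 is itself 16-byte aligned, whereas 0x7ffeff48 rounds down to 0x7ffeff40.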