
glibc now uses SSE 4.2 to optimize strncmp, as can be seen in a debugger:

   0xf7f20218 <__strncmp_sse4_2+40>    movdqu xmm2, xmmword ptr [edx]
   0xf7f2021c <__strncmp_sse4_2+44>    mov    ecx, eax
 ► 0xf7f2021e <__strncmp_sse4_2+46>    and    ecx, 0xfff
   0xf7f20224 <__strncmp_sse4_2+52>    cmp    ecx, 0xff0
   0xf7f2022a <__strncmp_sse4_2+58>    ja     __strncmp_sse4_2+125                    <__strncmp_sse4_2+125>

I'm not steeped in SSE 4.2 for strings, but my understanding is that it lets strncmp compare up to 16 bytes at a time. The movdqu xmm2, xmmword ptr [edx] instruction loads 16 bytes from one of the strings.

My question is: if a short string (say 3 bytes) sits at the end of a page, with its NUL terminator inside the page but some of the remaining 13 bytes of the load outside it, couldn't that cause a segfault, since we're now trying to load beyond the page we have access to?

This question came up in working on an emulator, which trapped an unconstrained access (that is, a read of memory which my application never wrote to):

strncmp(0x8064dd8, 0x7ffeff48, 0x4)
WARNING Filling memory at 0x7ffeff60 with 4 unconstrained bytes referenced from 0x818ba90 (strncmp+0x0 in libc.so.6 (0x8ba90))

This is perplexing:

  • Why is a strncmp of at most 4 bytes starting at 0x7ffeff48 reading values from 0x7ffeff60?

That is, it doesn't seem to be a bug in the caller, but rather unexpected behavior by strncmp. Debugging strncmp led me to the SSE 4.2 code path, which partially explains why it reads beyond the limit set by n: it simply uses SSE 4.2 to load many bytes at once, even if it doesn't need all of them.

Questions:

  • Is this correct? Does strncmp_sse4_2 read more than n bytes?
  • Even if it does: Doing 16 bytes at a time should stop at 0x7ffeff58. Why does it read till 0x7ffeff60?
  • If so, how does this not potentially cause a page fault?
  • If so, how do we distinguish an acceptable read of uninitialized data from cases indicating bugs? E.g. how would Valgrind avoid reporting this as an uninitialized read?
SRobertJames
  • One can avoid a page fault by checking that the starting address of each 16-byte read is a multiple of 16, and ensure overall correctness by using another mechanism for the excess/leftover bytes. – Aki Suihkonen Feb 09 '22 at 05:51
  • In this case it checks that it's not within 16 bytes of the end of a page for the first vector. (Presumably it checked something about `edx` *before* using it, in code you didn't show. Like it's doing for ECX after doing EDX) See also [Is it safe to read past the end of a buffer within the same page on x86 and x64?](https://stackoverflow.com/q/37800739). Yes, `strncmp_sse4_2` can read more than `n` bytes if `n` is less than 16, or depending on `n%16` and/or pointer alignment; `pcmpestri` explicit-length string compare instructions are designed for that if you can safely load the data. – Peter Cordes Feb 09 '22 at 06:02

1 Answer


Is this correct? Does strncmp_sse4_2 read more than n bytes?

Yes.

Even if it does: Doing 16 bytes at a time should stop at 0x7ffeff58. Why does it read till 0x7ffeff60?

You are assuming that it started using movdqu from the address you passed in. It likely didn't. It probably aligned the pointers to a cache-line boundary first.

If so, how does this not potentially cause a page fault?

If you have a 16-byte aligned pointer p, that means p+15 points to the same page as p so you can read 16 bytes from p with impunity.
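As a rough illustration of the two ways to keep a 16-byte load inside one page, here is a minimal C sketch. It is not glibc's code: the helper names are hypothetical, and it assumes 4 KiB pages. The first helper mirrors the `and ecx, 0xfff` / `cmp ecx, 0xff0` test visible in the question's disassembly; the second rounds an address down to a 16-byte boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86 systems. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_VEC_SIZE  16u

/* Hypothetical helper: true if a 16-byte load starting at addr touches
 * only one page, i.e. the page offset is at most 0xff0. */
static bool load16_stays_in_page(uintptr_t addr)
{
    return (addr & (SKETCH_PAGE_SIZE - 1)) <= SKETCH_PAGE_SIZE - SKETCH_VEC_SIZE;
}

/* Hypothetical helper: round addr down to the enclosing 16-byte boundary.
 * An aligned 16-byte block can never straddle a page boundary, because
 * the page size is a multiple of 16. */
static uintptr_t align_down16(uintptr_t addr)
{
    return addr & ~(uintptr_t)(SKETCH_VEC_SIZE - 1);
}
```

For the address in the question, `load16_stays_in_page(0x7ffeff48)` holds (page offset 0xf48), so an unaligned 16-byte load from there cannot fault on a page the string does not already occupy.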

If so, how do we distinguish an acceptable read of uninitialized data from cases indicating bugs? E.g. how would Valgrind avoid reporting this as an uninitialized read?

Valgrind does this by interposing its own copy of strcmp (for dynamically linked binaries). Without such interposition, Valgrind produces false positives (or rather, true positives which nobody cares about or could do anything about).
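To show what kind of replacement avoids the over-read, here is a minimal byte-at-a-time strncmp sketch. This is not Valgrind's actual code, and the function name is hypothetical. It only illustrates that a scalar loop never touches a byte past the terminator or past n, so a tool that substitutes something like it has nothing spurious to report.

```c
#include <stddef.h>

/* Hypothetical byte-wise replacement: compares at most n bytes and stops
 * at the first difference or at the NUL terminator, so it never reads
 * memory the C abstract machine would not allow. */
static int bytewise_strncmp(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char ca = (unsigned char)a[i];
        unsigned char cb = (unsigned char)b[i];
        if (ca != cb)
            return ca < cb ? -1 : 1;   /* first differing byte decides */
        if (ca == '\0')
            return 0;                  /* both strings ended together */
    }
    return 0;                          /* first n bytes are equal */
}
```

The trade-off is speed: this loop does one byte per iteration where the SSE 4.2 version handles up to 16.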

Peter Cordes
Employed Russian
  • In your middle section, `p+16` pointing to the same page as `p` isn't necessary for aligned pointers. A 16-byte load gets `p[0..15]`. A naturally-aligned load of any width smaller than a page won't fault if it contains any byte the C abstract machine is allowed to read. So yes, as you said earlier, `p &= -16;` / `_mm_load_si128(p)` would just work, *or* like they do here, you can simply check that your original address is not within 15 bytes of the end of a page. i.e. that `(p & 0xfff) <= 0xff0` so the first string byte is still at the start of the vector. – Peter Cordes Feb 10 '22 at 04:21
  • We have a Q&A about that with more details: [Is it safe to read past the end of a buffer within the same page on x86 and x64?](https://stackoverflow.com/q/37800739) – Peter Cordes Feb 10 '22 at 04:28