6

I observe an interesting problem with the Microsoft implementation of strncat. It touches 1 byte beyond the source buffer. Consider the following code:

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>

void main()
{
    char dstBuf[1024];
    char* src = malloc(112);
    memset(src, 'a', 112);
    dstBuf[0] = 0;
    strncat(dstBuf, src, 112);
}

strncat reads 1 byte after 112 byte block. So if you are unlucky enough to get allocation on an invalid page boundary, your application crashes. Large applications can crash intermittently in such places. (Note that such condition can be simulated with gflags PageHeap setting; block size has to be divisible by pointer size for proper alignment.)

Is this the expected behavior or a bug? Any links confirming that? (I read several descriptions of strncat but they can be interpreted both ways depending on your initial set of mind...)

Update (to answer questions about evidence): I apologize if it is not clear from the text above, but this is an experimental fact. I observe intermittent crashes in an application at strncat reading address src+srcBufSize. In this small example run with gflags PageHeap on crash reproduces consistently (100%). So as far as I can see the evidence is very solid.

Update2 (info on compiler) MS Visual Studio 2005 Version 8.0.50727.867. Build platform: 64 bit release (no repro for 32 bit). OS used to repro the crash: Windows Server 2008 R2.

Update 3 The problem also reproduces with a binary built in MS Visual Studio 2012 11.0.50727.1

Update 4 Link to issue on Microsoft Connect; link to discussion on MSDN Forums

Update 5 The problem will be fixed in the next VS release. No fix is planned for old versions. See the "Microsoft Connect" link above.

glagolig
  • 1,100
  • 1
  • 12
  • 28
  • What's your evidence for this claim? – user207421 Aug 30 '13 at 05:15
  • I would love to look at your binary that behaves like this. Also, what version of compiler / library are you using? – Jonathon Reinhart Aug 30 '13 at 06:47
  • @Jonathon Reinhart: Binary http://sdrv.ms/17rN1cF – glagolig Aug 30 '13 at 17:24
  • I asked for evidence, not an executable. Nobody is going to execute an unknown .exe from an untrusted source, if they have any conception of computer security whatsoever. What's your *evidence?* – user207421 Sep 01 '13 at 10:10
  • @EJP I was replying to OP who explicitly asked for binary. I guess the OP can view disassembler without running or set up a safe sand box. I updated the question to elaborate on evidence. – glagolig Sep 01 '13 at 22:15
  • 1
    So, if `src` points to a non NUL-terminated array of pointers whose size is a multiple of 8 other than 0, the dword just beyond the array is read, altough the value read is unused. `mov (%rdx),%rax; sub $0x8,%r8; jb 0x400011a7` ← I can think of a couple of ways to change that, at the expense of performance. – ninjalj Sep 06 '13 at 16:07
  • @OP: so you cannot reproduce it on 32-bit. Can you share a 32-bit binary? – ninjalj Sep 06 '13 at 17:53
  • @ninjalj: Done. I put it in the same share. – glagolig Sep 07 '13 at 03:30
  • Changing `jb` to `jbe` works, but then the final qword copy is made byte by byte. Apropos the 32-bit version, while it's based on the same idea, the details differ quite a bit. In particular, the aligned-[qd]word-concatenation loop uses a dword counter and exits on zero on the 32-bit version, while on the 64-bit version a byte-counter is used and it exits on below zero (as I said, it could have been on below or equal to zero, the 64-bit version has fallback code for up to 8 final bytes). – ninjalj Sep 09 '13 at 11:05
  • 1
    Sinking `mov (%rdx),%rax` after the `jb 0x400011a7` also works, but some of the following instructions depend on `%rax`, which may cause a performance degradation. In summary, I agree it's a bug. – ninjalj Sep 09 '13 at 11:22
  • 2
    Didn't Microsoft already fix this in 2008 or 2009? See http://support.microsoft.com/kb/956420 and http://connect.microsoft.com/VisualStudio/Downloads/DownloadDetails.aspx?DownloadID=19141. So why is the bug still present in VS 2012? – Joseph Quinsey Feb 12 '14 at 04:09
  • 1
    @JosephQuinsey: Note that that fix was for strncpy, not strncat. This question is asking about strncat. – James McNellis Apr 03 '14 at 08:52

3 Answers3

3

The documentation for strncat states:

src - pointer to the null-terminated byte string to copy from

Therefore, the implementation can assume that the src input parameter is in fact NUL-terminated, even if it is longer than count characters.

For further confirmation, Microsoft's own documentation states:

strSource

Null-terminated source string.

On the other hand, the actual C standard states something like:

The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1.

As pointed out in the comments below, this identifies the second parameter s2 as an array and not a NUL-terminated string. However, this is still ambiguous with respect to the original question, because this documentation describes the ultimate effect on s1, rather than the behaviour of the function when reading from s2.

This could of course be settled with respect to the specific Microsoft implementation by consulting the C Runtime Library source code.

Greg Hewgill
  • 951,095
  • 183
  • 1,149
  • 1,285
  • 1
    Not sure if I'm buying this. It says "The **strncat** function appends, at most, the first *count* characters of *strSource* to *strDest*." – Jonathon Reinhart Aug 30 '13 at 03:52
  • 2
    @JonathonReinhart: That's the resulting *behaviour*, but the input `src` string is nevertheless expected to be null-terminated. – Greg Hewgill Aug 30 '13 at 04:03
  • @JonathonReinhart - If you supply a non-null terminated string to perform a concatenation on what do you expect? This answer is perfect – Ed Heal Aug 30 '13 at 04:44
  • 1
    The input string is expected to be null-terminated in case it is shorter than `count.` There is no language there that says it can read beyond `count` characters – user207421 Aug 30 '13 at 05:05
  • I saw Microsoft documentation. I do not think it is clear enough. As I wrote, the interpretation depends on your initial set of mind about the subject. Here is a different link that does not mention null-termination: http://www.cplusplus.com/reference/cstring/strncat/ – glagolig Aug 30 '13 at 06:22
  • 3
    @glagolig: Your link says "C string". C strings are always NUL-terminated. – Greg Hewgill Aug 30 '13 at 06:27
  • @Greg I disagree. There's no reason it should ever read more than `count` characters. [`strncat(3)`](http://linux.die.net/man/3/strncat) says `"The strncat() function is similar, except that * it will use at most n bytes from src; and * src does not need to be null-terminated if it contains n or more bytes."` – Jonathon Reinhart Aug 30 '13 at 06:46
  • In fact, I can't even come up with a valid implementation that would care if the string was null-terminated. (as far as not overrunning `count` - how/why would it?) – Jonathon Reinhart Aug 30 '13 at 06:49
  • @EdHeal `char src[3] = { 'a', 'a', 'a' }; char dst[100]; dst[0] = '\0'; strncat(dst, src, 3); printf("%s\n", dst);` I would expect this to function correctly, and print three `a` characters and not overrun any buffers. – Jonathon Reinhart Aug 30 '13 at 06:54
  • @JonathonReinhart: You're making reference to a specific (glibc) implementation of `strncat`, which is not the Microsoft implementation to which the OP refers. – Greg Hewgill Aug 30 '13 at 07:19
  • 1
    @GregHewgill Right to show that what you're saying doesn't make sense. I've also cited another piece of documentation that says the OP's behavior *shouldn't* happen. The OP asked if what he was seeing was a bug... if the code he's showing touches `src[112]` then I'd say yes, his libc has a bug. – Jonathon Reinhart Aug 30 '13 at 07:33
  • 4
    @GregHewgill: that conflicts with the C standard which treats the source operand as an array: _The strncat() function shall append not more than n bytes (a null byte and bytes that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1._ – ninjalj Aug 30 '13 at 08:37
  • @ninjalj: Thanks for that, I've updated my answer with that info. – Greg Hewgill Aug 30 '13 at 09:53
  • 1
    While the standard says nothing of the internal implementation of `strnlen`, the runtime library has no business accessing bytes past the end of the array specified by (start=`s2`, length=`n`). At least not unless it knows the array to be padded e.g: to dword size. – ninjalj Aug 30 '13 at 11:31
  • I accepted the answer since it has a reference to C standard. Now I think what I observe is a bug. Our code builds on several different platforms, developers cannot possibly review CRT sources on all platforms for every string function. – glagolig Sep 01 '13 at 15:14
2

s2 is not a "string" in strncat(s1, s2, n).

So if Microsoft is reading pass n bytes, it is not C11 compliant.

C11 7.24.2.3.1 strcat() mentions
"appends a copy of the string pointed to by s2 (including the terminating null character) to the end of the string pointed to by s1".

C11 7.24.2.3.2 strncat says
"The strncat function appends not more than n characters (a null character and characters that follow it are not appended) from the array pointed to by s2 to the end of the string pointed to by s1. ... A terminating null character is always appended to the result"

Clearly in the strncat case, s2 is viewed as an "array" with a string-like limitations on how much is appended to s1. Thus during the concatenation, this is no need to inspect s2 more than what is absolutely needed. The final written \0 comes from code, not s2.

Don't know about the older C99 standard.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
1

English is an imperfect language, more so than C.

The documentation says "at most n characters" (my emphasis). There is no evidence to indicate that strncat copies more than 112 characters. What makes you believe it does?

The code of strncat might index past an offset of 112, but not actually reference offset 113 which could cause a storage fault. This ptr behavior is defined as acceptable in K&R.

Finally, again this is an English/reasoning problem, the documentation probably does say null terminated string. But really, isn't it redundant to say a string is null terminated? They are by definition, otherwise they would be an array of characters. So, the documentation is being vague and non-specific. The programmer is left to read between the lines. Software documentation are not legal tomes, they are descriptions that are meant to be understood by someone practiced in the art.

JackCColeman
  • 3,777
  • 1
  • 15
  • 21
  • _Software documentation are not legal tomes_ ← but standards are indeed somewhat like legal tomes, and the C standard treats the source argument as an array. `strn*` functions are designed to work with _fixed size records_, so they don't behave exactly as if their subject matter were just strings. – ninjalj Aug 30 '13 at 08:42
  • @ninjalj, I agree with you that they are fixed size records and n determines the length of that record. C is allowed/defined to index 1 past the end of an array, which is why `for(etc; i++)` works! – JackCColeman Aug 30 '13 at 18:13