
I want to understand if the following code is (always, sometimes or never) well-defined according to C11:

#include <string.h>
int main() {
  char d[5];
  char s[4] = "abc";
  char *p = s;
  strncpy(d, p, 4);
  p += 4; // one-past end of "abc"
  strncpy(d+4, p, 0); // is this undefined behavior?
  return 0;
}

C11 7.24.2.4.2 says:

The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1.

Note that s2 is an array, not a string (so the lack of null-terminator when p == s+4 is not an issue).

7.24.1 (String function conventions) applies here (emphasis mine):

Where an argument declared as size_t n specifies the length of the array for a function, n can have the value zero on a call to that function. Unless explicitly stated otherwise in the description of a particular function in this subclause, pointer arguments on such a call shall still have valid values, as described in 7.1.4. On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

The relevant part of the aforementioned 7.1.4 is (emphasis mine):

7.1.4 Use of library functions

Each of the following statements applies unless explicitly stated otherwise in the detailed descriptions that follow: If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after promotion) not expected by a function with variable number of arguments, the behavior is undefined. If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.

I'm having some trouble parsing the last part. The "all address computations and accesses to objects" requirement seems to be trivially satisfied when n == 0, provided I can assume that my implementation will not compute any addresses in that case.

In other words, in a strict interpretation of the standard, should I always refuse the program? Should I always allow it? Or is its correctness implementation-dependent (i.e., if the implementation computes the address of the first character before checking n, then the above code has UB, otherwise it doesn't)?
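Whatever the strict reading turns out to be, the question can be sidestepped in practice by never handing the pointers to the library when the count is zero. A minimal sketch of that idea follows; `copy_n_guarded` and `question_scenario` are hypothetical names made up for illustration, not standard functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: behaves like strncpy for n > 0, but makes no
   library call at all when n == 0, so the zero-length case does not
   depend on how 7.1.4 is interpreted. */
char *copy_n_guarded(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n == 0)
        return dst;              /* nothing to copy; src is never dereferenced */
    return strncpy(dst, src, n);
}

/* The scenario from the question, routed through the helper.
   Returns 1 if d ends up holding 'a', 'b', 'c', '\0'. */
int question_scenario(void)
{
    char d[5] = {0};
    char s[4] = "abc";
    char *p = s;

    copy_n_guarded(d, p, 4);
    p += 4;                       /* one past the end of s: valid to form */
    copy_n_guarded(d + 4, p, 0);  /* strncpy is not called here */
    return memcmp(d, "abc", 4) == 0;
}
```

This does not answer the language-lawyer question; it only removes the dependency on the answer.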

anol
  • IANALL, but I don't think talk of _computing_ the address before checking `n` is relevant -- you've already _computed_ the address in setting `p`. It's a question of (a) whether (that computed) value of `p` is valid, and (b) whether it's _used_ (when `n` is zero). From what I can remember (but I'm not an expert) the address of the first element _past_ an array (i.e. `s+4`) _is_ a valid address _provided_ you don't try and access what's there. – TripeHound Jul 27 '17 at 12:23
  • For reference, the spec's function signature: `char *strncpy(char * restrict s1, const char * restrict s2, size_t n);` I think this should be part of the post. – chux - Reinstate Monica Jul 27 '17 at 12:57
  • Detail "Note that s2 is an array ..." should be "Note that s2 points to an array". `s2` is a pointer, not an array. – chux - Reinstate Monica Jul 27 '17 at 13:12
  • @chux The bolded text "If a function argument is described as being an array" applies to "Note that s2 points to an array" – M.M Jul 27 '17 at 13:52
  • @anol An analogous edge case would be `const char *foo = "abc"; char *bar = "xyz"; strncpy(foo, bar, 0);` If the count was 1, it is UB to write to `const char *`, yet with 0, and no writes, is it UB? I expect this to be UB as `strncpy(any_const_char_star, blah, blah)` is certainly UB. All-in-all interesting edge case post you have. – chux - Reinstate Monica Jul 27 '17 at 15:13

4 Answers


char *strncpy(char * restrict s1, const char * restrict s2, size_t n);

"The strncpy function copies not more than n characters (...) from the array pointed to by s2" (C11 §7.24.2.4 ¶2)

The description of strncpy() does not sufficiently answer the "strncpy(d, s, 0) with a one-past pointer" case. An access to *s2 is certainly not expected, yet does an access to *s2 need to be valid when n == 0?

Neither does 7.24.1 (String function conventions).

7.1.4 (Use of library functions) does answer it, depending on whether the parenthetical applies to only part, or to the whole, of the preceding "all address computations and accesses to objects":

... If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid....

  1. If the "(that would be valid if the pointer did point to the first element of such an array)" applies only to "accesses to objects", then strncpy(d, s, 0) is fine, as the pointer value need not have array characteristics. It simply needs to be a valid, computable value.

  2. If the "(that would be valid if the pointer did point to the first element of such an array)" also applies to "address computations", then strncpy(d, s, 0) is UB, as the pointer value must have array characteristics, which include the valid computation of the address one past s. Yet computing a one-past address is not guaranteed to be valid when s itself is already a one-past value.

As I read the spec, the first reading applies, so the behavior is defined, for two reasons: 1) grammatically, the parenthetical attaches to the second item ("accesses to objects"), and 2) no access is needed to perform the function.

The 2nd is a possible reading, but a stretch.

chux - Reinstate Monica
  • "*accesses to objects (that would be valid if the pointer did point to the first element of such an array)*" <- and accessing the first element of an array is always valid. "*the pointer value needs not have array characteristics*" <- why? `strncpy()`'s argument is "*described as being an array*" ... –  Jul 27 '17 at 13:43

The part you highlighted:

the pointer actually passed to the function shall have a value such that all address computations and accesses to objects [...] are in fact valid.

makes it clear that your code is indeed invalid. In the part talking about a zero size_t argument:

On such a call, a function that locates a character finds no occurrence, a function that compares two character sequences returns zero, and a function that copies characters copies zero characters.

There's no guarantee that a copying function doesn't try to access anything.

So, looking at this "from the other side", the following strncpy() implementation would be conforming:

char *strncpy(char *s1, const char *s2, size_t n)
{
    size_t i = 0;
    char c = *s2;

    while (i < n)
    {
        if (c) c = s2[i];
        s1[i++] = c;
    }
    return s1;
}

Of course, this is silly code; a sane implementation would, e.g., just initialize char c = 1, so I would be surprised if you found a C implementation in the wild that exhibits unexpected behavior for your code.
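For contrast, here is a sketch of the kind of "sane" loop that reads s2[i] only while i < n, and therefore touches nothing when n == 0. The name `my_strncpy` is hypothetical, and this is an illustration only, not how any particular C library actually implements strncpy:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative strncpy-like function (hypothetical name).
   Every read of s2[i] is guarded by i < n, so with n == 0
   neither array is accessed. */
char *my_strncpy(char *restrict s1, const char *restrict s2, size_t n)
{
    size_t i = 0;

    assert(s1 != NULL && s2 != NULL);  /* null pointers are invalid per 7.1.4 */

    /* Copy characters up to and including the first null, at most n. */
    while (i < n && (s1[i] = s2[i]) != '\0')
        i++;

    /* Pad the rest of s1 with null characters, as strncpy does. */
    while (i < n)
        s1[i++] = '\0';

    return s1;
}
```

With an implementation shaped like this, the second call in the question performs no access at all; whether the standard obliges implementations to behave this way is exactly what is disputed.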


There's one more argument supporting the view that a conforming implementation is allowed to access *s2 in any case: zero-sized arrays aren't allowed in C. So if s2 must point to an array, *s2 must be valid. This is closely related to the wording of the §7.1.4 passage you cited.

  • @chux: Uhhhm right ... let me try to construct a really *conforming* version then ... –  Jul 27 '17 at 13:07
  • I don't see anything what you quoted that says `strncpy` is allowed to access more than `n` characters of `s2`. You say it's clear that the code is invalid, but to me it seems clear that it's valid. – interjay Jul 27 '17 at 13:10
  • As `s2` is supposed to point to an array and zero-size arrays aren't allowed anyways, it can access *at least* the first element. The citations require accesses to be valid. –  Jul 27 '17 at 13:11
  • and @interjay there's nothing in the standard telling that for a size parameter of 0, no accesses happen. It only tells that 0 characters are copied, which is a different thing. –  Jul 27 '17 at 13:14
  • @interjay [Posted as Felix posted his] I believe the "hair-splitting" is over the fact that the standard said no characters are _copied_ if `n==0`, not _necessarily_ that no characters are _read_... As Felix has demonstrated, it's _possible_ (though silly) to write a function that reads a byte and then decides whether or not to copy it. (@Felix: a better example might be to move the `c = ...` line in the loop down one line, so you're at least _using_ the value of `c` you pre-loaded outside the loop). – TripeHound Jul 27 '17 at 13:15
  • The point about zero-sized arrays may be valid, I'd add that to the answer. Regarding your second comment, there's also nothing telling that it won't access `s2[n+100]` but that's obviously not going to happen. – interjay Jul 27 '17 at 13:16
  • @TripeHound it is used now after changing the implementation to really be conforming, as chux pointed out ... and I think the example even makes more sense now. –  Jul 27 '17 at 13:16
  • @TripeHound If the standard only limits writes, what prevents an `strncpy` implementation from reading `s2[n+100]`? It obviously isn't allowed to do that, just like it obviously isn't allowed to read `s2[n]`. But Felix may be right about this being invalid because zero-size arrays are not allowed. – interjay Jul 27 '17 at 13:18
  • @interjay Unfortunately, with modern, optimising compilers, "_obviously not going to happen_" is a very dangerous thing to assume, especially if UB comes into play. Although this example might seem overly pedantic, I've seen posts on SO with examples where they almost seem to have been deliberately perverse in what they do. – TripeHound Jul 27 '17 at 13:21
  • @interjay there's even more reasoning why `strncpy()` is **always** allowed to access `*s2`: `s2` is supposed to point to a string. `n` doesn't denote the length of `s2` (the final `0` terminator does), but the size of `s1`. I'll update my answer with this reasoning to improve it. –  Jul 27 '17 at 13:23
  • @TripeHound Then are you saying that an implementation is allowed to read `s2[n+100]`? – interjay Jul 27 '17 at 13:23
  • @FelixPalmen That's wrong. `s2` is *not* guaranteed to be a string and doesn't have to be null-terminated. In fact, the standard says: "**If** the array pointed to by s2 is a string that is shorter than n characters...", showing that s2 doesn't have to be a string (in other places it's only referred to as an array). – interjay Jul 27 '17 at 13:24
  • @interjay it has to be a string **or** it must be at least `n` characters in size, while arrays in general must have a size of at least `1`. This is getting complicated. –  Jul 27 '17 at 13:28
  • @FelixPalmen Yes, you are getting back to the point about zero-sized arrays being invalid which I conceded could be right. `s2` does not have to be a string however. An implementation which called `strlen(s2)` unconditionally would be wrong. – interjay Jul 27 '17 at 13:31
  • @interjay No, it would not be allowed to access `s2[n+100]`. It _probably_ won't access `s2[0]` when `n==0` but it's not 100% conclusive it mustn't (the standard says it must be a valid address, and that no characters will be _copied_. If an implementation _absolutely must not read_ an address (when `n==0`) they _could_ have made the requirement for valid addresses conditional on `n!=0`. That they didn't is _probably_ an oversight, but _could_ be taken to mean it is allowed to be read. – TripeHound Jul 27 '17 at 13:34
  • I removed the reference to the string length, as it doesn't help much (it's only relevant for strings shorter than `n`, which isn't the topic here). @TripeHound I also doubt that this whole thing was intentional. Probably just an edge-case overlooked. –  Jul 27 '17 at 13:36
  • @TripeHound Then by that interpretation, what's the wording stopping an implementation from reading `s2[n]` for n>0? I contend that "copying" refers to both reading and writing, which means the function isn't allowed to either read or write more than n characters. – interjay Jul 27 '17 at 13:36
  • @interjay I don't think so. *copying* is copying, and read accesses are allowed on the array by the constraints given in the function description. The description of `strncpy()` in the standard implies that `s2` points to an array that *either* holds a string *or* has at least `n` elements. –  Jul 27 '17 at 13:40
  • @interjay Possibly "copying" does mean that, and if that's mentioned in the spec. then we're probably home and dry. I happen to believe the original code _is_ safe; the debate is whether the insistence of the second address being valid _even when `n==0`_ implies that an implementation is permitted to read, so long as it doesn't "complete" the copy. – TripeHound Jul 27 '17 at 13:44
  • "There's no guarantee that a copying function doesn't try to access anything." - it's implied that functions don't access any memory other than what their specification says. By your logic, you could say that `strcpy` causes undefined behaviour for all inputs because it doesn't say anywhere that it doesn't read some characters from before the pointer... – M.M Jul 27 '17 at 13:54
  • @M.M no, it's specified what are the requirements for a parameter of a library function that's described as an array. This is a stronger requirement than just a "valid pointer". –  Jul 27 '17 at 14:15

The address computed by p + 4 is not an invalid value. A pointer is explicitly permitted to point one past the end of an array (C11 6.5.6/8), and it is common to use such pointers as function arguments. So, the code is correct.

You suspected a problem according to the following text:

If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.

For a call to strncpy with length argument 0, it is specified that no characters are copied, therefore there are no accesses to objects. It might involve adding 0 to the pointer, but it is well-defined to add 0 to a past-the-end pointer.
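To illustrate the two points about past-the-end pointers (they are routinely passed to functions, and adding 0 to one is well-defined), here is a small sketch. `count_char` and `demo_past_the_end` are hypothetical names invented for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts occurrences of ch in the half-open
   range [first, last). The pointer last may be one past the end of
   the array; it is only compared against, never dereferenced. */
size_t count_char(const char *first, const char *last, char ch)
{
    size_t count = 0;

    assert(first != NULL && last != NULL);
    for (const char *it = first; it != last; ++it)
        if (*it == ch)
            count++;
    return count;
}

size_t demo_past_the_end(void)
{
    char s[4] = "abc";
    const char *end = s + 4;     /* one past the end: valid to form (6.5.6/8) */
    const char *same = end + 0;  /* adding 0 to it is also well-defined */

    /* The second call gets an empty range and dereferences nothing. */
    return count_char(s, end, 'b') + count_char(same, end, 'b');
}
```

The empty range [same, end) is the same situation as the zero-length strncpy call: a valid pointer value and no accesses.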

Some commenters are hung up on "the first element of such an array". You can't declare a zero-sized array in C, although you can create one (e.g. malloc(0) is allowed to return a non-null pointer that is not an invalid pointer). I think it is sensible to treat the above quoted text as intending to be inclusive of the past-the-end pointer.

M.M
  • whatever `malloc(0)` returns, it is not a pointer to an array. It may return a pointer "*as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.*" (so there **is** a non-zero size **and** accessing whatever it is isn't allowed) –  Jul 27 '17 at 14:36
  • "*I think it is sensible to treat the above quoted text as intending to be inclusive of the past-the-end pointer.*" it is a sensible assumption this was *intended*, but it's not what is in the words... –  Jul 27 '17 at 14:37
  • @FelixPalmen The intent is what matters, not the exact wording. Defect Reports can be filed if the wording doesn't reflect the intent, this has happened many times – M.M Jul 27 '17 at 21:47
  • For a language-lawyer question, I don't think the assumed intent should matter. Of course the wording of the standard should be improved if it doesn't reflect the intent and a DR is the way to trigger this. But with the current wording, the code shown would be invalid. –  Jul 28 '17 at 06:05

Surprisingly, the standard never defines what an array is. It defines what an array object is, but clearly the definition of strncpy cannot possibly mean array objects. Firstly, because the types are wrong (a pointer to an array object cannot have type char*). Secondly, because with this interpretation one would not be able to manipulate strings to any useful extent. Indeed, strncpy (p, s+1, n) would always be invalid, because s+1 never points to an actual array object.

Therefore, if we want to produce a C implementation that is at least marginally useful, we must adopt another interpretation of "array pointed to by" (not just in the definition of strncpy but everywhere in the standard where such a phrase appears). Namely, these words have no choice but to denote the portion of an array object that starts at the element actually pointed to by the pointer. When the pointer points past the end of the array, the portion in question has zero size.

Once this key fact is established, the rest is easy. There is no prohibition of zero-sized portions of array objects (no reason to single them out). When a standard function is commanded to traverse such a portion, nothing should happen because it contains no elements.
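A small sketch of this reading, under the interpretation proposed above (which, again, the standard does not spell out): the "array pointed to by" s + k is treated as the tail of the array object s that starts at element k. `tail_size` and `demo_tail_copy` are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of elements in the portion of an array
   object of object_size elements that starts at element k. For
   k == object_size (a past-the-end pointer) the portion is empty. */
size_t tail_size(size_t object_size, size_t k)
{
    assert(k <= object_size);
    return object_size - k;
}

/* Returns 1 if copying from the tail starting at s + 1 yields "bc"
   and the tail starting at s + 4 has zero elements. */
int demo_tail_copy(void)
{
    char s[4] = "abc";
    char d[3];

    /* s + 1 is not an array object of its own; it points into s.
       Its tail has 3 elements: 'b', 'c', '\0'. */
    strncpy(d, s + 1, tail_size(sizeof s, 1));

    return strcmp(d, "bc") == 0 && tail_size(sizeof s, 4) == 0;
}
```

Under this reading, the call in the question asks strncpy to traverse a portion with zero elements, which is why nothing should happen.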

Whether or not we are allowed to adopt this interpretation is outside of the scope of this answer.

n. m. could be an AI
  • Re. "The standard never defined what an array is", [I asked this here](https://stackoverflow.com/questions/30546449/what-is-the-definition-of-array-in-c) but nobody seemed to know – M.M Aug 24 '17 at 21:25