4

There is a phrase that keeps popping up in Annex K of the C standard (bounds-checking interfaces):

... copying shall not take place between objects that overlap.

Consider, for example, strcpy_s( char * restrict s1, rsize_t s1max, char const * restrict s2 ), in which s1max specifies the maximum capacity of s1 to enable the bounds checking.

What exactly would be "the object" s1 at this point, which must not overlap with "the object" s2?

Would that be...

  • s1[0]..s1[s1max - 1] (to the end of the buffer, i.e. the memory object),

or

  • s1[0]..s1[strnlen( s1, s1max )] (to the end of the string, i.e. the string object)?

If it is the former, I wonder about the lack of consistency, as I do not know the size of the buffer that is s2, and would have to apply a different definition of "the object".

If it is the latter, I wonder whether it breaks "the promise" that is given, as the source string and the eventual (post-copy) destination string could conceivably overlap if the source string is longer than the original destination string.
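To make the difference between the two readings concrete, here is a minimal sketch. The helper names (ranges_overlap, bounded_len, overlap_as_buffer, overlap_as_string) are made up for illustration and are not part of Annex K, and comparing addresses through uintptr_t is an assumption that holds on common flat-memory platforms.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: do the byte ranges [a, a+alen) and [b, b+blen)
   intersect? Addresses are compared as integers, which assumes a flat
   address space. */
static int ranges_overlap(const char *a, size_t alen,
                          const char *b, size_t blen)
{
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    return ua < ub + blen && ub < ua + alen;
}

/* Hypothetical stand-in for strnlen(): length of s, examining at most
   max characters. */
static size_t bounded_len(const char *s, size_t max)
{
    const char *end = memchr(s, '\0', max);
    return end ? (size_t)(end - s) : max;
}

/* Reading 1: "the object" s1 is the whole buffer of s1max characters. */
static int overlap_as_buffer(const char *s1, size_t s1max, const char *s2)
{
    return ranges_overlap(s1, s1max, s2, strlen(s2) + 1);
}

/* Reading 2: "the object" s1 is only the string currently stored in it,
   terminator included (clamped to s1max if no terminator is found). */
static int overlap_as_string(const char *s1, size_t s1max, const char *s2)
{
    size_t n = bounded_len(s1, s1max);
    size_t len = n < s1max ? n + 1 : s1max;
    return ranges_overlap(s1, len, s2, strlen(s2) + 1);
}
```

Take a char buf[16] holding "ab" at buf and "0123456789" at buf + 4, with s1max = 12. Reading 1 reports an overlap (buf[0..11] against buf[4..14]), while reading 2 does not (buf[0..2] against buf[4..14]), even though the copied string would end up occupying buf[0..10].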

What is the intention / the intended definition of "object" here?

DevSolar

2 Answers

3

I believe the intent is such that the s1max characters starting from s1 must not overlap any of the characters in s2 including the null terminator. K.3.7.1.3p5 says that:

  All elements following the terminating null character (if any) written by strcpy_s in the array of s1max characters pointed to by s1 take unspecified values when strcpy_s returns. [418]

with the footnote 418 saying that

  This allows an implementation to copy characters from s2 to s1 while simultaneously checking if any of those characters are null. Such an approach might write a character to every element of s1 before discovering that the first element should be set to the null character.
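The copy strategy the footnote describes could look roughly like the sketch below. It is a simplified illustration only: sketch_copy is a hypothetical name, and the function leaves out everything else the real strcpy_s must do (null-pointer checks, RSIZE_MAX, overlap detection, the runtime-constraint handler).

```c
#include <stddef.h>

/* Hypothetical simplified copy in the spirit of strcpy_s; NOT the real
   Annex K function. Returns 0 on success, nonzero if s2 does not fit. */
static int sketch_copy(char *restrict s1, size_t s1max,
                       const char *restrict s2)
{
    if (s1max == 0)
        return 1;                  /* no room even for a terminator */

    for (size_t i = 0; i < s1max; i++) {
        s1[i] = s2[i];             /* copy first ... */
        if (s1[i] == '\0')         /* ... then test what was copied */
            return 0;              /* string and terminator both fit */
    }

    /* Every element of s1 has been written and no terminator was seen:
       s2 does not fit, so only now is s1 reset to the empty string. */
    s1[0] = '\0';
    return 1;
}
```

With a 4-character destination and the source "abcdef", the loop writes all four elements before the failure is detected. If s2 lay anywhere inside those s1max characters, it could be clobbered along the way, which is why the whole destination array has to be kept clear of the source.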

However, Microsoft's documentation says that "if source and dest overlap, the behavior is undefined", which hints that in fact anything could happen in that case. This seems to negate the usefulness of the bounds-checking interface.

  • Ah. Now it clicked -- if it weren't for this, the source string could be clobbered by the copy operation, which `restrict` already "promises" not to do... got it. – DevSolar Jun 15 '19 at 09:34
  • @DevSolar: One could write a function with `restrict`-qualified arguments which would detect whether the addresses overlap before using any argument to access any storage, and upon discovering that e.g. `arg1 == arg2+delta;`, performed all accesses using `arg2` or `arg2+delta`, without using `arg1` to access anything. – supercat Jun 15 '19 at 19:41
0

This phrase is found throughout the standard: not just in the optional bounds-checking interface, but also in mandatory library functions such as strcpy. The bounds-checking interface functions merely inherited the very same text.

The formal definition of an object is:

3.15
object
region of data storage in the execution environment, the contents of which can represent values

Based on this, a string has to be the whole array including the null terminator: a function such as strcpy would break if the null terminator were somehow overwritten during the copy, so the terminator has to be regarded as part of the (array) object.

There seems to be no definition of the term "overlap", but the intention is fairly clear: to prevent situations such as this:

  char str[] = "foobar";
  strcpy(str+3, str);   /* source and destination overlap: undefined behavior */

where one possible implementation of strcpy would be while(*dst++ = *src++){}. That loop would break here: it overwrites the null terminator before ever reading it, so it never stops and we'd end up writing out of bounds.
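The failure mode can be reproduced without leaving defined behavior by using a bounded variant of that loop on a larger buffer. This is only a sketch: naive_copy_bounded is a hypothetical helper, its pointers are deliberately not restrict-qualified, and the buffer is enlarged to 16 characters so that every access stays in bounds.

```c
#include <stddef.h>

/* Hypothetical bounded variant of while (*dst++ = *src++) {}.
   The parameters are not restrict-qualified, so calling it on
   overlapping parts of one array is well defined.
   Returns 1 if a null terminator was copied, 0 if the loop had to be
   cut off after limit characters without ever seeing one. */
static int naive_copy_bounded(char *dst, const char *src, size_t limit)
{
    for (size_t i = 0; i < limit; i++) {
        if ((dst[i] = src[i]) == '\0')
            return 1;
    }
    return 0;
}
```

With char buf[16] = "foobar", the call naive_copy_bounded(buf + 3, buf, sizeof buf - 3) returns 0: the fourth write replaces the terminator at buf[6] with 'f', after which the loop only ever reads characters it has just written and fills the rest of the buffer with a repeating "foo". The unbounded loop on the 7-character array above would run past the end of the array in the same way.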

Notably, you already promise the compiler that the parameters don't overlap when you pass them to a function expecting restrict pointers. The text in the standard regarding overlaps being undefined just makes it clearer still.

In the strcpy example, any lvalue access to what dst points at is not allowed to modify what src points at, or we violate the formal definition of restrict (C17 6.7.3.1) and thereby invoke undefined behavior.

This is, as far as I know, always the programmer's responsibility. No compiler I know of gives diagnostic messages for restrict violations on the caller-side.

Lundin
  • The issue in `strcpy()` is a very different one. Ensuring that the objects don't overlap is the duty of the caller, not the function (as the result is "just" undefined, not a well-defined reaction as defined for the Annex K functions). Also, there are no two ways to read the definition of "object" in this case, as there isn't a "size" parameter given. – DevSolar Jun 17 '19 at 08:34
  • @DevSolar: The Standard is often a bit careless at making clear what aspects of behavior are or are not guaranteed. For example, on most platforms it would be significantly cheaper to guarantee that if the pointers given to `strcpy_s` overlap, it will either copy an Unspecified string that doesn't exceed the maximum length or else trap in defined fashion, than it would be to guarantee a trap, but for many purposes that loose guarantee would be just as useful as a guaranteed trap. – supercat Jun 17 '19 at 18:58