strlen returns the number of characters that precede the terminating null character. An implementation of strlen might look like this:

size_t strlen(const char * str)
{
    const char *s;
    for (s = str; *s; ++s) {}
    return(s - str);
}

This particular implementation dereferences s, which may point to bytes with indeterminate values. It's equivalent to this:

int a;
int* p = &a;
*p;

So, for example, if one were to do this (which can cause strlen to return an unexpected length):

char buffer[10];
buffer[9] = '\0';
strlen(buffer); 

Is it undefined behavior?

Cœur
ペニス
  • @user2864740 are you sure that the string *must* contain some value? Isn't C allowed to happily crash on a read-before-write? – Kijewski Sep 12 '14 at 01:27

4 Answers

3

Calling the standard function strlen causes undefined behaviour. DR 451 clarifies this:

library functions will exhibit undefined behavior when used on indeterminate values

For a more in-depth discussion see this thread.

Community
M.M
  • My comment below refers to the poster's implementation of a strlen function. Agree that the standard library has other constraints or liberties. – KC-NH Sep 12 '14 at 01:34
  • @KC-NH updated my post to clarify that I'm talking about the standard `strlen` function, not OP's pseudo-implementation. – M.M Sep 12 '14 at 01:54
  • A DR and the committee's reply to it are not normative, and for the particular case here you are citing things out of context. The phrase that you are citing is the answer to the question of whether passing indeterminate values to a library function "can" have undefined behavior. The DR you are citing actually shows that the question is relatively complex and does not lead to easy answers such as this one. – Jens Gustedt Sep 12 '14 at 06:57
  • @JensGustedt well, we could say "The standard is unclear but DR 451 provides the committee's opinion on the matter". I don't think the quote is out of context; but anyone with doubts can and should read DR 451 in full – M.M Sep 12 '14 at 07:00
  • @MattMcNabb, that's not my point. I think that pointing to the DR is largely irrelevant here, since it discusses the stability of unspecified values. The code presented here reads each byte only once, so stability is not an issue. See my answer for an in-depth analysis of the code as it is presented here. – Jens Gustedt Sep 12 '14 at 07:52
  • @JensGustedt my answer is addressing `char buffer[10]; buffer[9] = '\0'; strlen(buffer); `, where `strlen` refers to the standard library function `strlen`. I thought my first sentence made that clear – M.M Sep 12 '14 at 07:58
  • Then you are not addressing the question that is asked in detail. And even then, citing the answer to question "3" of the defect report without citing that same question "3" is misleading. It answers whether this "can" lead to UB, not whether it does in all cases. – Jens Gustedt Sep 12 '14 at 07:59
  • @JensGustedt The question asked is "Is it undefined behaviour?" Defining your own function called `strlen` causes undefined behaviour, so it is UB by either interpretation. But my interpretation of OP's text is that he is asking about the behaviour of the standard function `strlen`. He posted his own pseudo-code as rationale for why he asked the question: he suspects that the standard function `strlen` might access the indeterminate values in order to find the length of the string. – M.M Sep 12 '14 at 08:02
2

The behavior of the variant that you are showing is well defined under these circumstances.

  • The bytes of the uninitialized array all have indeterminate values, with the exception of the 10th element, which you set to 0.
  • Accessing an indeterminate value is UB only if the address of the underlying object is never taken or if the value is a trap representation for the corresponding type.
  • Since this is an array and access to array elements goes through pointer arithmetic, the first case is not relevant here.
  • Any char value can be accessed without UB: the clauses about trap representations in the standard explicitly exclude all character types.
  • Thus the values that you are dealing with are simply "unspecified".
  • According to some members of the C standards committee, reading unspecified values may give different results each time, which some call a "wobbly" state. This property is not relevant here, since your function reads any such value at most once.
  • So your access to the array elements gives you an arbitrary but valid char value.
  • You can be sure that your for loop stops at position 9 at the latest, so you will not overrun your array.
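The bound in the last point can be sketched in C. This is only an illustration under stated assumptions: `my_strlen` and `length_with_fill` are hypothetical names (the loop is the one from the question, renamed so it does not clash with the library's `strlen`), and the first nine bytes are filled explicitly to stand in for "arbitrary" contents, so the sketch itself never reads an indeterminate value.

```c
#include <stddef.h>
#include <string.h>

/* The loop from the question under a hypothetical name, so that it does
   not redefine the reserved library identifier strlen. */
size_t my_strlen(const char *str)
{
    const char *s;
    for (s = str; *s; ++s) {}
    return (size_t)(s - str);
}

/* Fill buffer[0..8] with a stand-in byte, terminate at index 9, and
   return the measured length. Whatever the fill byte is, the loop cannot
   run past index 9 because buffer[9] is always '\0'. */
size_t length_with_fill(char fill)
{
    char buffer[10];
    memset(buffer, fill, 9);   /* stand-in for arbitrary contents */
    buffer[9] = '\0';
    return my_strlen(buffer);
}
```

Calling `length_with_fill('x')` walks all nine filled bytes and stops at the terminator, while `length_with_fill('\0')` stops immediately; in both cases the result lies between 0 and 9.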

So nothing "bad" beyond the visible effects can happen if you use your specific version of the function. But a function call that produces unspecified results is certainly not something you want to see in real code. Code like this leads to very subtle bugs, and you should avoid it by all means.
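One way to sidestep the whole question is to initialize the array before its length is taken. The following is a minimal sketch, not the only possible fix; the helper names `length_of_zeroed_buffer` and `length_after_copy` are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: every element is initialized before strlen reads it. */
size_t length_of_zeroed_buffer(void)
{
    char buffer[10] = {0};     /* all ten bytes are '\0' */
    return strlen(buffer);     /* no indeterminate byte is read */
}

/* Hypothetical helper: copy at most nine characters into a zeroed buffer,
   so the final byte always remains the terminator. */
size_t length_after_copy(const char *text)
{
    char buffer[10] = {0};
    size_t n = strlen(text);
    if (n > sizeof buffer - 1) {
        n = sizeof buffer - 1; /* truncate to leave room for '\0' */
    }
    memcpy(buffer, text, n);
    return strlen(buffer);
}
```

With the buffer fully initialized, the result of `strlen` depends only on what the program stored, regardless of which reading of the standard one prefers.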

Jens Gustedt
1

No, it's not undefined behavior. Your strlen function will stop before the end of the buffer. If your strlen function referenced buffer[10], then yes, that would be undefined.

It certainly will be unexpected behavior, since most of buffer contains random data. "Undefined" is a special term for people writing language standards. It means that anything could happen, including memory faults or the program exiting. By "unexpected", I mean that it is surely not what the programmer wanted to happen. On some runs the result of strlen could be 3, on others it could be 9.

KC-NH
0

Yes, it's undefined behaviour. From the draft C11 standard, §J.2 "Undefined behavior":

The behavior is undefined in the following circumstances:

...

The value of an object with automatic storage duration is used while it is indeterminate.

Community
nneonneo
  • This code doesn't actually use the indeterminate values (`buffer` is not indeterminate, but `buffer[0]` is). However, `strlen` uses the values. Also, this annex is non-normative (it's supposed to be a sort of index to find various cases of UB). The normative text is more detailed and has some exceptions for when indeterminate use is not UB. – M.M Sep 12 '14 at 01:52
  • The object is not simply "indeterminate"; its values are just "unspecified", so nothing bad can happen. – Jens Gustedt Sep 12 '14 at 07:54