
This was a quiz (not graded) on Coursera. The question was: what can the following code possibly evaluate to? The correct answers were 127 and 0 (the other options were crash, -1, and 128). Why can the following code evaluate to 0? I understand why it would evaluate to 127. Is it as simple as the char bytes being uninitialized and therefore random? Can it also evaluate to any number between 0 and 127?

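```c
#include <string.h>

int foo(void) {
    char bar[128];           /* automatic array, not initialized */
    char *baz = &bar[0];
    baz[127] = 0;            /* only the last element gets a known value */
    return strlen(baz);      /* counts bytes up to the first zero byte */
}
```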
Sourav Ghosh
Nathan Fowler

3 Answers


Previously this answer had wrong information: this case does not invoke undefined behavior.


Edited answer:

TL;DR: We cannot give a definitive answer; the code's result depends on indeterminate values.

To elaborate, char bar[128]; is an automatic local variable and, if not initialized explicitly, it will contain indeterminate values.

Quoting C11, chapter §6.7.9

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. [....]

In your code, you have assigned a value to only one element of the array, the one at index 127. The remaining elements still have indeterminate values.

Passing that array (basically, a pointer to its first element) to strlen() causes reads of those values in search of a null terminator, and because the values are indeterminate, there's no guarantee that it will find the null terminator at any particular location.

  • It can very well find a null terminator (ASCII value 0) in the very first element and return 0.
  • It can also find no null terminator in any element before the last one, stop at the 0 stored in baz[127], and return 127.
  • It can find a null terminator anywhere in between and return its index, i.e. any count from 0 to 127.

So, there's no definite answer for this question.


Note (correcting my earlier misunderstanding, so that readers don't fall into the same trap):

Here, reading the uninitialized values (i.e., indeterminate values) does not invoke undefined behaviour, as one may think.

In one line: the object's address is taken.

There's a detailed discussion on this topic here.

too honest for this site
Sourav Ghosh
  • The char array will contain indeterminate values between -128 and 127. Reading them is not UB. The value returned will be indeterminate between 0 and 127 as it will return the position of the first \0 encountered. As baz[127] = 0, no UB. – neuro Jan 12 '17 at 13:11
  • @neuro (_or the upvoters to above comment_) how is so? I am looking at Annex J.2 _"The value of an object with automatic storage duration is used while it is indeterminate"_. What did I miss? – Sourav Ghosh Jan 12 '17 at 13:14
  • The array address is perfectly determined. so baz is perfectly good so giving it to strlen is not UB. The content of bar in indeterminate, but that's something else ... – neuro Jan 12 '17 at 13:19
  • @neuro OK, but is not it like your latest comment conflicts with the earlier one which said "Reading them is not UB."? – Sourav Ghosh Jan 12 '17 at 13:20
  • `char bar[128] ` allocate 128 bytes on the stack at a perfectly good address. baz = &bar[0] get this address. baz is a perfectly valid address. So giving it to strlen is not UB ? – neuro Jan 12 '17 at 13:22
  • @neuro You're correct that the problem here is indeterminate values. However reading them may be UB. Saying that something that may be UB, is not UB, is wrong. It is best to consider indeterminate values as UB. An answer could say "no-UB" only under an explicit assumption that the type must not have a trap value in the implementation. – 2501 Jan 12 '17 at 13:22
  • @2501 I am wondering a little, is the version you mentioned in the comment is unclear in my answer? Any alternative wordings? – Sourav Ghosh Jan 12 '17 at 13:23
  • @Sourav: a byte can not have an invalid value. Every possible value is OK. So no UB. You would be right for an array of pointers. You can not have UB reading bytes here any random byte value is ok. As bar[127] is set to 0 you can not go over the allocated space. So no UB ;) – neuro Jan 12 '17 at 13:29
  • @neuro Type char may have trap representations. – 2501 Jan 12 '17 at 13:31
  • @2501: Well those trap representations does not exists in C89 (a long time since my last real C sofware). By the way, unless you store a char in something else than a real byte there's no room for a trap representation like NaN. It seems that will be corrected in C1x. But I'm no expert at Cxy standard. It seems I'm right in C89 not in C99 and probably in C1x. http://stackoverflow.com/questions/6725809/trap-representation – neuro Jan 12 '17 at 13:40
  • @2501: Hum, thinking on it you can use '-0' as a trap value for char. Even if I doubt anyone has coded strlen to fail reading any byte ;) https://groups.google.com/forum/#!topic/comp.std.c/rRfT9fa6ga8 – neuro Jan 12 '17 at 13:49
  • 1) Annex J is informative, not normative, which would be required as proof. 2) See the discussion at the question; there is an explicit exception for lvalues of character type. 3) Maybe this is a defect in the Annex? – too honest for this site Jan 12 '17 at 13:56
  • @neuro: There is no C1x. And C standard is C11, nothing else. It is a long time since C89 and even C99. Without further notes/tags, we assume the current standard for a question. And also the standard very well allows for trap-representations, but there is an exception for reading a character. Not sure how one would comply to both or what use that made, but from the standard's wording it can have a trap representation. – too honest for this site Jan 12 '17 at 13:58
  • @neuro: Err - no! The answer is not "it is undefined". – too honest for this site Jan 12 '17 at 15:21
  • @neuro thanks but im just wondering regarding the logic behind those DVs. If there's something inherently wrong, we should get a chance to improve it. – Sourav Ghosh Jan 12 '17 at 15:24
  • @Olaf Sorry for the delay sir, but as I know, signed `char` can have trap representation, so there's that. What makes this non-UB? – Sourav Ghosh Jan 12 '17 at 16:52
  • @SouravGhosh: 6.2.6.1p5 for the general case and 7.24.1p3 together with the fact that `unsigned char` cannot have a trap representation. I agree it is a bad idea, but as it looks, the standard has either an intended exception (one reason might be `memcpy` for `struct`s and `union`s), or this is another inconsistency. For me I'd love to see the legacies gone at the cost of making the language incompatible with old versions (as they did for VLAs already). But I'm not positive about that. – too honest for this site Jan 12 '17 at 17:30
  • @Olaf right, but we cannot conclude that `char` == `unsigned char`, right? It's implementation dependent. So in this particular case, can we conclude for sure that this is not UB? – Sourav Ghosh Jan 12 '17 at 17:57
  • @SouravGhosh: Please read the definition of `strlen`. That's why I wrote "together with ...". – too honest for this site Jan 12 '17 at 17:58
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/133081/discussion-on-answer-by-sourav-ghosh-c-char-pointer-length). – Bhargav Rao Jan 13 '17 at 12:45
  • Annex J is misleading and indeed not normative. The code in the OP's answer does take the address of the variable, so the special case of UB mentioned in 6.3.2.1/2 does not apply. You may find [this discussion](http://stackoverflow.com/questions/40584969/reading-an-indeterminate-value-invokes-ub) of interest. – Lundin Jan 19 '17 at 07:45
  • @Lundin Thanks for showing me the pointers, I've corrected my understanding and updated my answer accordingly, may I please ask for a review now? – Sourav Ghosh Jan 20 '17 at 17:50
  • @SouravGhosh: I removed the DV. Looks ok for me now. – too honest for this site Jan 20 '17 at 18:50
  • "indeterministic behaviour" is not a thing, according to the standard. The options are: well-defined, unspecified, implementation-defined and undefined. – M.M Jun 03 '19 at 10:46
  • @toohonestforthissite: `bar` is an array of `char`, which, unlike `unsigned char`, may have a trap value I believe, although it cannot have padding bits. Modern systems using two's complement representation do not have trap values for type `char`, but a strict reading of the C Standard allows for undefined behavior here, not just indeterminate behavior, as `strlen()` is not specified as reading the string using type `unsigned char`. – chqrlie Jun 03 '19 at 10:54
  • Sorry for the edit-mess. Seems I should start slowly :-) – too honest for this site Mar 06 '21 at 18:31

The result of this code is indeterminate. By this I mean that the answer could be anything between 0 and 127 inclusive.

strlen will read uninitialised memory, going no further than bar[127], which holds 0 and so acts as the termination condition.

But because the array consists of char elements, reading those data is not undefined, since char types cannot have a trap representation. It is simply that they contain indeterminate values.

(It would have been an entirely different matter had bar had static storage duration. Then the answer would always be zero.)


The bulk of the comments below are reacting to an incorrect formulation of this answer that stated that the behaviour was undefined.

Bathsheba
  • I'm not sure it's undefined. There is no access beyond the buffer limits. It just reads indeterminate values. – StoryTeller - Unslander Monica Jan 12 '17 at 13:05
  • See http://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior-in-c – Bathsheba Jan 12 '17 at 13:06
  • And the accepted answer makes very good points. None of which apply here. `char` has no trap values, as far as I'm aware, and a buffer won't be stored in a register. – StoryTeller - Unslander Monica Jan 12 '17 at 13:08
  • @StoryTeller Type char may have trap values. – 2501 Jan 12 '17 at 13:20
  • @StoryTeller I'd say you have to provide a citation in the standard that precludes `char` having a trap value. – Andrew Henle Jan 12 '17 at 13:27
  • @StoryTeller Standard clearly states that any type may have a trap representation. There are certain exceptions, and type char is not among them. – 2501 Jan 12 '17 at 13:28
  • @2501 I seem to recall char being amongst them. Or am I confusing it for unsigned char? – StoryTeller - Unslander Monica Jan 12 '17 at 13:30
  • OK, 6.2.6.1, paragraph 5 seems relevant: *Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.* "Character type" is mentioned. – Andrew Henle Jan 12 '17 at 13:32
  • That's also noted here: http://stackoverflow.com/a/11962468/4756299 We're definitely in "language lawyer" territory here. – Andrew Henle Jan 12 '17 at 13:33
  • @AndrewHenle - Right. To my understanding §6.2.6.1 ¶5 contradicts the possibility of a trap value for all character types. Correct me if I'm wrong. – StoryTeller - Unslander Monica Jan 12 '17 at 13:41
  • @StoryTeller: It does not contradict, but it makes reading via a character type (i.e. any member of the `char` family) not UB. I suspect this is (among other reasons) for `memcpy` to copy a `struct` as block even if there are padding bytes with indeterminate value. – too honest for this site Jan 12 '17 at 13:43
  • @Olaf - It would seem reading an indeterminate value via a character type is not UB. So how is reading several in sequence (which is what the question is about) can be considered UB? – StoryTeller - Unslander Monica Jan 12 '17 at 13:44
  • @StoryTeller: Where did I say or imply that? You might be right _technically_ this makes trap representations of `char`s impossible, but that's not exactly what the standard says. – too honest for this site Jan 12 '17 at 13:46
  • @StoryTeller That rule only addressed assignments to a character type. Type char may be used in expressions, where it will happily trap. – 2501 Jan 12 '17 at 13:49
  • @Olaf - I didn't say you implied that. The entire preceding discussion is about whether or not the call to strlen will result in UB. I claimed the result is unspecified, not undefined. Trying to make heads or tails out of it. – StoryTeller - Unslander Monica Jan 12 '17 at 13:50
  • @2501 - So reading and writing the values is fine. Comparing to `'\0'`, which is what `strlen` is bound to do, is UB!? Talk about fine print... – StoryTeller - Unslander Monica Jan 12 '17 at 13:58
  • @StoryTeller: Sorry then I misunderstood your comment as correction of mine, while it seems now to be more of a question. To be clear: I agree with you this is not UB. But I'm not quite sure how a trap-representation should be processed in `strlen`. I'd suppose it has to compare not equal. But then the last entry **is** determined, and always compares equal. – too honest for this site Jan 12 '17 at 14:00
  • @2501: The paragraph clearly states **reading**, not writing/assigning to. There might be UB, though for the coercion of such a character to `int` for the comparison operator. Which would make it depending on the implementation of `strlen` (this could use assembly language, thus avoiding UB). – too honest for this site Jan 12 '17 at 14:08
  • @olaf: interesting ... Is it normative ? – neuro Jan 12 '17 at 14:39
  • @neuro: As the link is the whole standard you can check yourself. – too honest for this site Jan 12 '17 at 15:14
  • This answer is indeed incorrect. The local variable has its address taken and character types cannot hold trap representation, as mentioned in the very definition of a trap representation (6.2.6.1/5). So this code can _never_ invoke UB on any given system. For more info see [this](http://stackoverflow.com/a/40674888/584518). – Lundin Jan 19 '17 at 07:55
  • @Lundin: You are, of course, correct. Funny how essentially this one came up yesterday too. I've amended this answer. – Bathsheba Jan 19 '17 at 08:04
  • There is no such thing as "indeterminate" behaviour. The possibilities are: well-defined, unspecified, implementation-defined, and undefined. – M.M Jun 03 '19 at 10:42

There are two things that could make this code UB, as listed here. This is a variable with automatic storage duration that has its address taken, so the first case definitely does not apply.

The variable is not allowed to hold a trap representation either, as per the definition of trap representations in C11 6.2.6.1/5 (emphasis mine):

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

This means that the array holds unspecified values. One such unspecified value could be 0, at any position in the array, and it would then be treated as a null terminator.

Community
Lundin
  • @Bathsheba Well, the truth about this has to be found rather deep down in the language-lawyer swamp. And C++ is different. – Lundin Jan 19 '17 at 08:07