3

I want to make it clear that my question is an exact duplicate of this question.

Unfortunately, I have one question that none of the answers there addressed. The code was:

#include <string.h>

int foo(void) {
  char bar[128];
  char *baz = &bar[0];
  baz[127] = 0;
  return strlen(baz);
}

The question was: What are the possible outputs of this function?

When I run this code, it gives 0 every time, and the correct answers are 0 and 127 (I still don't get why).

My question is how this statement is even valid. We are calculating the length of baz, which contains a memory address, say 0xb96eb740, which is a hex number. So are we calling strlen() on this address to find its length? I mean, how can we find the length of an address, which is just a number?

I am really confused. I have been trying to understand this for hours but still don't get it.

daya

4 Answers

9

Don't get stuck on the fact that it's being passed an address. strlen() always takes an address. Its argument is a const char *, the address of a string. All of these calls pass the exact same address:

strlen(baz);
strlen(&bar[0]);
strlen(bar);

baz is assigned &bar[0], so the first and second are equivalent. An array decays to a pointer to its first element (array == &array[0]), so the second and third are equivalent.
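
The equivalence of the three spellings can be checked directly. This is a minimal sketch: the helper name same_address is hypothetical, and the array is initialized with "hello" (unlike in the question) so that calling strlen() on it is well-defined.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if baz, &bar[0] and bar all denote the
   same address and strlen() reports the same length through each of them. */
int same_address(void) {
    char bar[128] = "hello";   /* initialized here, unlike in the question */
    char *baz = &bar[0];

    const char *a = baz;       /* the pointer variable */
    const char *b = &bar[0];   /* address of the first element */
    const char *c = bar;       /* the array decays to a pointer to bar[0] */

    return a == b && b == c
        && strlen(a) == 5 && strlen(b) == 5 && strlen(c) == 5;
}
```

Calling same_address() returns 1, because all three expressions hold the address of bar[0].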

I mean, how can we find the length of an address, which is just a number?

Let's say that bar == &bar[0] == baz == (char *) 0xb96eb740 as per your example. strlen() will first check if memory location 0xb96eb740 contains \0. If not, it will then check 0xb96eb741. Then 0xb96eb742. Then 0xb96eb743. It will continue checking each location sequentially until it finds \0.

I know that's true. But why does strlen(baz) return 0?

As the linked Q&A explains, the behavior is indeterminate because the contents of the bar[128] array are uninitialized. There could be anything in that array. The only cell we know the value of is bar[127], which is set to \0. All the others are uninitialized.

That means that any one of them, or all of them, or none of them, could contain a \0 character. It could change from run to run, from call to call even. Every time you call foo() you could get a different result. That's entirely possible. The result will vary based on what data happens to be on the stack before foo() is called.
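
To see how the position of the first \0 decides the result, here is a minimal sketch that replaces the unknown stack contents with known ones. The helper name length_with_nul_at and the 'x' fill byte are assumptions made for illustration; they are not part of the original foo().

```c
#include <string.h>

/* Hypothetical helper: builds a 128-byte buffer with known contents and
   returns what strlen() reports for it. */
size_t length_with_nul_at(size_t first_nul) {
    char bar[128];

    memset(bar, 'x', sizeof bar);  /* known, non-zero contents everywhere */
    bar[127] = '\0';               /* the only write the original foo() does */

    if (first_nul < 127)
        bar[first_nul] = '\0';     /* simulate a stray zero byte left on the stack */

    return strlen(bar);            /* stops at the first '\0' it meets */
}
```

With this setup, length_with_nul_at(0) returns 0, length_with_nul_at(42) returns 42, and length_with_nul_at(127) returns 127. In the original foo() nothing controls where the first zero byte sits, which is why the result is not predictable.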

When I run this code, this gives 0 every time and the correct answers are 0 and 127 (I still don't get why?).

It could return any value between 0 and 127. Due to the indeterminate behavior you mustn't read too much into what the program happens to return when you run it. The output could be different if you run the program again, if you call a different set of functions before foo(), if you run a different program beforehand, if you change compilers, if you run it on a different day of the week, if you use a different operating system, etc., etc., etc.

John Kugelman
2

My question is how this statement is even valid. We are calculating the length of baz, which contains a memory address, say 0xb96eb740, which is a hex number. So are we calling strlen() on this address to find its length?

The strlen function accepts an address as its argument, and its behaviour is to read the character stored at that address. (It does not try to read the characters of the address itself, as you seem to be suggesting.) If that character is not '\0', it reads the character at the next address and checks whether that one is '\0', and so on.

M.M
  • 1
    The [linked answers](https://stackoverflow.com/questions/41613966/c-char-pointer-length) explain that it does *not* invoke undefined behavior, merely indeterminate behavior. – John Kugelman Jun 03 '19 at 10:37
  • @M.M Do you have some source for the claim that this is UB? – klutt Jun 03 '19 at 10:51
  • 2
    @Broman Yes, the committee resolution to DR 451. However on reviewing OP's question I think he is specifically only asking about how `strlen(ptr)` works , so the topic of behaviour of the code could be confined to the other thread (since answering it to language lawyer standard here will swamp the answer to OP's specific question) – M.M Jun 03 '19 at 10:54
  • I completely understand how `strlen(baz)` works now, but I am still unsure about the undefined versus indeterminate distinction. What I think is that it is indeterminate behaviour because we _cannot_ predict _what_ the output will be, but there _will_ certainly be an output; undefined, on the other hand, means that the output is simply undefined, like if I do `2 / 0` – daya Jun 03 '19 at 12:28
2

The answer to your question is: anything can happen.

The array bar is uninitialized. Only bar[127] is explicitly set to '\0'. Passing an uninitialized array to strlen(), which you do indirectly by passing baz, which points to bar[0], has undefined behavior.

In practice, on modern architectures without trap values, function foo() has unspecified behavior and can return any value between 0 and 127 depending on whatever the stack contains when you call it.

In your case it returns 0 because there happens to be a null byte at the beginning of bar, but you cannot rely on that and successive calls to foo() could return different values.

If you run a program that calls foo() under valgrind or some other memory sanitizing tool, it might complain that strlen() accesses uninitialized memory.

chqrlie
  • The [linked answers](https://stackoverflow.com/questions/41613966/c-char-pointer-length) explain that it does *not* invoke undefined behavior, merely indeterminate behavior. – John Kugelman Jun 03 '19 at 10:39
  • @JohnKugelman: `bar` is an array of `char`, which, unlike `unsigned char`, may have trap values I believe, although it cannot have padding bits. Modern systems do not have trap values as I mentioned, but a strict reading of the standard allows for undefined behavior here, not just indeterminate behavior. – chqrlie Jun 03 '19 at 10:47
  • Reading characters does not have undefined behavior. C 2018 6.2.6.1 5: “Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression **that does not have character type**, the behavior is undefined.…” – Eric Postpischil Jun 03 '19 at 12:18
  • @EricPostpischil: this phrase does not imply that reading via `char` type has defined behavior, it only applies to lvalue expressions that do not have character type. Paragraph 3 of 6.2.6.2 says `char` cannot have padding bits but leaves the possibility for negative 0 to be a trap representation even for `char` type, on non two's complement architectures. – chqrlie Jun 03 '19 at 14:58
  • Even reading through a character type can be UB if the object in question was not initialized and never had its address taken as per 6.3.2.1p2. – dbush Jun 03 '19 at 15:00
  • @chqrlie: Nothing else in the standard says that reading the value of a character can trap. This is the paragraph that enables traps. – Eric Postpischil Jun 03 '19 at 15:03
  • @dbush: Array elements automatically qualify for “had its address taken”; they are all accessed by calculating their addresses. Technically, the requirement in the C standard is “could have been declared with the `register` storage class”. You **can** declare an array `register`, but then the only defined behavior for it is using it in `sizeof`. In the sense used in the rule about uninitialized objects, 6.3.2.1 2, no, an array cannot have been declared `register`. If you use a pointer or array to access an object, 6.3.2.1 2 does not apply, since its address has been taken. – Eric Postpischil Jun 03 '19 at 15:07
1

Others have covered that the value is indeterminate, so I'll go directly to this:

I mean, how can we find the length of an address, which is just a number?

You don't. The length of a string is calculated by reading memory sequentially, starting at the address you pass in, and seeing how far you need to go before you hit the first '\0' character. Here is an example of how you can implement a function that returns the length of a string:

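/* Simplified illustration of how a strlen-like function works. */
int strlen(char * str) {
    int length=0;
    /* Step through memory one byte at a time until the terminating '\0'. */
    while(str[length] != '\0')
        length++;
    return length;
}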
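
For comparison, here is a sketch of the same loop with the types the standard library uses for strlen(): a const char * parameter and a size_t result, which avoids the INT_MAX limit of an int counter. The name my_strlen is hypothetical and is chosen only so it does not clash with the real strlen(); actual library implementations are typically more optimized than this.

```c
#include <stddef.h>

/* Hypothetical name, chosen to avoid clashing with the standard strlen(). */
size_t my_strlen(const char *str) {
    size_t length = 0;

    /* Walk forward from str until the terminating '\0' is found. */
    while (str[length] != '\0')
        length++;

    return length;  /* number of characters before the '\0' */
}
```

For example, my_strlen("hello") returns 5 and my_strlen("") returns 0.
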
klutt
  • @chqrlie Yes I know. I simplified it on purpose. In a typical implementation `str` should also be of type `const char *`. That's another thing I removed just to show the concept. – klutt Jun 03 '19 at 12:14
  • @chqrlie You have a point. However, it does not feel like a simplified version either since the algorithm is the same and has the same functionality. (Except for strings with length > INT_MAX) – klutt Jun 03 '19 at 14:49
  • @chqrlie Since it to some extent is more complex, that's the wrong word. But I reworded the whole sentence now. – klutt Jun 03 '19 at 14:51