
This might be a somewhat long question. I was testing character arrays in C and came across this code.

char t[10];
strcpy(t, "abcd");
printf("%d\n", strlen(&t[5]));
printf("Length: %d\n", strlen(t));

Now apparently strlen(&t[5]) yields 3 while strlen(t) returns 4.

I know that string length is 4, this is obvious from inserting four characters. But why does strlen(&t[5]) return 3?

My guess is that

String:   a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 

strlen(&t[5]) looks at the length of a string composed of positions 6, 7 and 8 (because the 10th character is a NULL terminating character, right)?

OK, then I did some experimentation and modified the code a bit.

char t[10];
strcpy(t, "abcdefghij");
printf("%d\n", strlen(&t[5]));
printf("Length: %d\n", strlen(t));

Now this time strlen(&t[5]) yields 5 while strlen(t) is 10, as expected. If I understand character arrays correctly, the state should now be

String:   a | b | c | d | e | f | g | h | i | j | '\0'
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10

so why does strlen(&t[5]) return 5 this time? I've declared a character array of length 10, so by the same logic as above, shouldn't the result be 4?

Also, shouldn't I be running into some compiler errors, since the NULL terminating character is actually in the 11th spot? I'm new to C and would very much appreciate anyone's help.

Pero Alex
  • `char t[10]` declares an array of exactly 10 `char`. *There is no 11th spot*, and if that array has automatic storage duration, as it appears yours does, then its initial contents are indeterminate. If you want to use that array as storage for a C string, then you need to explicitly provide sufficient capacity to accommodate a string terminator. – John Bollinger Jan 23 '18 at 14:11
  • Your question seems to distinguish the `char` values `0` and `'\0'`. However, they are the same. In particular, *if* the array were zero-initialized (but it isn’t), then the characters beyond `t[4]` would be zero (i.e. `0` or `'\0'`), not `'0'` (i.e. the character corresponding to the digit 0, which usually has the integer value 48, not 0). – Konrad Rudolph Jan 23 '18 at 14:38

3 Answers


First let me tell you, your "assumption"

String:   a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 

is not correct. Based on your code, the values are only "guaranteed" up to index 4, not beyond that.

For the first case, this line in your code

  printf("%d\n", strlen(&t[5]));

is wrong for two reasons:

  • strlen() returns a size_t, so you ought to use %zu, not %d.
  • &t[5] does not point to a valid string.

Either (or both) of the above causes undefined behavior, so no particular output can be justified.

To elaborate, with a definition like

char t[10];
strcpy(t, "abcd");

indices 0 to 3 of t are populated and index 4 holds the null terminator. The contents from t[5] onward are indeterminate.

Thus, &t[5] is not a pointer to the first element of a string, so it cannot be used as an argument to strlen().

  • It may run out of bound in search of the null-terminator and experience invalid memory access and, as a side-effect, produce a segmentation fault,
  • It may find a null-terminator (just another garbage value) within the bound and report a "seemingly" valid length.

Both are equally likely and unlikely, really. UB is UB, there's no justifying it.
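For contrast, here is a minimal sketch of a well-defined variant of the first case. It assumes the array is explicitly zero-initialized, which the original code does not do, and the function names are hypothetical, chosen only for illustration. With every element starting as '\0', &t[5] points at a valid (empty) string, and %zu matches the size_t that strlen() returns.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of the string starting at t[5]
   when the buffer is zero-initialized before the copy. */
size_t tail_length_zero_init(void)
{
    char t[10] = {0};      /* every element starts as '\0' */
    strcpy(t, "abcd");     /* writes 'a','b','c','d','\0' into t[0..4] */
    return strlen(&t[5]);  /* t[5] is '\0', so this is a valid empty string */
}

/* Hypothetical helper: prints both lengths with %zu,
   the conversion specifier for size_t. */
void print_lengths(void)
{
    char t[10] = {0};
    strcpy(t, "abcd");
    printf("%zu\n", strlen(&t[5]));
    printf("Length: %zu\n", strlen(t));
}
```

Calling print_lengths() from main() exercises the printf lines. The only changes from the original snippet are the {0} initializer and the %zu specifier.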

Then, for the second case, the code

char t[10];
strcpy(t, "abcdefghij");

is, once again, accessing memory out of bounds.

You have 10 array elements altogether, so you can store at most 9 characters plus one null terminator (which is what qualifies the char array as a string).

However, you're attempting to store 10 characters plus a null character (in strcpy()), so you're off by one: you access memory out of bounds and invoke UB.
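One way to avoid the off-by-one is to check the capacity before copying. The sketch below is only an illustration: copy_checked is a hypothetical helper, not a standard library function, and it assumes the caller passes the real size of the destination array.

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if src, including its
   null terminator, fits in dst_size bytes.
   Returns 0 on success, -1 if the destination is too small. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);      /* characters, not counting '\0' */

    if (len >= dst_size) {
        return -1;                 /* no room for the terminator */
    }
    memcpy(dst, src, len + 1);     /* copy characters plus '\0' */
    return 0;
}
```

With char t[10], copy_checked(t, sizeof t, "abcdefghij") refuses the copy, because the 10 characters plus the terminator need 11 bytes. Declaring char t[11] instead makes the copy succeed, and strlen(&t[5]) is then a well-defined 5.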

Sourav Ghosh
  • OK, that does explain some part of the question. But why does my code return 3 in the `strlen(&t[5])` example? I do understand that `&` is used in connection to pointers, but does it perhaps perform some other task here? – Pero Alex Jan 23 '18 at 14:09
  • I understand it now. Thank you for your time and energy! As for everyone else, your answers helped a lot too, thank you. – Pero Alex Jan 23 '18 at 14:17

char t[10]; is not initialized so it just contains garbage values 1). strcpy(t, "abcd"); overwrites the first 5 characters with the string "abcd" and a null terminator.

However, &t[5] points at the first character after the null termination, which remains garbage. If you invoke strlen from there, anything can happen, since the pointer passed is not likely pointing at a null terminated string.


1) Garbage = indeterminate values. Assuming a sane 2's complement system, the address of the buffer t is taken, so the code does not invoke undefined behavior until the point where strlen starts reading outside the bounds of the array t. Reference.

Lundin
  • So 3 is essentially a somewhat unpredictable result? – Pero Alex Jan 23 '18 at 14:12
  • Yes, @PeroAlex. That's more or less what "undefined behavior" means: the language does not define the results of executing such behavior. – John Bollinger Jan 23 '18 at 14:14
  • Now I get it. Thank you for your time and energy! It helped me a lot. – Pero Alex Jan 23 '18 at 14:16
  • What has two's complement got to do with anything here? – Konrad Rudolph Jan 23 '18 at 14:41
  • @KonradRudolph See the linked reference. If a variable with indeterminate value (that has its address taken) is accessed on a system without trap representations, then the result takes unspecified values. On a system with trap representations, accessing the variable may result in a trap. And 2's complement systems do not have trap representations for plain integers. – Lundin Jan 23 '18 at 15:11

Problem 1:

My guess is that

String:   a | b | c | d | 0 | 0 | 0 | 0 | 0 | \0
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 

This assumption is wrong. The array is not initialized to hold 0 values but contains some "random" garbage. After copying "abcd", the upper half of the array (t[5] etc.) is still untouched, which results in a "random" string length due to undefined behaviour.

Problem 2:

If I understand character arrays correctly, the state should now be

String:   a | b | c | d | e | f | g | h | i | j | '\0'
Position: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10

Again wrong. Your array only holds 10 characters. They are at index 0..9. Index 10 is out of bounds. Your copy operation might result in this layout, or it might just as well crash while writing out of bounds.

But this is not checked by the compiler. If you run into problems then it will be during runtime.

Gerhardh
  • The `0`s come from me testing what the undefined contents of my array were. I saw that they seemed to be 0 all the time, thus I concluded that they must be zero. Your answer is actually very helpful and clarified a lot. Thanks! – Pero Alex Jan 23 '18 at 14:15
  • Well, you didn't provide enough context to be precise. If `t[]` is defined at file scope it will be initialized to 0. If it is defined locally within a function, it is not. From your finding that `strlen(&t[5])` is not 0 I assumed that it must be a local one. – Gerhardh Jan 23 '18 at 14:45