I've been messing around with C today and don't understand the difference in outputs when I comment out the third buffer in this code:

 #include <unistd.h>
 #include <string.h>
 #include <stdio.h>
 void main() {
     unsigned char letters[10];
     memset(letters, 0x00, 10);
     memset(letters, 0x41, 10);
     printf(letters);
     printf(" Total buffer len: %d bytes\n",strlen(letters));

     char nletters[10];
     memset(nletters, 0x00, 10);
     memset(nletters, 0x42, 10);
     printf(nletters);
     printf(" Total buffer len: %d bytes\n",strlen(nletters));

     int nums[10];
     memset(nums, 0x00, 10);
     memset(nums, 0x43, 10);
     printf(nums);
     printf(" Total buffer len: %d bytes\n",strlen(nums));
     return 0;
 }

With the nums buffer commented out, the output is:

AAAAAAAAAA�7ǝ�U Total buffer len: 16 bytes
BBBBBBBBBBAAAAAAAAAA�7ǝ�U Total buffer len: 26 bytes

And with the buffer left in:

AAAAAAAAAA Total buffer len: 10 bytes
BBBBBBBBBBAAAAAAAAAA Total buffer len: 20 bytes
CCCCCCCCCC��U Total buffer len: 14 bytes

What I don't get is:

  1. How can commenting out the third buffer affect the size of the others?

  2. What are the extra bytes at the end of the buffers and how can I lose/manage them (if I choose to concatenate the buffers)?

  3. Why are the differences in the printed buffer size and initialized size not consistent when I choose whether to comment the third buffer?

  4. Buffer 2 is supposed to be 10 bytes, why is it 20? I don't want it to be 20, I only asked for 10. I don't think that's unreasonable.

Jongware
S .Sand
    You've just had a major run-in with the bête noire of C programmers: _undefined behaviour_. You're not working with null-terminated strings, so when you treat them as null-terminated strings, you get weird effects. Your `printf(nums)` should have yielded shrieks of protest from your compiler; heed it, it was trying to help you. (If you're using `<unistd.h>`, using `void main()` is wrong too. See [What should `main()` return in C and C++](https://stackoverflow.com/questions/204476/) for the sordid details.) – Jonathan Leffler Jul 14 '18 at 19:57

2 Answers

char strings in C are really called null-terminated byte strings. That null-terminated bit is important, and all string functions look for it to know when the string ends.

If you pass an unterminated string to a string function, it will go out of bounds and that will lead to undefined behavior.

The terminator is equal to zero, either integer 0 or the character '\0'.

And of course this null terminator needs space in your string. That means a string of 10 characters must have space for 11 to fit the terminator.

A simple fix for the first buffer would look something like

char letters[11] = { 0 };  // Space for ten characters plus terminator
// The above definition also initializes all elements in the array to zero,
// which is the terminator character

memset(letters, 'A', 10);  // Set the ten first characters to the letter 'A'

printf("%s", letters);  // Never print a data string directly using printf's first parameter.

printf(" Total buffer len: %zu bytes\n", strlen(letters));  // strlen returns size_t, so use %zu

Note the change to the printf call. If you ever get the string as input from a user, passing it directly as the format string to printf is an incredibly bad security hole. If the string contains format specifiers but there are no matching arguments, that leads to undefined behavior.

Also note that I changed the magic number 0x41 to the ASCII character it corresponds to. Magic numbers are a bad habit that make code harder to read, understand and maintain.

Luis Colorado
Some programmer dude

Try this:

memset(letters, 0x00, 10);
memset(letters, 0x41, 9);   /* <--- see the array size minus one there */

That makes printf(3) work properly, although it prints only nine As, because the tenth byte stays zero and acts as the terminator. As explained in the other answers, this comes down to the C programmer's nightmare of null-terminating strings built by hand. For that reason it is more common to use the <string.h> functions.

Separately, passing anything other than a string literal as printf()'s first parameter is discouraged: if your string contained a % character, it would be interpreted as a format specifier, and you would run into more undefined behaviour.

Luis Colorado