I've been messing around with C today and don't understand the difference in output when I comment out the third buffer in this code:

 #include <unistd.h>
 #include <string.h>
 #include <stdio.h>
 void main() {
     unsigned char letters[10];
    memset(letters, 0x00, 10);
    memset(letters, 0x41, 10);
    printf(letters);
    printf(" Total buffer len: %d bytes\n",strlen(letters));

     char nletters[10];
    memset(nletters, 0x00, 10);
    memset(nletters, 0x42, 10);
     printf(nletters);
    printf(" Total buffer len: %d bytes\n",strlen(nletters));

     int nums[10];
     memset(nums, 0x00, 10);
    memset(nums, 0x43, 10);
    printf(nums);
    printf(" Total buffer len: %d bytes\n",strlen(nums));   
 return 0;
}

This is the output with the nums buffer commented out:

AAAAAAAAAA�7ǝ�U Total buffer len: 16 bytes
BBBBBBBBBBAAAAAAAAAA�7ǝ�U Total buffer len: 26 bytes

And with the buffer left in:

AAAAAAAAAA Total buffer len: 10 bytes
BBBBBBBBBBAAAAAAAAAA Total buffer len: 20 bytes
CCCCCCCCCC��U Total buffer len: 14 bytes

What I don't get is:

  1. How, for the love of all that is holy, can commenting out the third buffer affect the size of the others?

  2. What are the extra bytes at the end of the buffers and how can I lose/manage them (if I choose to concatenate the buffers)?

  3. Why are the differences in the printed buffer size and initialized size not consistent when I choose whether to comment the third buffer?

  4. Buffer 2 is supposed to be 10 bytes, why is it 20? I don't want it to be 20, I only asked for 10. I don't think that's unreasonable.

Deduplicator
S.And
  • I will. Thanks for the pointer (pun!!!!!) –  Jul 14 '18 at 19:36
  • Welcome to Software Engineering. We only support [good](https://softwareengineering.stackexchange.com/help/how-to-ask), [on-topic](https://softwareengineering.stackexchange.com/help/on-topic) questions. [Many sites](http://stackexchange.com/sites) have [different rules](https://softwareengineering.meta.stackexchange.com/a/8067). Feel free to take your issue to an appropriate site if one exists. Search existing answers first. Edit your question to fit the site's needs. Please [don't cross post](https://meta.stackexchange.com/tags/cross-posting/info) by failing to delete your question here. – candied_orange Jul 14 '18 at 19:49
  • You're missing the notion of null termination. C strings are supposed to have a null character to indicate the end of the string. Without the null terminating character at the end of the string, there is no predicting the behavior of the program, since `strlen` accesses memory beyond the array being passed. As far as C is concerned, providing a null terminator is the programmer's responsibility. Now, let me ask you: how many bytes of actual char/unsigned char storage does it take to store 10 real/user characters if you also need a null terminating byte at the end? – Erik Eidt Jul 14 '18 at 20:21
  • Also, consider the difference between `sizeof(letters)` and `strlen(letters)`. The former is a compile-time constant that goes to the *actual declaration* of `letters` (and gives just raw size that doesn't consider or account for a null terminating character), whereas the latter is a runtime search for the null terminating character to find the length of the actual string (without that null character) stored in a (presumably) large enough space for it. – Erik Eidt Jul 14 '18 at 20:27
  • When compiling, always enable the warnings, then fix those warnings (for `gcc`, at a minimum use: `-Wall -Wextra -Wconversion -pedantic -std=gnu11`). Among other things (regardless of what Visual Studio might allow), the function `main()` always has a return type of `int`, not `void`. Why is `main()` declared as returning `void` when it contains the statement `return 0;`? – user3629249 Jul 15 '18 at 15:27
  • 1. [`void main()` is wrong](https://stackoverflow.com/q/204476/995714) except in a freestanding environment. 2. `strlen` returns `size_t`, which [must be printed using `%zu`](https://stackoverflow.com/q/940087/995714) – phuclv Jul 15 '18 at 16:31
  • This is a duplicate: this user posted the same exact question on several SO sites with a slightly different username – Jean-François Fabre Jul 15 '18 at 20:02

3 Answers

  1. When the stack frame is set up, some auxiliary data is pushed onto the stack. It may or may not contain zero bytes, and the first zero byte found is what ultimately marks the end of your overflowing buffers.
  2. C-strings end with a zero marker. Your first two buffers do not end with a zero, but the CPU is dedicated and keeps reading memory until it actually finds one. Usually there will be one somewhere...
  3. Uninitialized buffers contain whatever data was left in memory by previous use.
  4. The stack grows 'downwards' in memory, so the first buffer ends up at address "50", the second one at address "40", and the third one at address "0". But when you print the second buffer starting from address "40", the memory is read upwards: the 10 B-s, then the 10 A-s, and then some more bytes until a zero is found.

Readings: null-terminated string, buffer overflow, stack things.

C-strings need an extra \0 character at the end of a string, so your 'letters' and 'nletters' could store actual strings of 9 letters plus the zero terminator (which is not there in memory by default; you have to put it there yourself). 'nums' is an integer array. It is not really suited for storing strings, but C/C++ will not stop you from doing that. That is why I wrote "40" above as the hypothetical address of the second buffer: 'nums' is most likely a 4x10-byte buffer of 32-bit integers.

tevemadar

The following proposed code corrects many (most) of the problems in the OP's posted code.

Note the proper declaration of the main() function signature

Note the consistent use of indentation in the code

Note the use of appropriate horizontal spacing for readability

Note the use of a proper format parameter in each of the calls to printf()

Note the use of sizeof to return the size of the buffer (per what the printf statements claim they are showing)

Note that both sizeof and strlen() return a size_t, not an int

Note the elimination of the magic numbers (like 10)

Note the elimination of a header file whose contents are not used

And now, the proposed code:

#include <string.h>
#include <stdio.h>

#define MAX_LEN 10

int main( void )
{
    unsigned char letters[ MAX_LEN ];
    memset( letters, 0x00, sizeof( letters ) );
    memset( letters, 0x41, sizeof( letters )-1 );  // keep NUL byte at end
    printf( "%s\n", letters );   // format the output,
                                 // use \n so immediately output to terminal
    printf( " Total buffer len: %zu bytes\n", sizeof(letters) );  // %zu for size_t

    char nletters[ MAX_LEN ];
    memset( nletters, 0x00, sizeof( nletters ) );
    memset( nletters, 0x42, sizeof( nletters )-1 );  // keep NUL byte at end
    printf( "%s\n", nletters );   // format the output,
                                  // use \n so immediately output to terminal
    printf( " Total buffer len: %zu bytes\n", sizeof(nletters) );

    int nums[ MAX_LEN ];            // 10 integers
    memset( nums, 0x00, sizeof( nums ) );  // clear all 10 integers
    memset( nums, 0x43, MAX_LEN - 1 );  // this only sets first 9 bytes
                              // NOTE:  sizeof( int ) not same as size of char
                              //   so most of array not modified
    for( size_t i=0; i< MAX_LEN; i++ )
    {
        printf( "%d\n", nums[ i ] );
    }
    printf( " Total buffer len: %zu bytes\n", sizeof(nums) );
    return 0;
}

running the above code results in the following output:

AAAAAAAAA
Total buffer len: 10 bytes
BBBBBBBBB
Total buffer len: 10 bytes
1128481603
1128481603
67
0
0
0
0
0
0
0
Total buffer len: 40 bytes
user3629249

You say you've been messing with C, but this is not C. This breaks some rules of C. If you break rules of C, strange things happen... My main question to you is: which book are you reading? Because your current one is not working well for you...


In C, strings terminate with a '\0'. As letters is not a sequence of characters that contains a '\0' character, it isn't a string, so you shouldn't be treating it like one. If you want ten characters in a string, you actually need an array of at least eleven to make room for the '\0' (which you also need to set manually, after memset).

char letters[11];
memset(letters, 'a', 10);
letters[10] = '\0';

In C, the %zu format specifier is used to print size_t values such as that returned from strlen. %d is for printing int values, only.

printf("%s\n", letters);
printf("strlen(letters): %zu\n", strlen(letters));

> How, for the love of all that is holy, can commenting out the third buffer affect the size of the others?

printf and strlen expect their arguments to be strings, and a string must always contain a '\0'. Your arrays don't contain '\0', so the string-related functions run past the end of each array and process whatever happens to sit next to it in memory. Adding or removing a variable changes what that neighbouring data is.

As an exercise, predict strlen((char[]) { 1, 2, 3, 4, '\0', 5 })... Test your theory.


> What are the extra bytes at the end of the buffers and how can I lose/manage them (if I choose to concatenate the buffers)?

Those extra bytes are the result of undefined behaviour, which is a scary set of words meaning "anything could happen in place of that, because you broke the rules"... When you break the rules in C, strange things happen...


> Why are the differences in the printed buffer size and initialized size not consistent when I choose whether to comment the third buffer?

Again, undefined behaviour, breaking the rules... strange things happen... and which book are you reading? The reason I ask, it seems like anybody using a book would have navigated past this issue quickly, so I think you're just guessing (which is dangerous in C). You'll learn C (as in properly) much faster by reading a book.


> Buffer 2 is supposed to be 10 bytes, why is it 20? I don't want it to be 20, I only asked for 10. I don't think that's unreasonable.

Stop telling C there's a string (a sequence of characters leading up to a '\0') when there isn't one...

autistic