2

I'm currently learning C, and since I'm a Python programmer, I'm not entirely sure about the inner workings of C. I just stumbled upon a really weird thing.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void test_realloc(){
  // So this is the original place allocated for my string
  char * curr_token = malloc(2*sizeof(char));

  // This is really weird because I only allocated 2x char size in bytes 
  strcpy(curr_token, "Davi");
  curr_token[4] = 'd';
  // I guess it somehow overwrote data outside the allocated memory?
  // I was hoping this would result in an exception ( I guess not? )      

  printf("Current token > %s\n", curr_token);

  // Looks like it's still printable, wtf???
  char *new_token = realloc(curr_token, 6);
  curr_token = new_token;
  printf("Current token > %s\n", curr_token);
}


int main(){
  test_realloc();
  return 0;
}

So the question is: how come I'm able to write more chars into a string than its allocated size allows? I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?

What I was trying to accomplish

  1. Allocate a 4 char ( + null char ) string where I would write 4 chars of my name
  2. Reallocate memory to accommodate the last character of my name
David Černý
    This is _undefined behavior_. – tkausl Apr 18 '17 at 12:11
  • Elaborate please? – David Černý Apr 18 '17 at 12:11
  • [Undefined Behavior](https://en.wikipedia.org/wiki/Undefined_behavior) everything can happen..... – LPs Apr 18 '17 at 12:12
  • Here's an [informative list of undefined behaviour](http://port70.net/~nsz/c/c11/n1570.html#J.2). It contains some UB, not all, and some it contains may be incorrect (just informative, not normative). For example " A string or wide string utility function is instructed to access an array beyond the end of an object (7.24.1, 7.29.4). " – Ilja Everilä Apr 18 '17 at 12:30
  • 2 * 1 = 2 .... sizeof(char) – Michi Apr 18 '17 at 12:50
  • Just as a general note, *always* check that the result of `malloc/calloc/realloc` isn't `NULL` before attempting to use the pointer, or you could potentially invoke *different* undefined behavior. – John Bode Apr 18 '17 at 13:04
  • Possible duplicate of [I can use more memory than how much I've allocated with malloc(), why?](http://stackoverflow.com/questions/3509714/i-can-use-more-memory-than-how-much-ive-allocated-with-malloc-why) – davmac Apr 19 '17 at 13:31

4 Answers

7

I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?

Welcome to C programming :). In general, this is correct: you can do something wrong and receive no immediate feedback that this was the case. In some cases, indeed, you can do something wrong and never see a problem at runtime. In other cases, however, you'll see crashes or other behaviour that doesn't make sense to you.

The key term is undefined behavior. This is a concept that you should become familiar with if you continue programming in C. It means just like it sounds: if your program violates certain rules, the behaviour is undefined - it might do what you want, it might crash, it might do something different. Even worse, it might do what you want most of the time, but just occasionally do something different.

It is this mechanism which allows C programs to be fast - since they don't at runtime do a lot of the checks that you may be used to from Python - but it also makes C dangerous. It's easy to write incorrect code and be unaware of it; then later make a subtle change elsewhere, or use a different compiler or operating system, and the code will no longer function as you wanted. In some cases this can lead to security vulnerabilities, since unwanted behavior may be exploitable.
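As a minimal sketch of staying inside defined behavior, the two steps described in the question (allocate room for the first characters, then grow the block before appending) could look like the following. The helper name `append_char` is hypothetical and not from the original post; the point is only that every write lands inside the size that was actually requested, and that the results of `malloc` and `realloc` are checked before use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `prefix` with the
 * single character `last` appended, or NULL if an allocation fails.
 * The caller owns the result and must free() it. */
char *append_char(const char *prefix, char last)
{
    assert(prefix != NULL);
    size_t len = strlen(prefix);

    /* Step 1: room for the prefix plus its terminating '\0'. */
    char *token = malloc(len + 1);
    if (token == NULL)
        return NULL;
    memcpy(token, prefix, len + 1);

    /* Step 2: grow the block by one byte BEFORE writing the extra char. */
    char *grown = realloc(token, len + 2);
    if (grown == NULL) {
        free(token);   /* on failure realloc leaves the old block valid */
        return NULL;
    }
    grown[len] = last;
    grown[len + 1] = '\0';   /* re-terminate the string */
    return grown;
}
```

Calling `append_char("Davi", 'd')` yields the string "David". Building with the GCC or Clang option `-fsanitize=address` is one way to have out-of-bounds heap writes reported at runtime, although no tool catches every error.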

davmac
  • What is the recommended way to handle issues such as this one? – David Černý Apr 18 '17 at 12:20
  • 4
    @DavidČerný use _sanitizers_ or tools where available. The Gcc and Clang compilers have sanitizer options that will catch your error at runtime (though they can't catch all errors). Tools such as _valgrind_ can do the same. _static analysis_ tools (such as _scan-build_ from Clang) can check your code and find some errors. – davmac Apr 18 '17 at 12:22
  • @DavidČerný *What is the recommended way to handle issues such as this one?* Be careful and don't create the issues in the first place. Sanitizers or tools can't be guaranteed to catch them all. As davmac states, "Welcome to C programming." – Andrew Henle Apr 18 '17 at 13:37
2

Suppose that you have an array as shown below.

int arr[5] = {6,7,8,9,10};

In most expressions, the name of an array is converted to a pointer to the array's first element. Here, arr is the name of the array, so it yields a pointer to the first element, which is 6. Hence *arr, literally *(arr+0), gives you 6 as the output, *(arr+1) gives you 7, and so on.

Here, the size of the array is 5 integer elements. Now, try accessing the 10th element, arr[10], even though the array holds only 5 integers. This is not going to give you an error, rather gives you some garbage value. As arr is used as just a pointer, the dereference is done as arr+0, arr+1, arr+2 and so on. In the same manner, you can also access arr+10 through the base array pointer.

Now, try understanding your context with this example. Though you have allocated only 2 bytes for characters, you can access memory beyond those two bytes using the pointer. Hence, it is not throwing you an error. On the other hand, you may be able to predict the output on your machine, but it is not guaranteed that you can predict the output on another machine (maybe the memory you are allocating on your machine is filled with zeros, and maybe those particular memory locations are being used for the first time ever!).

In the statement char *new_token = realloc(curr_token, 6); note that you are resizing the block pointed to by curr_token to 6 bytes and storing the resulting pointer in new_token. Now, the block that new_token points to is 6 bytes in size.
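Because the language performs no index check on pointer arithmetic, one option is to do the check by hand. The sketch below is only an illustration: `get_checked` is a hypothetical helper, and the element count has to be passed in explicitly because a pointer carries no length information.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads arr[index] only when index < count.
 * Returns true and stores the element in *out on success;
 * returns false and leaves *out untouched otherwise. */
bool get_checked(const int *arr, size_t count, size_t index, int *out)
{
    assert(arr != NULL && out != NULL);
    if (index >= count)
        return false;          /* refuse instead of reading past the end */
    *out = *(arr + index);     /* exactly the same access as arr[index] */
    return true;
}
```

With `int arr[5] = {6,7,8,9,10};`, a call such as `get_checked(arr, 5, 1, &v)` stores 7 in `v`, while `get_checked(arr, 5, 10, &v)` returns false without touching memory outside the array.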

surendra nath
  • Thanks, I was just confused because in python this is completely different ( obviously ) – David Černý Apr 18 '17 at 13:19
  • "This is not going to give you an error, rather gives you some garbage value" - it would be perfectly within specification to give an error. _Most_ compilers won't do so. It is dangerous, however, to start assuming any particular behavior on invoking what is technically undefined behavior. – davmac Apr 19 '17 at 08:55
  • Being unfamiliar with concepts sometimes is not very uncommon. Just all that we need is to polish and practice. Try executing the same and then come to a conclusion. Theoretical explanation and practical implementation holds lot of difference my friend! – surendra nath Apr 19 '17 at 09:04
  • The problem is that "practical implementation" changes over time. Compilers more and more make use of the concept of undefined behavior as a mechanism to allow optimisations. For many years you could reliably check for signed integer overflow by checking if the result was less than 0; that's no longer true. A loop such as `for (int i = 0; i >= 0; i++)` is now often turned into an infinite loop by a compiler. Relying on being able to access array 'elements' "past the end" could similarly lead to problems, in the future if not now. – davmac Apr 19 '17 at 09:11
  • As far as I worked with gcc 4.2.3 through 6.3.0, the same concept holds good. I cannot purge into the future and develop my code with all the AI that we need and code forever! My answer was just to the question, not to solve some equation in theory of relativity for black holes! – surendra nath Apr 19 '17 at 09:22
  • Black holes aside, what you _can_ do is avoid undefined behavior in your code. Then it should work - with gcc 4.2.3, 6.3.0, 7.x, 8.x and beyond! My comment was just to your answer, which addressed a particular implementation but did not make clear that _relying_ on this _particular_ implementation can be dangerous. – davmac Apr 19 '17 at 09:42
  • Wake up my friend! I've never mentioned undefined behavior. See if you are commenting on the relevant answer. Take care of your health! – surendra nath Apr 19 '17 at 09:49
  • I'm awake my pal! I know you didn't mention undefined behavior, that's the point of my comment! Have a nice day. – davmac Apr 19 '17 at 10:40
1

Usually malloc is implemented in such a way that it allocates chunks of memory aligned to a paragraph (the fundamental alignment), which is typically 16 bytes.

So when you request, for example, 2 bytes, malloc may actually reserve 16 bytes. This allows the same chunk of memory to be reused when realloc is called.

According to the C Standard (7.22.3 Memory management functions)

  1. ...The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).

Nevertheless, you should not rely on such behavior: the standard does not guarantee it, so writing past the size you requested is undefined behavior.
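What the quoted passage does guarantee is the alignment of the returned pointer, not any extra usable bytes. A small sketch of checking that alignment follows; `is_fundamentally_aligned` is a hypothetical helper, and it assumes a C11 compiler and a platform where converting a pointer to `uintptr_t` yields its address as a plain integer (true on mainstream systems, but not mandated by the standard).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>   /* max_align_t (C11) */
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>

/* Hypothetical helper: true if p is aligned strictly enough for any
 * object type with a fundamental alignment requirement. */
bool is_fundamentally_aligned(const void *p)
{
    assert(p != NULL);
    return (uintptr_t)p % _Alignof(max_align_t) == 0;
}
```

Passing the result of `malloc(2)` to this helper is expected to return true, yet only the 2 requested bytes may be read or written, however large the underlying chunk happens to be.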

Vlad from Moscow
0

No automatic bounds checking is performed in C. The program behaviour is unpredictable. If you write to memory that does not belong to your process, you will end up with a segmentation fault; otherwise you may silently corrupt data, etc.
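Since the check is not automatic, it has to be written by hand wherever a buffer is filled. The following is a minimal sketch of a copy that refuses to overflow its destination; `copy_checked` is a hypothetical helper, and the caller is assumed to know and pass the real size of the destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst only if the whole string,
 * including its terminating '\0', fits in dst_size bytes.
 * Returns 0 on success, -1 if it does not fit (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t needed = strlen(src) + 1;
    if (needed > dst_size)
        return -1;             /* would write past the end of dst */
    memcpy(dst, src, needed);
    return 0;
}
```

With a 2-byte destination, `copy_checked(buf, 2, "Davi")` returns -1 instead of overwriting adjacent memory the way an unchecked `strcpy` would.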

Luca Di Liello
  • Wow, that's just barbaric, at least makes me a better programmer if I learn about this. – David Černý Apr 18 '17 at 12:18
  • Avoiding memory leaks in big programs is not easy. It's very important to always check that you are not exceeding array limits, that memory you no longer use is deallocated, that you are not dereferencing a null pointer, and other things like this. – Luca Di Liello Apr 18 '17 at 12:29
  • 1
    C does not _require_ automatic bounds checking. An implementation may have bounds checking. – chux - Reinstate Monica Apr 18 '17 at 13:01
  • re _No automatic bounds checking is performed in C:_... Google _[bounds checking in C](https://www.google.com/#q=Bounds+checking+in+C)_. There are implementations out there. – ryyker Apr 18 '17 at 13:51