1

One of my classmates sent me a code and asked what was wrong with it. It was something like this:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *d_array, number, divisor_count, i, size = 1;
    char answer;

    d_array = (int*) malloc(size * sizeof(int));

    do
    {
        printf("\nEnter a number:  ");
        scanf("%d", &number);
        divisor_count = 0;
        for(i = 2; i < number; i++)
            if (number % i == 0) divisor_count++;
        if(divisor_count == 0)
        {
            realloc(d_array,(size + 1) * sizeof(int));
            d_array[size - 1] = number;
            size++;
        }
        printf("\nIs there another number? y/n ");
        getchar();
        answer = getchar();
    } while (answer == 'y');

    for(i = 0; i < size - 1; i++)
        printf("\n%d", d_array[i]);

    return 0;
} 

It's supposed to get numbers from the user and keep the ones that are prime and print them in the end. The output on my computer is something like:

Enter a number:  3
Is there another number? y/n y
Enter a number:  5
Is there another number? y/n y
Enter a number:  8
Is there another number? y/n y
Enter a number:  7
Is there another number? y/n y
Enter a number:  2
Is there another number? y/n n
4072680
5
7
2

There were other things in the code but the biggest problem is obviously not assigning the return value of realloc(). But the strange thing is, which is my question, why does this code shows the first prime number wrong and the others correct? The adress of the dynamic array possibly changes but why are the second one and the rest correct and not the first one?

Edit: Ok, the reason why I asked this was to try to understand the behaviour of realloc() in this code, if you have good resources please do share. When reallocating memory (when it is freeing the old one), does realloc() change the content of the old memory location?

hattenn
  • 4,371
  • 9
  • 41
  • 80
  • 2
    Invoking undefined behaviour - as your colleague is undoubtedly doing - leads to anything being possible. What you saw is perfectly valid once you invoke undefined behaviour; there really isn't much point in discussing it further. GIGO - garbage in, garbage out. You were unlucky that anything appeared to work (and yes, I mean UN-lucky!). – Jonathan Leffler Jan 05 '11 at 13:29

7 Answers7

5

Always do this:

void* new_ptr = realloc(ptr, new_size);
if(!new_ptr) error("Out of memory");
ptr = new_ptr;

The reason is that realloc() might not be able to fit the requested size within the block it already was allocated in. If it needs to move to another memory block, it will copy the data from your previously allocated memory, free the old memory block, and return the new one.

Also, if realloc() returns NULL, it means that it failed. In that case, your memory pointed to by ptr must be freed at some point, otherwise you will have a memory leak. In other words, never do this:

ptr = realloc(ptr, new_size);
Jörgen Sigvardsson
  • 4,839
  • 3
  • 28
  • 51
  • Thanks for the answer but I'm not asking for the correct code I'm just asking why it outputs that in the code written above. – hattenn Jan 05 '11 at 13:14
  • Probably because the memory management routines marked the code with some magic number to aid in debugging. Are there any differences with or without debugging enabled? With Visual C++, you get totally different behaviors of memory management routines depending on your debug mode settings. – Jörgen Sigvardsson Jan 05 '11 at 13:25
2

If you are invoking undefined behaviour then there is no way to explain the results without looking at the operating system and compiler internals.

It isn't a very satisfactory answer I know. But then, if you could describe the logic behind it it wouldn't really be called undefined behaviour!

James Greenhalgh
  • 2,401
  • 18
  • 17
  • This would actually be the heap allocation library internals, which could be part of the OS (not likely on desktop type computers) but probably is part of the library that goes with the compiler (though not necessarily). – nategoose Jan 06 '11 at 00:10
1

The problem is that after realloc the memory pointed to by d_array is considered free, so the code actually writes to the free memory. Seems that in the meanwhile the memory is allocated for something different (used by scanf?), so its beginning gets overwritten. Of course, this is completely undefined: any part of the freed memory may be overwritten at any time.

Vlad
  • 35,022
  • 6
  • 77
  • 199
  • Well, I see that it's undefined behaviour but output is always the same. I'm just wondering if realloc() changes the content of the memory when it's moving it to another place. – hattenn Jan 05 '11 at 13:20
  • @evothur: yes, it changes the content (if the new size is bigger than the old size). See: http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/ By the way, this reference is perhaps the prototype of your friend's code. :-) – Vlad Jan 05 '11 at 13:23
  • @evothur: quoting from the previous reference: "If the new size is larger, the value of the newly allocated portion is indeterminate." – Vlad Jan 05 '11 at 13:25
1

Since you seem to be mostly curious about why you get repeatable (even if incorrect) output from undefined behavior, here's one scenario that could result in what you're seeing. I think understanding the underlying mechanisms behind undefined behavior can be valuable - just keep in mind that this is hypothetical, and might not really be why you're seeing what you're seeing. For undefined behavior the results you see could change from one compile to the next or one run to the next (for an odd case where just changing the name of a variable in a program that had undefined behavior changed the output of the program, see Unexpected output from Bubblesort program with MSVC vs TCC).

  1. d_array is initialized with a call to malloc(size * sizeof(int)) before the do/while loop is entered. The d_array pointer is never changed after this point (even though it might no longer point to allocated memory, as we'll see later)

  2. the first time a value is stored, realloc() is called, but the library finds that it doesn't need to change the block originally given and returns the value that was passed in to it. So 3 is stored at the start of this block. Note that this is still a bug, since your program can't know whether or not d_array is still valid because you've ignored the return value from realloc().

  3. When 5 is entered, another call to realloc() is made. This time, the library decides it does have to allocate a different block. It does so (after copying the contents of what d_array points to). Part of this reallocation results in the the block d_array is pointing to being freed, and the library's bookkeeping overwrites the 3 that was in this block with 4072680. Maybe a pointer to something the library cares about - who knows. The main thing is that the block now belongs to the library again, and it (not you) can do what it wants with it.

  4. Now 5 is written to d_array + 1 (which is not a valid operation, since the block d_array points to has been freed). So d_array[0] == 4072680 and d_array[1] == 5.

  5. from this point on, all values stored through d_array go in to the block that was freed, but for whatever reason the library never notices the heap corruption that's occurring. Just luck (bad luck if you want to find bugs). Nothing ever gets written to blocks that realloc() may have actually reallocated.

Note - like I said, all of this is one possible explanation for the behavior. The actual details might be different, and really don't matter. Once you've accessed a memory allocation that has been freed (whether to read or to write), all bets are off. The bottom line rule for undefined behavior is that anything goes.

Community
  • 1
  • 1
Michael Burr
  • 333,147
  • 50
  • 533
  • 760
1

This will show you a little bit about what is going on:

#include <stdio.h>
#include <stdlib.h>

void dbg_print_array(unsigned sz, const int * array, const char * label) {
     fprintf(stderr, "{%s:\t%p:\t", label, array);
     while (sz--) {
         fprintf(stderr, " %x ", *array++);
     }
     fprintf(stderr, "}\n");
}

int main()
{
    int *d_array, number, divisor_count, i, size = 1;
    char answer;

    d_array = (int*) malloc(size * sizeof(int));
    dbg_print_array(size, d_array, "Initial");

    do
    {
        printf("\nEnter a number:  ");
        scanf("%d", &number);
        divisor_count = 0;
        for(i = 2; i < number; i++)
            if (number % i == 0) divisor_count++;
        if(divisor_count == 0)
        {
            int * p;
            dbg_print_array(d_array, size, "pre-realloc");
            p = realloc(d_array,(size + 1) * sizeof(int));
            dbg_print_array(d_array, size+1, "post-realloc (d_array)");
            dbg_print_array(p, size+1, "post-realloc (p)");
            d_array[size - 1] = number;
            size++;
        }
        printf("\nIs there another number? y/n ");
        getchar();
        answer = getchar();
    } while (answer == 'y');

    for(i = 0; i < size - 1; i++)
        printf("\n%d", d_array[i]);

    return 0;
} 

As far as why this data gets over written so soon, it is difficult to tell. Different heap allocation implementations are likely to behave very differently for this. Since the reallocation is done in such small steps (the array grows by 1 each time) realloc will be called often.

Since even though you and I may normally think of the unallocated heap space as unused the heap allocation and freeing functions do store some data there to keep up with things. Since each call to realloc reads this data and the program provided writes to data that realloc probably assumes is owned by the heap allocation routines then it may be reading something that this program has overwritten (it writes off the end of the originally allocated space each time through the loop). After reading this corrupted data realloc may make a decision based on what it read, resulting in who knows what. At this point in the program you should consider every behavior as undefined because the basic assumptions made for normal operation are no longer valid.

edit

By examining the output of the above code you should be able to determine when realloc actually does return a different pointer than the one it was passed (my guess is to make space for the last integer read in, in your example, because malloc probably rounded up to 16 bytes for the first allocation -- also because realloc didn't abort, probably since it was never passed the invalid pointer).

The adjacent post-realloc print statements will have different addresses (first number printed for them) when realloc did not return the same pointer that it was passed.

Community
  • 1
  • 1
nategoose
  • 12,054
  • 27
  • 42
  • Interestingly enough, the code looks as if it works perfectly when it's compiled using GCC on Ubuntu. If I compiled this code in Ubuntu first, I wouldn't learn the correct usage of realloc(). As we did use it without assignment in our class. Thanks for all the help, I really appreciate it. – hattenn Jan 07 '11 at 11:27
0

try that :

d_array = realloc(d_array,(size + 1) * sizeof(int));

Did that and worked fine on my computer.

If you use gcc a gcc -Wall -o would give you the warning you were looking for.

you realloc() but you don't use the new memory allocated. Simply speaking. It still points to the old one (depending if realloc() used the same block or moved it to a different place). If you were "lucky" and the same block was used you just continue writing, otherwise you are writing to the old place and end up writing in a place you shouldn't (because realloc() free()s the old block internally. realloc() doesn't change your pointer variable, just reallocates space. you need to change the memory address with the result of realloc() returned. And as the comments said. you need to check for successful realloc()ation. the new space allocated contains the data of the old buffer (as long as the new memory allocated is bigger than the old one). the contents of the old memory are not changed but the old place is freed.

  • 1
    And when realloc returns NULL, you're terminally screwed when using such. – user562374 Jan 05 '11 at 13:09
  • That is included in my question, I know that this is the correct way but I'm not trying to correct the code. The question is, why is that wrong code up there gives that output, what's the logic behind it? – hattenn Jan 05 '11 at 13:10
  • Well I checked it and generally realloc() returns a pointe to another place of the memory but the output is always the same, first one not correct and the rest are. – hattenn Jan 05 '11 at 13:16
0

I think you have to use realloc like this:

d_array = realloc(d_array,(size + 1) * sizeof(int));

and not just:

realloc(d_array,(size + 1) * sizeof(int));

Another problem I see is that size=1 initially, so the first time the code runs it is doing this:

realloc(d_array,(size + 1) * sizeof(int));

size + 1 = 2 (allocating memory for 2 integers, but you only need one.) the solution could be starting size in 0.

lezo
  • 175
  • 1
  • 1
  • 10
  • Thanks for the answer but I'm not trying to correct the code, I'm just trying to understand the behaviour of realloc() – hattenn Jan 05 '11 at 13:41