0

This is really strange... and I can't debug it (tried for about two hours, debugger starts going haywire after a while...). Anyway, I'm trying to do something really simple:

Free an array of strings. The array is in the form:

char **myStrings. The array elements are initialized as:

myString[index] = malloc(strlen(word));
myString[index] = word;

and I'm calling a function like this:

free_memory(myStrings, size); where size is the length of the array (I know this is not the problem, I tested it extensively and everything except this function is working).

free_memory looks like this:

void free_memory(char **list, int size) {

    for (int i = 0; i < size; i ++) {
        free(list[i]);
    }

    free(list);
}

Now here comes the weird part. if (size> strlen(list[i])) then the program crashes. For example, imagine that I have a list of strings that looks something like this:

myStrings[0] = "Some";
myStrings[1] = "random";
myStrings[2] = "strings";

And thus the length of this array is 3.

If I pass this to my free_memory function, strlen(myStrings[0]) > 3 (4 > 3), and the program crashes.

However, if I change myStrings[0] to be "So" instead, then strlen(myStrings[0]) < 3 (2 < 3) and the program does not crash.

So it seems to me that free(list[i]) is actually going through the char[] that is at that location and trying to free each character, which I imagine is undefined behavior.

The only reason I say this is because I can play around with the size of the first element of myStrings and make the program crash whenever I feel like it, so I'm assuming that this is the problem area.

Note: I did try to debug this by stepping through the function that calls free_memory, noting any weird values and such, but the moment I step into the free_memory function, the debugger crashes, so I'm not really sure what is going on. Nothing is out of the ordinary until I enter the function, then the world explodes.

Another note: I also posted the shortened version of the source for this program (not too long; Pastebin) here. I am compiling on MinGW with the c99 flag on.

PS - I just thought of this. I am indeed passing numUniqueWords to the free function, and I know that this does not actually free the entire piece of memory that I allocated. I've called it both ways, that's not the issue. And I left it how I did because that is the way that I will be calling it after I get it to work in the first place, I need to revise some of my logic in that function.

Source, as per request (on-site):

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <stdlib.h>
#include "words.h"

int getNumUniqueWords(char text[], int size);

int main(int argc, char* argv[]) {

        setvbuf(stdout, NULL, 4, _IONBF); // For Eclipse... stupid bug. --> does NOT affect the program, just the output to console!

        int nbr_words;

        char text[] = "Some - \"text, a stdin\". We'll have! also repeat? We'll also have a repeat!";
        int length = sizeof(text);
        nbr_words = getNumUniqueWords(text, length);

        return 0;
}

void free_memory(char **list, int size) {

        for (int i = 0; i < size; i ++) {
                // You can see that printing the values is fine, as long as free is not called.
                // When free is called, the program will crash if (size > strlen(list[i]))
                //printf("Wanna free value %d w/len of %d: %s\n", i, strlen(list[i]), list[i]);
                free(list[i]);
        }
        free(list);
}

int getNumUniqueWords(char text[], int length) {
        int numTotalWords = 0;
        char *word;

        printf("Length: %d characters\n", length);

        char totalWords[length];
        strcpy(totalWords, text);

        word = strtok(totalWords, " ,.-!?()\"0123456789");

        while (word != NULL) {
                numTotalWords ++;
                printf("%s\n", word);
                word = strtok(NULL, " ,.-!?()\"0123456789");
        }

        printf("Looks like we counted %d total words\n\n", numTotalWords);

        char *uniqueWords[numTotalWords];
        char *tempWord;
        int wordAlreadyExists = 0;
        int numUniqueWords = 0;

        char totalWordsCopy[length];
        strcpy(totalWordsCopy, text);

        for (int i = 0; i < numTotalWords; i++) {
                uniqueWords[i] = NULL;
        }

        // Tokenize until all the text is consumed.
        word = strtok(totalWordsCopy, " ,.-!?()\"0123456789");
        while (word != NULL) {

                // Look through the word list for the current token.
                for (int j = 0; j < numTotalWords; j ++) {
                        // Just for clarity, no real meaning.
                        tempWord = uniqueWords[j];

                        // The word list is either empty or the current token is not in the list.
                        if (tempWord == NULL) {
                                break;
                        }

                        //printf("Comparing (%s) with (%s)\n", tempWord, word);

                        // If the current token is the same as the current element in the word list, mark and break
                        if (strcmp(tempWord, word) == 0) {
                                printf("\nDuplicate: (%s)\n\n", word);
                                wordAlreadyExists = 1;
                                break;
                        }
                }

                // Word does not exist, add it to the array.
                if (!wordAlreadyExists) {
                        uniqueWords[numUniqueWords] = malloc(strlen(word));
                        uniqueWords[numUniqueWords] = word;
                        numUniqueWords ++;
                        printf("Unique: %s\n", word);
                }

                // Reset flags and continue.
                wordAlreadyExists = 0;
                word = strtok(NULL, " ,.-!?()\"0123456789");
        }

        // Print out the array just for funsies - make sure it's working properly.
        for (int x = 0; x <numUniqueWords; x++) {
                printf("Unique list %d: %s\n", x, uniqueWords[x]);
        }

        printf("\nNumber of unique words: %d\n\n", numUniqueWords);

        // Right below is where things start to suck.
        free_memory(uniqueWords, numUniqueWords);

        return numUniqueWords;
}
Chris Cirefice
  • 5,475
  • 7
  • 45
  • 75
  • See the source, it was malloc'd. Unless I'm mistaken somehow and it really *wasn't* malloc'd like I think it was, and I was just calling malloc for no particular reason. I'm actually not sure now... that might be the issue. Could you take a look at the source? Line 97/98 would be the ones in question. – Chris Cirefice Oct 21 '13 at 04:54
  • We shouldn't need to see the source by going off-site; it should be in the question. – Jonathan Leffler Oct 21 '13 at 04:56
  • I suppose I'll make an edit then, I just figured it would be too long to put in there directly, and unsightly to boot. – Chris Cirefice Oct 21 '13 at 04:56
  • @ChrisCirefice you don;t need to call free for each element of the array. Just call free arrayName – Umer Farooq Oct 21 '13 at 05:00
  • On Mac, the `free()` in the library reports `ff(35384) malloc: *** error for object 0x7fff57da2330: pointer being freed was not allocated`. – Jonathan Leffler Oct 21 '13 at 05:04
  • Everything I've read about 'matrices' in C being created dynamically (rows have varying length), and matrices in general I suppose, always free each row, then the initial pointer. Imagine freeing the pointer to the first element that points to `N` rows, where are all those rows? See http://stackoverflow.com/a/12462627/1986871 – Chris Cirefice Oct 21 '13 at 05:04
  • [valgrind](http://valgrind.org/) may help you. – Dayal rai Oct 21 '13 at 05:12
  • Hmmm good point, I'll have to try that next time. I completely forgot about it; I've heard of it, never used it. Hopefully I won't be running into any more problems with memory management though... – Chris Cirefice Oct 21 '13 at 05:15

5 Answers5

10

You've got an answer to this question, so let me instead answer a different question:

I had multiple easy-to-make mistakes -- allocating a wrong-sized buffer and freeing non-malloc'd memory. I debugged it for hours and got nowhere. How could I have spent that time more effectively?

You could have spent those hours writing your own memory allocators that would find the bug automatically.

When I was writing a lot of C and C++ code I made helper methods for my program that turned all mallocs and frees into calls that did more than just allocate memory. (Note that methods like strdup are malloc in disguise.) If the user asked for, say, 32 bytes, then my helper method would add 24 to that and actually allocate 56 bytes. (This was on a system with 4-byte integers and pointers.) I kept a static counter and a static head and tail of a doubly-linked list. I would then fill in the memory I allocated as follows:

  • Bytes 0-3: the counter
  • Bytes 4-7: the prev pointer of a doubly-linked list
  • Bytes 8-11: the next pointer of a doubly-linked list
  • Bytes 12-15: The size that was actually passed in to the allocator
  • Bytes 16-19: 01 23 45 67
  • Bytes 20-51: 33 33 33 33 33 33 ...
  • Bytes 52-55: 89 AB CD EF

And return a pointer to byte 20.

The free code would take the pointer passed in and subtract four, and verify that bytes 16-19 were still 01 23 45 67. If they were not then either you are freeing a block you did not allocate with this allocator, or you've written before the pointer somehow. Either way, it would assert.

If that check succeeded then it would go back four more and read the size. Now we know where the end of the block is and we can verify that bytes 52 through 55 are still 89 AB CD EF. If they are not then you are writing over the end of a block somewhere. Again, assert.

Now that we know that the block is not corrupt we remove it from the linked list, set ALL the memory of the block to CC CC CC CC ... and free the block. We use CC because that is the "break into the debugger" instruction on x86. If somehow we end up with the instruction pointer pointing into such a block it is nice if it breaks!

If there is a problem then you also know which allocation it was, because you have the allocation count in the block.

Now we have a system that finds your bugs for you. In the release version of your product, simply turn it off so that your allocator just calls malloc normally.

Moreover you can use this system to find other bugs. If for example you believe that you've got a memory leak somewhere all you have to do is look at the linked list; you have a complete list of all the outstanding allocations and can figure out which ones are being kept around unnecessarily. If you think you're allocating too much memory for a given block then you can have your free code check to see if there are a lot of 33 in the block that is about to be freed; that's a sign that you're allocating your blocks too big. And so on.

And finally: this is just a starting point. When I was using this debug allocator professionally I extended it so that it was threadsafe, so that it could tell me what kind of allocator was doing the allocation (malloc, strdup, new, IMalloc, etc.), whether there was a mismatch between the alloc and free functions, what source file contained the allocation, what the call stack was at the time of the allocation, what the average, minimum and maximum block sizes were, what subsystems were responsible for what memory usage...

C requires that you manage your own memory; this definitely has its pros and cons. My opinion is that the cons outweigh the pros; I much prefer to work in automatic storage languages. But the nice thing about having to manage your own storage is that you are free to build a storage management system that meets your needs, and that includes your debugging needs. If you must use a language that requires you to manage storage, use that power to your advantage and build a really powerful subsystem that you can use to solve professional-grade problems.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • Good point - I guess the only problem is that I'm new to C and, looking back at the debugger output now (with the broken code), there really isn't any indication that anything is going *wrong*, in fact until I (just now) looked at the addresses of the old malloc and assignment, there is no indication that anything is malfunctioning and I was not, as a student being taught basic concepts like array traversal and simple data structures, looking at the hex values of memory addresses and comparing them for inaccuracies. – Chris Cirefice Oct 21 '13 at 22:33
  • I understand the concept of your memory allocator, however I am nowhere near competent in any language to write anything close to what your "starting point" does. – Chris Cirefice Oct 21 '13 at 22:35
  • 2
    Welcome to c programming. There is almost never any indication that something is wrong! The language and libraries are chock full of pitfalls. – Eric Lippert Oct 21 '13 at 23:55
  • It is unfortunate - this is why (I presume) that educators nowadays prefer to intro with languages like Java, where the dangerous behind-the-scenes pitfalls are managed for you. However, it is absolutely necessary (imo) to learn how to manage memory, because at the very least it grants one a better understanding of how the hardware actually manages its resources. – Chris Cirefice Oct 22 '13 at 01:58
  • @EricLippert: Fantastic ideas, thank you! ... I wonder if there is an off-the-shelf debug allocator like that. Well, there is valgrind :) – Alex I Oct 26 '13 at 08:07
4

The problem is not how you're freeing, but how you're creating the array. Consider this:

uniqueWords[numUniqueWords] = malloc(strlen(word));
uniqueWords[numUniqueWords] = word;

...

word = strtok(NULL, " ,.-!?()\"0123456789");

There are several issues here:

word = strtok(): what strtok returns is not something that you can free, because it has not been malloc'ed. ie it is not a copy, it just points to somewhere inside the underlying large string (the thing you called strtok with first).

uniqueWords[numUniqueWords] = word: this is not a copy; it just assigns the pointer. the pointer which is there before (which you malloc'ed) is overwritten.

malloc(strlen(word)): this allocates too little memory, should be strlen(word)+1

How to fix:

Option A: copy properly

// no malloc
uniqueWords[numUniqueWords] = strdup(word); // what strdup returns can be free'd

Option B: copy properly, slightly more verbose

uniqueWords[numUniqueWords] = malloc(strlen(word)+1);
strcpy(uniqueWords[numUniqueWords], word); // use the malloc'ed memory to copy to

Option C: don't copy, don't free

// no malloc
uniqueWords[numUniqueWords] = word; // not a copy, this still points to the big string
// don't free this, ie don't free(list[i]) in free_memory

EDIT As other have pointed out, this is also problematic:

    char *uniqueWords[numTotalWords];

I believe this is a GNU99 extension (not even C99), and indeed you cannot (should not) free it. Try char **uniqueWords = (char**)malloc(sizeof(char*) * numTotalWords). Again the problem is not the free() but the way you allocate. You are on the right track with the free, just need to match every free with a malloc, or with something that says it is equivalent to a malloc (like strdup).

Alex I
  • 19,689
  • 9
  • 86
  • 158
  • I really did have a feeling that the assignment from `uniqueWords[numUniqueWords] = word` was just that, an assignment. And I also had a feeling that I should have just copied the string from the start. So it turns out that my initial concern with how to put those stupid things in the array in the first place was the problem. I can say this though, your explanation of the methods of manipulating those values was great. I just hope that someone else comes across this question in the future... – Chris Cirefice Oct 21 '13 at 05:10
  • 2
    The `char *uniqueWords[numTotalWords];` declaration is a C99 standard VLA or variable-length array. They're also in C11, but are optional there (ISO/IEC 9899:2011 §6.10.8.3 Conditional feature macros). They are not just a GNU compiler extension. – Jonathan Leffler Oct 21 '13 at 05:44
4

You are using this code in an attempt to allocate the memory:

uniqueWords[numUniqueWords] = malloc(strlen(word));
uniqueWords[numUniqueWords] = word;
numUniqueWords++;

This is wrong on many levels.

  1. You need to allocate strlen(word)+1 bytes of memory.
  2. You need to strcpy() the string over the allocated memory; at the moment, you simply throw the allocated memory away.

Your array uniqueWords is itself not allocated, and the word values you have stored are from the original string which has been mutilated by strtok().

As it stands, you cannot free any memory because you've already lost the pointers to the memory that was allocated and the memory you are trying to free was never in fact allocated by malloc() et al.

And you should be error checking the memory allocations too. Consider using strdup() to duplicate strings.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Haha, I laughed at your second to last paragraph because of how ridiculously true it is, and how I'm just now seeing it as a problem (thanks to you and @Alex I). Not to mention how I'm creating the array in the first place. Good thing I'm still in school... – Chris Cirefice Oct 21 '13 at 05:12
  • We all go through the learning process… It is just as well you're using `sizeof()` instead of `strlen()` in `main()` as otherwise you'd be allocating too little memory for the copy of the long string in the counting routine. With the allocation code fixed (I used `strdup()`) and removing the `free(list)` from the routine, the code runs to completion. I didn't use `valgrind` on it, but it would probably be OK. – Jonathan Leffler Oct 21 '13 at 05:15
  • Well I am at least aware that `strlen` does not include the null terminator character, and that `sizeof` does. Ironically, I used `strlen` in the `malloc`. Go figure. – Chris Cirefice Oct 21 '13 at 05:19
  • If you'd used `sizeof()` in the call to `malloc()`, you would have gotten either 4 or 8 (32-bit or 64-bit), not the length of the string, because `char *word` is just a pointer, not an array. – Jonathan Leffler Oct 21 '13 at 05:23
  • Oh yeeaaahhhh... I remember that now. When I started this project I thought I could use `sizeof`, but then realized that I would just be getting the size of the pointer, thus why I resorted to `strlen`. But then I forgot to account for the missing null terminator character, so that's where the `+1` comes from. Got it now! Thanks :) – Chris Cirefice Oct 21 '13 at 05:26
0

You are trying to free char *uniqueWords[numTotalWords];, which is not allowed in C.

Since uniqueWords is allocated on the stack and you can't call free on stack memory.

Just remove the last free call, like this:

void free_memory(char **list, int size) {

    for (int i = 0; i < size; i ++) {
        free(list[i]);
    }
}
Raphael Ahrens
  • 953
  • 15
  • 29
  • Aren't arrays in C just pointers to a block in memory? So `myArray[i][i]` is the same as `**myArray`, de-referencing twice. Thus the declaration would follow the same logic? `char *myArray[X]` where X is known is the same as `char **myArray`. At least that's how I've been manipulating it this whole time, both ways, and it's working, both ways. So passing `myArray`, as it is a `char **`, should function in the same fashion I imagine. – Chris Cirefice Oct 21 '13 at 05:02
  • @ChrisCirefice No there is heap memory and stack memory. Stack memory is manage by he compiler and heap memory is managed by the programmer with `free` and `malloc`/`calloc` – Raphael Ahrens Oct 21 '13 at 05:04
0

Proper way of allocating and deallocating char array.

char **foo = (char **) malloc(row* sizeof(char *));

*foo = malloc(row * col * sizeof(char));

for (int i = 1; i < row; i++) {
  foo[i] = *foo + i*col;
}
free(*foo);
free(foo);

Note that you don't need to go through each & every element of the array for deallocation of memory. Arrays are contiguous so call free on the name of the array.

Umer Farooq
  • 7,356
  • 7
  • 42
  • 67