1

I am writing a program that reads a dictionary file (text file, one word per line), and inserts it into a structure consisting of a "linked list" of arrays, where the word is recursively forwarded to the [first letter - 'a'] entry of an array (which is another array, that processes the next letter). When the whole word is "consumed", it inserts the word (unchanged) in a regular linked list of words. The program successfully processes first 15 words, but at 16-th it throws segmentation fault.

It looks like the segmentation fault occurs in add() method in the following snippet:

struct LinkedList * new = (struct LinkedList *) calloc(1,
                                       sizeof(struct LinkedList));
            if (!new) {
                perror("Not enough memory!"); // Error here
                exit(2);
            }

(Hopefully) Relevant code:

void addList (struct LinkedList * list, char * word) {
    if (!list->next)
    {
        struct LinkedList * new = malloc(sizeof(struct LinkedList));
        if (!new) {
            perror("Not enough memory!");
            exit(2);
        }

        char * new_word = malloc(strlen(word) * sizeof(char));
        if (!new_word) {
            fprintf(stderr, "Not enough memory!");
            exit(2);
        }


        new->next = 0;
        strcpy(new_word, word);
        new->word = new_word;
        list->next = new;
    }
    else
    {
        addList(list->next, word);
    }
}


void add(struct HashTree * root, char * word, char * word_sorted, int length) {

    if (length == 0)                                     // Found the correct place
    {

        if (!root->words)                                // If words are not allocated
        {
            // Create list node
            struct LinkedList * new = calloc(1, sizeof(struct LinkedList));
            if (!new) {
                perror("Not enough memory!");
                exit(2);
            }

            char * new_word = malloc(strlen(word) * sizeof(char));
            if (!new_word) {
                fprintf(stderr, "Not enough memory!");
                exit(2);
            }

            new->next = 0;

            strcpy(new_word, word);

            new->word = new_word;
            root->words = new;
        }   

        else                                            // Add to the Linked List
        {
            addList(root->words, word);
        }
    }

    else 
    {
        // printf("Length_add = %d\n", length);
        if (!root->next)                                 // If the array was not allocated yet
        {

            struct HashTree * new = malloc(27 * sizeof(struct HashTree *));
            if (!new) {
                perror("Not enough memory!");
                exit(2);
            }


            root->next = new;
        }


        add(&(root->next[ word_sorted[0] - 'a' ]), 
            word, (char *) (word_sorted +1), (length-1));  // Use the next letter.

    }


}

To save the space, Here is the link to the full code.

Here is the output of gdb core and backtrace:

    Program terminated with signal SIGSEGV, Segmentation fault. 

100 perror("Not enough memory!"); 

Full GDB output

I implemented a similar algorithm in Java before, and the algorithm seems to be correct. I am pretty new to C, and don't understand what might be wrong. I would really appreciate any help!

EDIT: Deleted sort, clean and cleanWords methods (they does not greatly affect the adding of the word to the structure). Segmentation occurred while processing the second word, line 125:

perror("Dictionary file not found!");

Link to code -
Link to sample dictionary

Valgrind output

  • Off-topic: try to loose the Java-style. Don't check, just do the things that are needed. (and can be done) And: pointers-to-pointers help a lot. In the end. (and will reduce your code size to one third, at least) – wildplasser Feb 01 '16 at 00:03
  • 1
    TL;DR. Use a debugger, cut down to a [mcve]. Use `const` where approvriate. Also do not cast the result of `malloc` & friends in C! – too honest for this site Feb 01 '16 at 00:06
  • @wildplasser Should I omit checking for null in pointers? I don't know how to make it less Java-style. If you could give an example, I might understand it better. Thanks. – Pavlik Petrovich Feb 01 '16 at 00:10
  • @Olaf I tried to cut it down in size, but every time it changes the location of the segmentation fault. – Pavlik Petrovich Feb 01 '16 at 00:15
  • 1
    Why does that hinder stripping down the code? And what does that tell you? Oh, and while not a reserved word in C, `new` should not be use nevertheless (this is more a style recommendation). – too honest for this site Feb 01 '16 at 00:19
  • This isn't the cause of your problem, but in C it has not been good practice to cast the return of malloc, realloc or calloc for many years now. – David Hoelzer Feb 01 '16 at 00:21
  • 2
    @PavlikPetrovich - for best results, your question should include the input data you're using. – enhzflep Feb 01 '16 at 00:28
  • You have **two** declarations of `i` variable in a nested `for loops` – mr5 Feb 01 '16 at 00:43
  • @PavlikPetrovich I'm also wondering how that code compiles. You need to disambiguate those two first to simplify your problems. Also, what's wrong with `strlen`? Why don't you just use it? – mr5 Feb 01 '16 at 00:48
  • @mr5 Just checked, it does not complain (about i), it just does not matter. With strlen it's my mistake - I thought it will return the size of allocated array. And yes, the code compiles without errors. – Pavlik Petrovich Feb 01 '16 at 00:56
  • Time to get [`valgrind`](http://valgrind.org/) on the job if at all possible. If your problem moves as you remove code, you almost certainly are mismanaging dynamically allocated memory, and there's a decent chance `valgrind` will tell you where. It probably won't be this function that's causing the trouble. Crashing in `perror()` when you pass a literal string to it strongly hints at serious memory corruption. – Jonathan Leffler Feb 01 '16 at 01:05
  • @Jonathan Leffler I added a link to the output of Valgrind. The problem is, I don't have any experience with it, and Valgrind shows too many errors. It is hard to understand them are my errors. – Pavlik Petrovich Feb 01 '16 at 01:33
  • Post relevant code here. – chux - Reinstate Monica Feb 01 '16 at 01:34
  • The first two entries are definitely your problem (as opposed to anyone else's). You're using unitialized data; don't. You're copying beyond the end of your allocated space; don't do that either. Go through, fixing each one in turn. Start at the top and work your way down. _[…later…]_ all the problems look like mismanaged memory — uninitialized variables, not allocating enough memory (do you account for the trailing null byte), or accessing wild pointers. Use compiler warnings: `gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror`. – Jonathan Leffler Feb 01 '16 at 01:35
  • @Jonathan Leffler, Thank you. A question - am I doing something wrong checking if the pointer exists by using `if (!root->next)` ? It looks like Valgrind does not like this line. – Pavlik Petrovich Feb 01 '16 at 01:49
  • If you're sure `root` is a non-null pointer, then checking that `root->next` is non-null is often written that way, though I personally prefer `if (root->next != 0)` or `if (root->next != NULL)`. If you don't know that `root` is not null, it is dangerous to dereference it — `if (root != 0 && root->next != 0)` is safe. If root points somewhere indeterminate — you didn't initialize it, either to null or to some valid value, `valgrind` will justifiably complain. – Jonathan Leffler Feb 01 '16 at 01:51
  • I believe Java initializes variables by default. C most definitely does not. If you translate an algorithm from Java to C, you have to ensure that variables are properly initialized. – Jonathan Leffler Feb 01 '16 at 01:54

2 Answers2

2

In the main else clause of add(), you allocate some memory, but don't initialize it. You then use the uninitialized memory in the next recursive call to add(). At minimum, use calloc() in place of malloc() there. You're also allocating only enough space for 27 pointers to your structures, but you're using it as though you allocated 27 structures.

Wrong:

struct HashTree * new = malloc(27 * sizeof(struct HashTree *));

Right:

struct HashTree *new = calloc(27, sizeof(struct HashTree));

Or:

struct HashTree *new = calloc(27, sizeof(*new));

Also, in both addList() and add(), you are not allocating enough space for the string; you forgot the trailing null.

Wrong:

char *new_word = (char *) malloc(length * sizeof(char));

Right:

char *new_word = (char *) malloc(length + 1);

I don't use C++ keywords in my C code, so I'd use new_hash or thereabouts instead of just new. Many would castigate you for the casts on the allocations.

With those changes, the code ran to completion for me. It leaked like fury, but that was totally expected.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • Thanks @Jonathan Leffler. It worked for me as well. as to memory leakage, I will continue to work on it. – Pavlik Petrovich Feb 01 '16 at 03:21
  • AFAICR, there are no calls to `free()` in the code you posted, so the leak is not surprising. What's the 27th entry in the trie-like structure used for? The alphabet I use has just 26 letters, so I fear you are wasting some space there. – Jonathan Leffler Feb 01 '16 at 03:25
  • I have two separate methods for this, I think you went to the link with minimal code. [Link](https://drive.google.com/file/d/0B8dXBGoXZiNiQ2ZiYVlQUDBGWFk/view?usp=sharing) is the one with actual code. Also, the file contains '\n' (Unix-style) newlines as separators, and 27-th is for " ' " character. – Pavlik Petrovich Feb 01 '16 at 03:57
  • Yes, I only looked at the MCVE-ish code. You can deal with the bigger code base to suit yourself. I don't see any special handling for `'` in the MCVE-style code, which could be a problem (probably one I circumvented by largely accidentally switching to the local dictionary, and that contains no apostrophes). – Jonathan Leffler Feb 01 '16 at 04:06
  • I deal with it by replacing `'` with `'z' + 1` so it appears to be `{` (in sort function). Probably not the best way to do it, but it appears to be Ok for my purpose. – Pavlik Petrovich Feb 01 '16 at 04:09
0

for errors like this the problem is almost always an issue with fopen. If I am right you may have forgotten to add the extension to a file or perhaps it is an issue with the file-paths you are using.

My advice:

use ctrl+f to locate all instances of fopen then verify that they reference the correct file.

EDIT pay special attention to line 184. I suspect that "/usr/share/dict/words" should be "/usr/share/dict/words.txt"

EDIT 2:

instead of (line 200):

while(fgets(word, sizeof(word), dict) != NULL) {

try fscanf:

while(!feof(dict)) {
    fscanf(dict, "%*[^']", MAX_WORD_LENGTH, word);

EDIT

Thank you @JonathanLeffler I would still recommend fscanf

int fscanf( FILE *restrict stream, const char *restrict format, ... );

If you use it right, fscanf can be very useful when reading formatted data.

fscanf(dict, "%*s", MAX_WORD_LENGTH, word);
fscanf(dict, "'", MAX_WORD_LENGTH, word);//get rid of the delimiter
chrisgotter
  • 383
  • 1
  • 3
  • 13
  • Thanks Chris. Although, the file path is correct - the program successfully reads (and processes) a few lines of this file. I am suspecting the problem is with pointers or allocation - but I have no idea where. – Pavlik Petrovich Feb 01 '16 at 02:42
  • 1
    See [`while (!feof(file))` is always wrong](http://stackoverflow.com/questions/5431941/while-feof-file-is-always-wrong) for why you should not condition your loop as shown in this answer. – Jonathan Leffler Feb 01 '16 at 03:23
  • @JonathanLeffler interesting, I have never seen that before – chrisgotter Feb 01 '16 at 03:29
  • You'd get away with it if you also (redundantly) tested the return value from `fscanf()` and broke the loop when it detected EOF. As written, you process the last line twice, which is often not a good idea. Plus you need to read the newlines somehow with `fscanf()` — there are a bag'o'worms associated with that, too. – Jonathan Leffler Feb 01 '16 at 03:31
  • @JonathanLeffler look at the dict file she put up. It is a single line delimited by `'` – chrisgotter Feb 01 '16 at 03:36
  • @chrisgotter: I didn't run into any problems with it when I downloaded it (either originally or when I checked again just now, since I'd cleaned up). I saved it as `dict` and from `wc dict`, the output was `32 33 201 dict`. I used that mostly when testing; I futzed the code to read stdin instead of `/usr/share/words/dict`. When it was working, I used the original code verbatim except for the changes above, and it worked with the 235k+ words, allocating about 250 MiB in total, in just over 1M allocations. – Jonathan Leffler Feb 01 '16 at 03:42
  • ohh, it has unix line breaks. That makes way more sense. That'll teach me to use Notepad. :p – chrisgotter Feb 01 '16 at 03:45