2

I'm making a program that reads a given dictionary into a trie tree, and then performs auto complete on a string inputted by the user. When I use the dictionary file that I am required to use (~100,000 words) I get a segmentation fault. I can't seem to figure out what is causing the segmentation fault. Any help would be appreciated.

typedef struct trieTree {
    int data;
    struct trieTree *array[26];
}trieTree;

insert function:

    trieTree* insert_tree(trieTree *t, char *s, int val)
{
    int i;
    trieTree *p;
    if (strlen(s) == 0)
    return t;
    if (t == NULL)
    t = new_tree(t);
    p = t;
    for (i = 0; i < strlen(s); ++i) {
        if (p->array[s[i] - 'a'] == NULL) 
        p->array[s[i] - 'a'] = malloc(sizeof (trieTree));
        p = p->array[s[i] - 'a'];
    }
    p->data = val;
    return t;
}

Filling the tree:

trieTree* load_tree(trieTree *t, char *file)
{
    char s[MAX];
    FILE *f = fopen(file, "r");
    if (f == NULL)
    printf("Error! File not found.");
    else 
    while (feof(f) == 0) {
        fscanf(f, "%s", s);
        t = insert_tree(t, s, 1);
    }
    return t;
}

Main function

int main()
{
    trieTree t;
    new_tree(&t);
    load_tree(&t, "dict.txt");
    char word[100];
    printf("Enter word: ");
    scanf("%s", word);
    char dat[100] = "";
    search_tree(&t, word, dat);
    return 0;
}


trieTree* new_tree(trieTree *t)
{
    int i;
    t = malloc(sizeof (trieTree));
    for (i = 0; i < 24; ++i)
    t->array[i] = 0;
    return t;
}
user2871898
  • 93
  • 2
  • 10
  • one of the reason may be that you have allocated too much size. try with less size. – roottraveller Nov 30 '15 at 03:31
  • Does the segfault occur before or after you enter a word? What if you try it with a truncated dictionary of only 50,000 entries? – Beta Nov 30 '15 at 03:33
  • can you show the code for the `new_tree()` function? – Haris Nov 30 '15 at 03:34
  • Suggest you fire up your favourite debugger and use that to help you find the problem. – kaylum Nov 30 '15 at 03:34
  • Shouldn't you be calling `new_tree` again instead of `malloc` in the for loop in `insert_tree`? Otherwise the struct isn't initialized properly. – Kevin Nov 30 '15 at 03:46
  • I just added the new_tree function. – user2871898 Nov 30 '15 at 04:00
  • 1
    Do none of your words use 'y' or 'z'? If not, why are you only initializing 24 of the 26 pointers in the array in the trie structure? What happens if you need to use one of the uninitialized pointers? How much data does it take before you crash? Your full dictionary has 100k words in it, but does it take all 100k before the crash happens, or just 1k or what? Is there any danger that there's an upper-case letter in the dictionary? Your code assumes that what's read will contain only lower-case letters. – Jonathan Leffler Nov 30 '15 at 04:07
  • 1
    Also, you need to read [Why is `while (!feof(file))` always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) — it applies to your code, though your program may be crashing before it hits that bug. After successfully opening a file, you should close it, too. – Jonathan Leffler Nov 30 '15 at 04:12

1 Answers1

2

Your function new_tree() returns a pointer to allocated memory but the returned value is ignored. That's a memory leak, and your code continues to use an uninitialized variable. That's a problem!

int main()
{
    trieTree t;
    new_tree(&t);
    load_tree(&t, "dict.txt");
    …

trieTree* new_tree(trieTree *t)
{
    int i;
    t = malloc(sizeof(trieTree));
    for (i = 0; i < 24; ++i)
        t->array[i] = 0;
    return t;
}

The 24 in the function should be 26, of course. But the function allocates memory and assigns it to the local pointer (original set to point to t in main(), but the malloc() zaps that value). That pointer is returned, but the return is ignored. The variable t in main() is still uninitialized, but it is passed to the load_tree() function.

Frankly, you need:

int main()
{
    trieTree *tp = new_tree();
    load_tree(&t, "dict.txt");
    …

trieTree* new_tree(void)
{
    int i;
    trieTree *t = malloc(sizeof(trieTree));
    if (t == 0)
    {
        fprintf(stderr, "memory allocation failure\n");
        exit(EXIT_FAILURE);
    }
    for (i = 0; i < 26; ++i)
        t->array[i] = 0;
    return t;
}

Note that errors should be reported on the standard error channel; that is what it's for. And that every memory allocation should be checked, because if you don't check, it will fail and your program will crash.

There are probably a lot of other problems; I've not investigated them all. This should get you further before crashing.

This seems to work for me, though admittedly I only tested it on a 'dictionary' of 257 words.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAX = 1024 };

typedef struct trieTree
{
    int data;
    struct trieTree *array[26];
} trieTree;

static trieTree *new_tree(void)
{
    int i;
    trieTree *t = malloc(sizeof(trieTree));
    if (t == 0)
    {
        fprintf(stderr, "malloc for %zu bytes failed\n", sizeof(trieTree));
        exit(EXIT_FAILURE);
    }
    t->data = 0;
    for (i = 0; i < 26; ++i)
        t->array[i] = 0;
    return t;
}

static trieTree *insert_tree(trieTree *t, char *s, int val)
{
    int i;
    trieTree *p;
    if (strlen(s) == 0)
        return t;
    if (t == NULL)
        t = new_tree();
    p = t;
    int len = strlen(s);
    for (i = 0; i < len; ++i)
    {
        if (p->array[s[i] - 'a'] == NULL)
            p->array[s[i] - 'a'] = new_tree();
        p = p->array[s[i] - 'a'];
    }
    p->data = val;
    return t;
}

static trieTree *load_tree(trieTree *t, char *file)
{
    char s[MAX];
    FILE *f = fopen(file, "r");
    if (f == NULL)
    {
        fprintf(stderr, "Error! File not found.");
        exit(EXIT_FAILURE);
    }
    else
    {
        while (fscanf(f, "%s", s) == 1)
            t = insert_tree(t, s, 1);
        fclose(f);
    }
    return t;
}

static void print_trie(trieTree *t, char *pad)
{
    int len = strlen(pad);
    char space[len + 3];
    memset(space, ' ', len + 2);
    space[len + 2] = '\0';

    for (int i = 0; i < 26; i++)
    {
        if (t->array[i] != 0)
        {
            printf("%s%c\n", pad, i + 'a');
            print_trie(t->array[i], space);
        }
    }
}

static void free_trie(trieTree *t)
{
    if (t != 0)
    {
        for (int i = 0; i < 26; i++)
            free_trie(t->array[i]);
        free(t);
    }
}

int main(void)
{
    trieTree *tp = new_tree();
    if (tp != 0)
    {
        tp = load_tree(tp, "dict.txt");
        print_trie(tp, "");
        free_trie(tp);
    }
    return 0;
}

I believe it is leak free, too.

Note that this code will crash and burn if any of the input words contains any upper-case letters, or digits, or punctuation. It only handles lower-case and white space; anything else is an unchecked disaster waiting to devastate your program. That's because I've not done any substantive work in the insert_tree() function. You need to worry about 'invalid' characters in that function, probably by case-converting upper-case letters to lower-case and ignoring anything that's not a letter.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • This seems to help a bit. The program runs a bit longer, but I still end up with the segmentation fault. I used gdb (my first time using it) and it mentions my insert_tree function (0x00000000004018a9 in insert_tree () ) so I'm assuming the problem is there. – user2871898 Nov 30 '15 at 04:56
  • Although your code makes sense, I still seem to be getting a segmentation fault with it. I'm confused as to why I'm getting this error still when it works fine for you. I tested it with a dictionary file with 30 words, and the 100,000 word one. – user2871898 Nov 30 '15 at 05:37
  • Did you use my code verbatim? Or did you try to modify your code to match? I suggest first making a verbatim copy of mine. Does it work on the 30 word file? Does your best effort work on the 30 word file? If both work, you're in with a chance. If mine fails, you'll need to send me the 30 word file (see my profile), but only if you're sure it only contains lower-case alphabetic characters and white space — anything else will cause a crash. If yours fails, you'll need to work out what you're doing differently and why that is causing a problem. Make sure you're compiling with every warning. – Jonathan Leffler Nov 30 '15 at 05:54
  • If you use GCC, you should be able to compile like I did/do with `gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror` — I rarely both to run code that isn't compiling cleanly under those options. – Jonathan Leffler Nov 30 '15 at 05:58
  • Oh shoot, my apologies, I just noticed that I added a word with an upper-case character in my dictionary file right before I modified my code with the last part you posted. How would I go about making it compatible with upper-case characters or even an apostrophe? Thank you for all your help. – user2871898 Nov 30 '15 at 06:05
  • The simplest fix is probably to modify the insert code so it case-converts the input characters in the string before processing each one, and so it skips any non-alpha characters altogether. Your search code will need to do the same, of course. Alternatively, you can use 52 elements in your trie array to handle lower and upper cases separately — 53 if you want to track apostrophes too. Or you could add a 27th element to the array if you want to track apostrophes but only deal with letters case-insensitively. Etc. – Jonathan Leffler Nov 30 '15 at 06:10
  • So you're saying I'd do " struct trieTree *array[53];" to deal with both cases as well as apostrophes? Would I have to modify anything else if I do it this way? – user2871898 Nov 30 '15 at 06:29
  • You'd have to map 'A'..'Z' to 26..51 and apostrophe to 52, wouldn't you? And make sure you initialized all the pointers in the array. Otherwise, I don't think you'd need to do much, no. – Jonathan Leffler Nov 30 '15 at 06:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/96591/discussion-between-user2871898-and-jonathan-leffler). – user2871898 Nov 30 '15 at 19:02