0

So this question is a bit different, the code works I just don't know why or how. I'm taking the CS50 course and I'm solving this practice problem aiming at training us at searching tries databases. The aim is to create a program that would take dog names from a txt file and sort it in a trie struct and give the user the ability to input a name and would give a result of found or not found for the given name in the database.

I only wrote the check function and it's working as it should but I don't understand how and feel there's something wrong. It starts off the right way and my trav pointer is set to the root node and travels through the given word letter by letter and changes value to the next letter, when it reaches the end the next letter will obviously be the \0 at the end of the string given by user and this makes it that trav->children[letter] is trav->children[-97] which is something outside the scope of the array of the trav struct. Yet, somehow it works and will get the right value. I feel like it should give me a seg fault for going beyond array's bounds.

I don't know what to try at this point because the code is working, I just don't understand why

please note I'm not asking about undefined behaviour My function has to find a (is_word) value to return the correct value every time, how does it find it if I'm going beyond the array limits i.e: undefined behaviour ?

here is the full code

// Saves popular dog names in a trie
// https://www.dailypaws.com/dogs-puppies/dog-names/common-dog-names

#include <cs50.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE_OF_ALPHABET 26
#define MAXCHAR 20

typedef struct node
{
    bool is_word;
    struct node *children[SIZE_OF_ALPHABET];
}
node;

// Function prototypes
bool check(char *word);
bool unload(void);
void unloader(node *current);

// Root of trie
node *root;

// Buffer to read dog names into
char name[MAXCHAR];

int main(int argc, char *argv[])
{
    // Check for command line args
    if (argc != 2)
    {
        printf("Usage: ./trie infile\n");
        return 1;
    }

    // File with names
    FILE *infile = fopen(argv[1], "r");
    if (!infile)
    {
        printf("Error opening file!\n");
        return 1;
    }

    // Allocate root of trie
    root = malloc(sizeof(node));

    if (root == NULL)
    {
        return 1;
    }

    root->is_word = false;
    for (int i = 0; i < SIZE_OF_ALPHABET; i++)
    {
        root->children[i] = NULL;
    }

    // Add words to the trie
    while (fscanf(infile, "%s", name) == 1)
    {
        node *cursor = root;

        for (int i = 0, n = strlen(name); i < n; i++)
        {
            int index = tolower(name[i]) - 'a';
            if (cursor->children[index] == NULL)
            {

                // Make node
                node *new = malloc(sizeof(node));
                new->is_word = false;
                for (int j = 0; j < SIZE_OF_ALPHABET; j++)
                {
                    new->children[j] = NULL;
                }
                cursor->children[index] = new;
            }

            // Go to node which we may have just been made
            cursor = cursor->children[index];
        }

        // if we are at the end of the word, mark it as being a word
        cursor->is_word = true;
    }

    if (check(get_string("Check word: ")))
    {
        printf("Found!\n");
    }
    else
    {
        printf("Not Found.\n");
    }

    if (!unload())
    {
        printf("Problem freeing memory!\n");
        return 1;
    }

    fclose(infile);
}

// TODO: Complete the check function, return true if found, false if not found
bool check(char* word)
{
    // setting up a trav pointer to iterate through the trie
    node *trav = root;
    // iterating through the word given and descending through the pointers given by its letters
    for (int i = 0, n = strlen(word); i <= n; i++)
    {
        int letter = tolower(word[i]) - 'a';
        // if the next pointer is not null this means there are still letters in the word 
        if (trav->children[letter] != NULL)
        {
            trav = trav->children[letter];
        }
        else
        {
            if (trav->is_word)
            {
                return true;
            }
        }
    }
    return false;
}

// Unload trie from memory
bool unload(void)
{

    // The recursive function handles all of the freeing
    unloader(root);

    return true;
}

void unloader(node* current)
{

    // Iterate over all the children to see if they point to anything and go
    // there if they do point
    for (int i = 0; i < SIZE_OF_ALPHABET; i++)
    {
        if (current->children[i] != NULL)
        {
            unloader(current->children[i]);
        }
    }

    // After we check all the children point to null we can get rid of the node
    // and return to the previous iteration of this function.
    free(current);
}
user75470
  • 45
  • 5
  • 1
    Does this answer your question? [Does "Undefined Behavior" really permit \*anything\* to happen?](https://stackoverflow.com/questions/32132574/does-undefined-behavior-really-permit-anything-to-happen) – Raymond Chen Apr 23 '23 at 15:15
  • @raymond chen, not really, I'm asking about this case specific for a reason, the function I wrote has to search for a value ( is_word) and find it in order to return a " found " result. This value is only going to be stored in a node of the trie database. Any node has two components a (is_word) value and an array of 26 characters. Now, when I actually go beyond the 26 characters nodes how does the program find the needed (is_word) value there ?? – user75470 Apr 23 '23 at 19:18
  • In this case, the program finds some random memory which resembles a node enough that accessing `is_word` doesn't crash. Once you wander into undefined behavior, anything is possible. If you want an error for using an invalid index, then use `std::array` with the `at()` method. – Raymond Chen Apr 23 '23 at 19:40
  • Sorry I'm a beginner and didn't quite catch what you said about std: :array and at(). My confusion comes from the fact that the program does work as intended every time which means there's probably something that I don't understand about how the program actually runs. I did try using printf and debug50 at first and managed to get it to work but these two tools are unable to tell me why the is_word value comes back as true when the trav->children[-97] is clearly not a valid node and also this happens only with words that are in the database and when you enter a wrong word you get not found !! – user75470 Apr 23 '23 at 19:51
  • @RaymondChen, btw I'm grateful that you are taking the time to discuss and explain to me what seems like a weird and not very obvious question – user75470 Apr 23 '23 at 19:53
  • "if the next pointer is not null this means there are still letters in the word" is incorrect, `is_word` is true than there is a word in the index, even if it's a prefix of another word, ("a", "ab"). – Neil Apr 24 '23 at 00:24

1 Answers1

0

If trav->children[letter] is not a valid pointer, then it moves to the next letter and checks if it is a word without updating trav. This is safe for isalpha characters, but not correct. (eg fido: fidoo yields true.) A trie should be always be moving trav down the tree to stay in line with the position of the letter of word.

This the last \0 (and anything that is not isalpha) produces undefined behaviour. A possible explanation for the not-crashing is that, in practice, almost all null pointers are all-bits-zero, thus an uncompressed trie has a lot of memory that is zero. The chance of one hitting a non-zero in a region of memory that contains a trie is small, (but unpredictable.)

To fix this, you should transform the letter in check in the same way that you transformed it on insertion (edit: And i < n.) That is, check if it's a letter first (isalpha), tolower case, subtract 'a'. Then check if it is a node. If it is not, the word is not in the set and one can return (short-circuit) false. Only at the end do you check is_word.

Neil
  • 1,767
  • 2
  • 16
  • 22
  • when my function was i < strlen, it didn't work unless i added a space after the name and that's why you can see that i'm using i <= strlen so that it can detect correct names without the space. If it taps into a random zero shouldn't that return a false. false = 0 and true = 1 ?? I still don't get the explanation of why it gives right answer with right words and not found with wring words. it's definitely not a random answer every time – user75470 Apr 23 '23 at 22:23
  • Good catch. I will update. – Neil Apr 23 '23 at 22:36
  • that's a good catch for a malfunction. I tried your idea of the out of loop check, still saying found for any names that has a prefix found in the database so fidoo would still work which is baffling to me – user75470 Apr 23 '23 at 22:49
  • I succeeded using your code with few modifications; this is a bug `i <= n`, checked `isalpha`, and the logic of it is `return false` when `trav->children[letter] == NULL`, you should never `return true`; it is progressively shrinking the allowable set from the trie until you `return trav->is_word`. You are almost there. – Neil Apr 24 '23 at 01:15