CS50 speller.c segmentation fault with hash table in load function

Question

I'm taking CS50 and am currently on the problem set 'speller.c'. Sorry if this is a simple fix, but I'm just barely understanding C.

I've written all of the functions, but when I try to run the program I get a segmentation fault. debug50 tells me that the segmentation fault has something to do with the hash table, i.e. the array of a custom data type called node. It happens when I try to run the if statement "if (table[hashnum]->next == NULL)" in the 'load' function.

I've looked online for why the seg fault could be happening. From what I understand, it happens when I access freed pointers, when I don't have enough space, when I try to access memory I'm not allowed to access, or when I try to write to a 'read only' part of memory. Also, from what I understand about initializing global arrays, the one declared at the beginning of the program should allow me to read and write in it, so I'm not sure what I'm doing wrong.

Any help and explanations are appreciated, thanks :)

Below is my code.

Also, the guidelines and goal of the problem set are found at https://cs50.harvard.edu/x/2023/psets/5/speller/.

// Implements a dictionary's functionality

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

#include "dictionary.h"

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// TODO: Choose number of buckets in hash table
const unsigned int N = 26;

// Hash table
node *table[N];

// Global Variables
char tmp[LENGTH + 1];
int wordcount = 0;
bool hashnull = false;

// New function prototype
void freehash(node* node);

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    // Runs word through hash function
    int hashnum = hash(word);
    node * tmpnode = table[hashnum]->next;
    // Compares all words in linked list given by hashnum to 'word' to see if any match/if 'word' is spelled correctly
    while (strcasecmp(word, tmpnode->word) != 0)
    {
        if (tmpnode->next == NULL)
        {
            return false;
        }
        tmpnode = tmpnode->next;
    }

    return true;
}

// Hashes word to a number
unsigned int hash(const char *word)
{
    // TODO: Improve this hash function
    return toupper(word[0]) - 'A';
}

// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
    // Opens dictionary
    FILE *dict  = fopen(dictionary, "r");
    // Checks if dictionary opened successfully
    if (dict == NULL)
    {
        fclose(dict);
        printf("\n\n Could not load dictionary.\n");
        return false;
    }
    // Load dictionary into hash table
    node *tmpnode;
    int hashnum;
    char check;
    while(fread(&check, sizeof(char), 1, dict))
    {
        fseek(dict, -1, SEEK_CUR);
        //// Take a single word from dictionary and put into dictword (new node)
        // Make new node/element to add to hash table
        node *dictword = malloc(sizeof(node));
        if (dictword == NULL)
        {
            fclose(dict);
            free(dictword);
            return false;
        }
        tmpnode = dictword;

        // Iterates through all of the letters in a word from dictionary and ends when next line (/n) is read
        int tmpctr = 0;
        while(fread(&dictword->word[tmpctr], sizeof(char), 1, dict))
        {
            if(dictword->word[tmpctr] == '\n')
            {
                break;
            }
            else
            {
                tmpctr++;
            }
        }
        // Run hash function to see where in hash table new word will go
        hashnum = hash(dictword->word);
        //// Put new node for current word into hash table
        if (table[hashnum]->next == NULL)
        {
            table[hashnum]->next = tmpnode;
            dictword->next = NULL;
        }
        else
        {
            tmpnode = table[hashnum]->next;
            dictword->next = tmpnode;
            table[hashnum]->next = dictword;

        }

        // Increase wordcount
        wordcount++;
    }
    // Close dictionary stream (after success)
    fclose(dict);
    // ***** NO NEED TO MALLOC HERE; THATS WHAT THE UNLOAD FUNCTION IS FOR
    return true;
}

// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
    if (wordcount > 0)
    {
        return wordcount;
    }
    else
    {
        return 0;
    }
}

// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
    node * tmpnode;
    for(int i = 0; i < N; i++)
    {
        tmpnode = table[i]->next;
        freehash(tmpnode);
    }
    return true;
}

void freehash(node* node)
{
    // If node currently being 'freed' isn't the last node, call freehash() with next node in line
    if (node->next != NULL)
    {
        freehash(node->next);
    }

    free(node);

    return;
}

If you want to read a line from a file, use `fgets()` instead of your own loop. See https://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input for how to remove the newline. — Barmar, Feb 13 '23 at 17:34
You're not adding a null terminator to `dictword->word`, so you get undefined behavior, and this is likely causing the error. If you used `fgets()` this would be done automatically. — Barmar, Feb 13 '23 at 17:34
Please edit your question to supply a minimal reproducible example. Avoid global variables whenever you can. — Allan Wind, Feb 13 '23 at 17:48
`fclose(NULL)` will segfault so don't call fclose() in that case. — Allan Wind, Feb 13 '23 at 17:52

Allan Wind · Answer 1 · 2023-02-13T19:41:09.057

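load(): If fopen() fails, you call fclose(NULL), which is undefined behavior and typically segfaults. Return without calling fclose() in that case.
load(): Because node *table[N] is a global variable, it is zero-initialized, so every table[i] starts out as NULL. In load() you evaluate table[hashnum]->next, which dereferences that NULL pointer and segfaults. Maybe you want: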

        if (!table[hashnum]) {
            table[hashnum] = tmpnode;
        } else if (table[hashnum]->next) {

As an aside, minimize the scope of tmpnode to just where it's needed (which is the case where ->next is set). This makes your code easier to read.
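One way to read the point about the zero-initialized table is to treat each table[i] as the head pointer of its bucket, so a new node is always pushed onto the front and the empty and non-empty cases collapse into one. This is only a minimal sketch under stated assumptions: LENGTH is assumed to be 45 as in CS50's dictionary.h, N mirrors the question, and insert_word is a hypothetical helper that is not part of the distribution code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45 // assumed maximum word length (from CS50's dictionary.h)
#define N 26      // number of buckets, as in the question

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// File-scope array: every bucket starts out as NULL (zero-initialized)
static node *table[N];

// Hypothetical helper: push a copy of word onto the front of bucket `index`.
// Returns false if the word is too long or malloc fails.
static bool insert_word(const char *word, unsigned int index)
{
    assert(word != NULL && index < N);

    if (strlen(word) > LENGTH)
    {
        return false;
    }

    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }

    strcpy(n->word, word);
    n->next = table[index]; // NULL when the bucket is still empty
    table[index] = n;       // the new node becomes the head
    return true;
}
```

With this layout, table[index] itself is never dereferenced before it has been checked or assigned, so traversal code would start at table[index] rather than table[index]->next.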

load(): You currently include the '\n' from the file in the word, but you probably shouldn't. You want to NUL-terminate your string instead:

        size_t i = 0;
        for(; i < LENGTH && fread(&dictword->word[i], 1, 1, dict) && dictword->word[i] != '\n'; i++);
        dictword->word[i] = '\0';

load(): The file reading logic is kind of odd: you read one byte, back up the file pointer, then read letter by letter without checking whether the word is too long. As @Barmar suggests, just get a line with fgets(), then use strcspn() to replace the \n with a \0.
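The fgets() plus strcspn() suggestion could look roughly like the sketch below. read_word is a hypothetical helper, and LENGTH is again assumed to be 45; the caller's buffer should hold LENGTH + 2 bytes so that a full-length word, its '\n', and the terminating '\0' all fit. A line longer than the buffer would be returned in pieces, which this sketch does not handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LENGTH 45 // assumed maximum word length (from CS50's dictionary.h)

// Hypothetical helper: read the next line from dict into word and strip
// the trailing newline. Returns false at end of file or on a read error.
static bool read_word(FILE *dict, char *word, size_t size)
{
    assert(dict != NULL && word != NULL && size > 0);

    if (fgets(word, (int) size, dict) == NULL)
    {
        return false;
    }

    // strcspn() returns the index of the first '\n', or the string length
    // if there is none, so this assignment is always in bounds
    word[strcspn(word, "\n")] = '\0';
    return true;
}
```

A load() loop built on this would call read_word() until it returns false, then hash and insert each word, with no need for fseek().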

score 0 · Answer 2 · answered Feb 13 '23 at 18:59


Code also has subtle bugs beyond CS50 expectations.

OP hashed with return toupper(word[0]) - 'A';

This should be done as return toupper(((unsigned char *)word)[0]) - 'A'; because char may be signed, so word[0] may be negative, and toupper(int) is only defined for values in the unsigned char range and EOF. Characters should be examined as if they are unsigned char, even when char is signed.
strcasecmp(), although not a standard C function, more often converts to lowercase (rather than using toupper()) and then compares. This matters when the case mapping is not 1-to-1: e.g. toupper() may map both ÿ and y to Y, while tolower() maps Y to y. A toupper()-based hash then treats ÿ and y as the same letter, yet strcasecmp("ÿ", "y") does not return 0. Best to use the same case conversion in both places.


chux - Reinstate Monica


I appreciate the little tidbit about how `strcasecmp()` is implemented. If you are using utf8 then maybe looking at the first byte is not enough anyway? – Allan Wind Feb 13 '23 at 19:43

@AllanWind Once we mix utf8, C and case, we are venturing into new ground. Hopefully it is implemented right. – chux - Reinstate Monica Feb 13 '23 at 20:03
