1

I've just begun learning the C language and I ran into an issue with one of my programs.

I am getting an error: "Illegal instruction 4" when executing: ./dictionary large.txt

Large.txt is a file with 143091 alphabetically sorted words, with each word starting on a new line. I am trying to load all of them into a hash table and return true if all the words are loaded successfully.

This code works for me if the code in bool load() is within int main and load() is non-existent. However, once I place it inside the load() function and call it from main, I get an error.

I would appreciate help on this, as there are not many threads on Illegal instruction.

This is my code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

// Maximum length for a word
// (e.g., pneumonoultramicroscopicsilicovolcanoconiosis)
#define LENGTH 45

// Number of letters in the english alphabet
#define ALPHABET_LENGTH 26

// Default dictionary
#define DICTIONARY "large.txt"

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// Number of buckets in hash table
const unsigned int N = ALPHABET_LENGTH;

// Hash table
node *table[N];

// Load function
bool load(char *dictionary);

// Hash function
int hash(char *word);

int main(int argc, char *argv[])
{
    // Check for correct number of args
    if (argc != 2 && argc != 3)
    {
        printf("Usage: ./speller [DICTIONARY] text\n");
        exit(1);
    }

    // Determine which dictionary to use
    char *dictionary = (argc == 3) ? argv[1] : DICTIONARY;

    bool loaded = load(dictionary);

    // TODO: free hashtable from memory

    return 0;
}

bool load(char *dictionary)
{
    // Open dictionary for reading
    FILE *file = fopen(dictionary, "r");
    if (file == NULL)
    {
        printf("Error 2: could not open %s. Please call customer service.\n", dictionary);
        exit(2);
    }

    // Initialize array to NULL
    for (int i = 0; i < N; i++)
        table[i] = NULL;

    // Declare and initialize variables
    unsigned int char_count = 0;
    unsigned int word_count = 0;
    char char_buffer;
    char word_buffer[LENGTH + 1];
    int hash_code = 0;
    int previous_hash_code = 0;

    // Declare pointers
    struct node *first_item;
    struct node *current_item;
    struct node *new_item;

    // Is true the first time the while loop is ran to be able to distinguish between hash_code and previous_hash_code after one loop
    bool first_loop = true;

    // Count the number of words in dictionary
    while (fread(&char_buffer, sizeof(char), 1, file))
    {

        // Builds the word_buffer by scanning characters
        if (char_buffer != '\n')
        {
            word_buffer[char_count] = char_buffer;

            char_count++;
        }
        else
        {
            // Increases word count each time char_buffer == '\n'
            word_count += 1;

            // Calls the hash function and stores its value in hash_code
            hash_code = hash(&word_buffer[0]);

            // Creates and initializes first node in a given table index
            if (hash_code != previous_hash_code || first_loop == true)
            {
                first_item = table[hash_code] = (struct node *)malloc(sizeof(node));
                if (first_item == NULL)
                {
                    printf("Error 3: memory not allocated. Please call customer service.\n");
                    return false;
                }

                current_item = first_item;
                strcpy(current_item->word, word_buffer);
                current_item->next = NULL;
            }
            else
            {
                new_item = current_item->next = (struct node *)malloc(sizeof(node));
                if (new_item == NULL)
                {
                    printf("Error 4: memory not allocated. Please call customer service.\n");
                    return false;
                }

                current_item = new_item;
                strcpy(current_item->word, word_buffer);
                current_item->next = NULL;
            }

            // Fills word buffer elements with '\0'
            for (int i = 0; i < char_count; i++)
            {
                word_buffer[i] = '\0';
            }

            // Signals the first loop has finished.
            first_loop = false;

            // Clears character buffer to keep track of next word
            char_count = 0;

            // Keeps track if a new table index should be initialized
            previous_hash_code = hash_code;
        }
    }

    return true;
}

// Hash in order of: 'a' is 0 and 'z' is 25
int hash(char *word_buffer)
{
    int hash = word_buffer[0] - 97;

    return hash;
}

Thank you in advance!

Chris

KLM54W
  • 11
  • 2
  • Use a debugger. At a minimum it will instantly give you the exact line of code that triggers the crash. – kaylum Feb 21 '21 at 10:58
  • Thanks kaylum! Which debugger are you using? – KLM54W Feb 21 '21 at 11:01
  • Depends what OS I'm coding for at the time. `gdb` on Linux, Visual Studio on Windows and XCode on MacOS. – kaylum Feb 21 '21 at 11:02
  • 1
    `strcpy(current_item->word, word_buffer);` Doesn't look like `word_buffer` is correctly NUL terminated for at least the first word. Which means the `strcpy` can overflow one or both buffers. Should init the buffer to all zeroes: `char word_buffer[LENGTH + 1] = {0};` or ensure a NUL is added at the end of the word. – kaylum Feb 21 '21 at 11:06
  • Kaylum, I added char word_buffer[LENGTH + 1] = {0}; and it worked! Thank you! – KLM54W Feb 21 '21 at 11:14
  • 2
    Welcome to stackoverflow :). You've got the answer to your problem, but I think your question doesn't follow the guidelines of stackoverflow: you've included a large amount of code, and your question amounts to "someone please debug my code for me". A good question for the stackoverflow format includes a more specific problem where you've reduced the code down to a _minimal_ reproducible example. – Paul Hankin Feb 21 '21 at 11:35
  • Thank you for pointing this out Paul! My next question will be more specific :) – KLM54W Feb 21 '21 at 12:26

1 Answers1

0

You should use node *table[ALPHABET_LENGTH]; for the table declaration instead of node *table[N];

There is a difference between constant macros and const variables, a macro can be used in a constant expression, such as a global array bound as per your use case, whereas a const variable cannot.

As you can see here, the compiler you say you are using, gcc, with no compiler flags, issues an error message:

error: variably modified 'table' at file scope

You can read more about these differences and use cases in "static const" vs "#define" vs "enum" it has more subjects, like static and enum, but is a nice read to grasp the differences between these concepts.

anastaciu
  • 23,467
  • 7
  • 28
  • 53
  • I added char word_buffer[LENGTH + 1] = {0}; which fixed the error. Should I still change the size of node[N] to node[ALPHABET_LENGTH]? – KLM54W Feb 21 '21 at 11:19
  • @KLM54W did it really? Where did you add that? – anastaciu Feb 21 '21 at 11:22
  • Kaylum pointed this out in the comments. // Declare and initialize variables unsigned int char_count = 0; unsigned int word_count = 0; char char_buffer; char word_buffer[LENGTH + 1] = {0}; int hash_code = 0; int previous_hash_code = 0; – KLM54W Feb 21 '21 at 11:25
  • 2
    The question isn't about C++, but the declaration of `node *table[N];` is an error in C (size of static arrays must be integer constant expressions, which doesn't include `const int` variables. However many compilers (eg: gcc, clang) are tolerant of this and use the correct value of `N` to size the array. – Paul Hankin Feb 21 '21 at 11:31
  • Paul, this part of the code is actually from Harvard 's CS50x. I'm using the gcc compiler, as you've mentioned. Thanks! – KLM54W Feb 21 '21 at 11:32
  • 1
    Perhaps ask your teacher which C standard the course teaches, because for example gcc reports an error with all official C standards. For example, the latest one: `gcc -std=c17 xx.c` gives `xx.c:28:7: error: variably modified ‘table’ at file scope`. The default with gcc is to use a gcc-specific dialect of c17, and I don't see any reason why Harvard would teach that rather than standard C. – Paul Hankin Feb 21 '21 at 11:46