3

Goal:

  1. Make global array of string (dictionary) given that size can be computed only in function load().
  2. Print dictionary on the screen with function print().

My approach:

Create global pointer to string, create array of strings in load() and assign local array to global pointer.

Problem:

If I try to print global array (and local as well) inside load(), everything's fine, but in case of printing with print(), segfault occurs somewhere in the end of array. GDB and valgrind outputs seem cryptic to me. I give up. What's wrong?

Source and dictionary are here.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// length of the longest word in dictionary
#define LENGTH 45
// dictionary file
#define DICTIONARY "large"

// prototypes
void load(const char* dictionary);
void print(void);

// global dictionary size
int dict_size = 0;
// global dictionary
char **global_dict;

int main(void)
{
    load(DICTIONARY);
    print();
    return 0;  
}

/**
 * Loads dictionary into memory.
 */
void load(const char* dictionary)
{
    // open dictionary file
    FILE *dict_file = fopen(dictionary, "r");    
    
    // compute size of dictionary
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // look for '\n' (one '\n' means one word)
        if (c == '\n')
        {
            dict_size++;
        }
    }
    
    // return to beginning of file
    fseek(dict_file, 0, SEEK_SET);
    
    // local array
    char *dict[dict_size];
    
    // variables for reading
    int word_length = 0;
    int dict_index = 0;
    char word[LENGTH + 1];   
    
    // iteration over characters
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // allow only letters
        if (c != '\n')
        {
            // append character to word
            word[word_length] = c;
            word_length++;
        }
        // if c = \n and some letters're already in the word
        else if (word_length > 0)
        {
            // terminate current word
            word[word_length] = '\0';
            
            //write word to local dictionary
            dict[dict_index] = malloc(word_length + 1);
            strcpy(dict[dict_index], word);
            dict_index++;
            
            // prepare for next word
            word_length = 0;
        }
    }
    
    // make local dictioinary global
    global_dict = dict;
}

/**
 * Prints dictionary.
 */
void print(void)
{
    for (int i = 0; i < dict_size; i++)
        printf("%s %p\n", global_dict[i], global_dict[i]);
}
Community
  • 1
  • 1
aikrikunov95
  • 307
  • 4
  • 13

3 Answers3

6

Answer is simple, you are assigning the pointer to a variable that is local to load() and it's deallocated when load() returns, hence it's invalid in print(), that leads to undefined behavior.

You even commented it

// local array <-- this is your comment not mine
char *dict[dict_size];

You have two options:

  1. Don't use a global variable, the pattern you used with a global variable has no benefits whatsoever, instead it's very dangerous. You could return the dynamically allocated poitner from the function load() and then pass it to print().

  2. Allocate the array of pointers using malloc().

    global_dict = malloc(dict_size * sizeof(*global_dict));
    

Why I don't like global variables?

  • Because you can do what you did in your program without even recieving a warning from the compiler.

But of course, you will not do this kind of thing when you gain experience, so it's more your fault than it is of global variables, but It's common to see programmers that are still learning, use global variables to solve the problem of sharing data among functions, which is what parameters are for.

So using global variables + not knowing how to handle them properly is bad, instead learn about function parameters and you will solve every problem that required global variables to pass data through different functions in your program without using global variables.

This is your own code, I removed the global_dict variable, and used dynamic memory allocation inside load(), also I performed some error checking on malloc(), you should improve that part if you want the code to be robust, the rest is self explanatory


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

// length of the longest word in dictionary
#define LENGTH 45
// dictionary file
#define DICTIONARY "large"

// prototypes
char **load(const char *dictionary);
void print(char **);

int main(void)
{
    char **dictionary;

    dictionary = load(DICTIONARY);
    if (dictionary == NULL)
        return -1;
    print(dictionary);

    /* Don't forget to free resources, 
       you might need to do it while
       the program is still running,
       leaked resources might quickly 
       become a problem.
     */
    for (int i = 0 ; dictionary[i] != NULL ; ++i)
        free(dictionary[i]);
    free(dictionary);

    return 0;  
}
/**
 * Loads dictionary into memory.
 */
char **load(const char *dictionary)
{
    // open dictionary file
    FILE  *dict_file;    
    size_t dict_size;
    char **dict;
    char   word[LENGTH + 1];
    size_t word_length;
    size_t dict_index;
    dict_file = fopen(dictionary, "r");
    if (dict_file == NULL) /* you should be able to notify this */
        return NULL; /* failure to open file */
    // compute size of dictionary
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // look for '\n' (one '\n' means one word)
        if (c == '\n')
        {
            dict_size++;
        }
    }
    // return to beginning of file
    fseek(dict_file, 0, SEEK_SET);

    // local array
    dict = malloc((1 + dict_size) * sizeof(*dict));
    /*                    ^ add a sentinel to avoid storing the number of words */
    if (dict == NULL)
        return NULL;

    // variables for reading
    word_length = 0;
    dict_index = 0;
    // iteration over characters
    for (int c = fgetc(dict_file); c != EOF; c = fgetc(dict_file))
    {
        // allow only letters
        if (c != '\n')
        {
            // append character to word
            word[word_length] = c;
            word_length++;
        }
        // if c = \n and some letters're already in the word
        else if (word_length > 0)
        {
            // terminate current word
            word[word_length] = '\0';

            //write word to local dictionary
            dict[dict_index] = malloc(word_length + 1);
            if (dict[dict_index] != NULL)
            {
                strcpy(dict[dict_index], word);
                dict_index++;
            }
            // prepare for next word
            word_length = 0;
        }
    }
    dict[dict_index] = NULL;
    /* We put a sentinel here so that we can find the last word ... */
    return dict;
}

/**
 * Prints dictionary.
 */
void print(char **dict)
{
    for (int i = 0 ; dict[i] != NULL ; i++)
        printf("%s %p\n", dict[i], (void *) dict[i]);
}
Iharob Al Asimi
  • 52,653
  • 6
  • 59
  • 97
  • Disagree, you can do the same thing with or without global variables. In fact, most of the time when you see this problem, it's from returning a pointer to a local variable. – user253751 Jun 29 '15 at 11:00
  • Yes you can, but when I thought of compiler warnings then I realized that perhaps the compiler wont issue a warning in this case, whereas it certainly does trigger a warning when you return the address of a local variable, I never had the need for a global variable so I think they are useful only in exceptional situations. – Iharob Al Asimi Jun 29 '15 at 11:02
  • 1
    A second problem is that `char *dict[dict_size]` can blow the local stack. – Jongware Jun 29 '15 at 11:18
  • Good point, I didn't think of how arbitrarily big `dict_size` can be. – Iharob Al Asimi Jun 29 '15 at 11:24
  • I forgot one constraint: real challange is perfoming spell-checking of text, so it's not relevant to load dictionary in memory every time I want to spell-check a word (passing it to other functions as an argument). That's a main reason why I decided to create some "global stuff" - the dictionary must play a role of database. – aikrikunov95 Jun 29 '15 at 19:30
  • I don't understand. How can I apply lifetime of pointers concept to the problem? Not to touch local array somehow and access to it's elements ... what way? – aikrikunov95 Jun 29 '15 at 19:47
  • 2
    No, when you create a pointer with `malloc()` it's valid until you call `free()` on it, so you will have the data available thoughout the program for the same time the program lives. – Iharob Al Asimi Jun 29 '15 at 19:56
1

In load() you create and fill a local variable char *dict[dict_size]; and then you only assign the pointer global_dict to that variable. But as soon as load() returns, you must not access any local variables any more. That's stealing the hotel keys as it is greatly explained in that answer

Community
  • 1
  • 1
Ingo Leonhardt
  • 9,435
  • 2
  • 24
  • 33
  • So, almost complete output is just an accident... Got it. – aikrikunov95 Jun 29 '15 at 19:32
  • the code passes the contents of dict, not the address of dict. So the contents are not lost when load() exits. Unfortunately, only the first address is passed, not the whole array that is the actual contents of dict. and since the dict array was not malloc'd, all the rest of the array is lost – user3629249 Jun 30 '15 at 01:40
  • Found counter-evidence: every element of `dict` is `malloc`'d near the end of `load()`, so all addresses must be on place until the end of the program and if I start with a correct one (`global_dict = dict`), all others should also work and they almost work until the end. Why not until very end? – aikrikunov95 Jun 30 '15 at 09:28
  • 1
    It's right that every element of dict is `malloc`'d and so theoretically they are not lost. But `dict` itself being the array of pointers to the elements addresses is lost and so you have no defined way to access the allocated memory after `load()` has finished. It may happen that the part of the stack where they have been stored is not yet overwritten and so your program seems to work correctly but you must not assume that – Ingo Leonhardt Jun 30 '15 at 12:18
  • So I can't lose target elements (strings) because they're `malloc`'d, but I can lose `dict`'s elements (pointers to `malloc`'d space) and thus get an undefined data while using array pointer arithmetic: (`*(ptr + i)` or `aray[]`) even if I start with right pointer (`dict` or `dict[0]`). – aikrikunov95 Jun 30 '15 at 13:22
  • 1
    correct; and even for `dict[0]` it's not guaranteed to have the correct value after leaving `load()` – Ingo Leonhardt Jun 30 '15 at 13:29
0

Problem explained (inspired by @Ingo Leonhardt)

Problem here:

char *dict[dict_size];

With such a declaration memory for array was allocated automatically (it's untouchable only in load()) and after load() call dict's memory is accessible for overwriting. It seems that overwriting takes place somewhere in the end of the program (accidentally).

Solution (inspired by @iharob)

char **dict = malloc((1 + dict_size) * sizeof(*dict));

Thus I dynamically allocate memory for dict (untouchable until end of program or free(global_dict))

P.S. I agree that global variables're dangerous, but solving problem with such a design's a constraint of the assignment.

aikrikunov95
  • 307
  • 4
  • 13