The first problem is you only allocated space for a list of words (ie. character pointers) but you didn't allocate space for the words themselves.
char** words = (char**)malloc(20 * sizeof(char*));
This allocates space for 20 character pointers and assigns it to words
. Now words[i]
has space for a character pointer but not for the characters.
words[i]
contains garbage, because malloc
does not initialize memory. When you pass it into fscanf
, fscanf
tries to use the garbage in words[i]
as a memory location to write characters to. That's either going to corrupt some memory in the program, or more likely it tries to read a memory location is isn't allowed to and crashes. Either way, it's not good.
You have to allocate memory for the string, pass that to fscanf
, and finally put that string into words[i]
.
char** words = malloc(numWords * sizeof(char*));
for(i = 0; i < numWords; i++) {
char *word = malloc(40 * sizeof(char));
fscanf(ifp, "%39s", word);
words[i] = word;
printf("%s\n", words[i]);
}
Note that I didn't cast the result of malloc
, that's generally considered unnecessary.
Also note that I allocated space for numWords
in the list. Your original only allocates space for 20 words, once it goes over that it'll start overwriting allocated memory and probably crash. As a rule of thumb, avoid constant memory allocations. Get used to dynamic memory allocation as quickly as you can.
Also note that I limited how many characters fscanf
is allowed to read to the size of my buffer (minus one because of the null byte at the end of strings). Otherwise if your word list contained "Pneumonoultramicroscopicsilicovolcanoconiosis", 45 characters, it would overrun the word
buffer and start scribbling on adjacent elements and that would be bad.
This leads to a new problem that are common to fscanf
and scanf
: partial reads. When the code above encounters "Pneumonoultramicroscopicsilicovolcanoconiosis" fscanf(ifp, "%39s", word);
will read in the first 39 characters, "Pneumonoultramicroscopicsilicovolcanoco" and stop. The next call to fscanf
will read "niosis". You'll store and print them as if they were two words. That's no good.
You could solve this by making the word buffer bigger, but now most words will be wasting a lot of memory.
scanf
and fscanf
have a whole lot of problems and are best avoided. Instead, it's best to read entire lines and parse them with sscanf
. In this case you don't need to do any parsing, they're just strings, so getting the line will suffice.
fgets
is the usual way to read a line, but that also requires that you try and guess how much memory you'll need to read in the line. To mitigate that, have a large line buffer, and copy the words out of it.
void strip_newline( char* string ) {
size_t len = strlen(string);
if( string[len-1] == '\n' ) {
string[len-1] = '\0';
}
}
...
int i;
/* The word list */
char** words = malloc(numWords * sizeof(char*));
/* The line buffer */
char *line = malloc(1024 * sizeof(char*));
for(i = 0; i < numWords; i++) {
/* Read into the line buffer */
fgets(line, 1024, ifp);
/* Strip the newline off, fgets() doesn't do that */
strip_newline(line);
/* Copy the line into words */
words[i] = strdup(line);
printf("%s\n", words[i]);
}
strdup
won't copy all 1024 bytes, just enough for the word. This will result in using only the memory you need.
Making assumptions about files, like that they'll have a certain number of lines, is a recipe for problems. Even if the file says it contains a certain number of lines you should still verify that. Otherwise you'll get bizarre errors as you try to read past the end of the file. In this case, if the file has less than numWords
it'll try to read garbage and probably crash. Instead, you should read the file until there's no more lines.
Normally this is done by checking the return value of fgets
in a while loop.
int i;
for( i = 0; fgets(line, 1024, ifp) != NULL; i++ ) {
strip_newline(line);
words[i] = strdup(line);
printf("%s\n", words[i]);
}
This brings up a new problem, how do we know how big to make words
? You don't. This brings us to growing and reallocating memory. This answer is getting very long, so I'll just sketch it.
char **readDictionary(FILE *ifp) {
/* Allocate a decent initial size for the list */
size_t list_size = 256;
char** words = malloc(list_size * sizeof(char*));
char *line = malloc(1024 * sizeof(char*));
size_t i;
for( i = 0; fgets(line, 1024, ifp) != NULL; i++ ) {
strip_newline(line);
/* If we're about to overflow the list, double its size */
if( i > list_size - 1 ) {
list_size *= 2;
words = realloc( words, list_size * sizeof(char*));
}
words[i] = strdup(line);
}
/* Null terminate the list so readers know when to stop */
words[i] = NULL;
return words;
}
int main() {
FILE *fp = fopen("/usr/share/dict/words", "r");
char **words = readDictionary(fp);
for( int i = 0; words[i] != NULL; i++ ) {
printf("%s\n", words[i]);
}
}
Now the list will start at size 256 and grow as needed. Doubling grows pretty fast without wasting too much memory. My /usr/share/dict/words has 235886 lines in it. That can be stored in 218 or 262144. 256 is 28 so it only requires 10 expensive calls to realloc
to grow to the necessary size.
I've changed it to return the list, because there isn't much good in building the list if you're just going to use it immediately. This allows me to demonstrate another technique in working with dynamically sized lists, null termination. The last element in the list is set to NULL
so anyone reading the list knows when to stop. This is safer and simpler than trying to pass the length around with the list.
That was a lot, but that's all the basic stuff you need to do when working with files in C. It's good to do it manually, but fortunately there are libraries out there which make doing this sort of thing a lot easier. For example, Gnome Lib provides a lot of basic functionality including arrays of pointers that automatically grow as needed.