Problems in the original code
One possible source of problems is:
char *doc_names[argc - 2];
int i;
for(i = 0; i < argc; i++){
doc_names[i] = argv[i];
}
You allocate space for argc-2
pointers and proceed to copy argc
pointers into that space. That's a buffer overflow (in this case, a stack overflow too). It can easily cause the trouble. A plausible fix is:
for (i = 2; i < argv; i++)
doc_names[i-2] = argv[i];
However, you really don't need to copy the argument list; you can just process the arguments from index 2 to the end. I note that the code shown doesn't actually use doc_names
, but the out-of-bounds assignment can still cause trouble.
You are not allocating space to read a word into, nor allocating new space for each stop word, nor do you ensure that you do not overflow the bounds of the array in which you're storing the words.
Consider using:
enum { MAX_STOP_WORDS = 50 };
char *stopWords[MAX_STOP_WORDS];
int word_counter = 0;
FILE *fp = fopen(stop_list_name, "r");
if (fp != NULL)
{
char word[64];
for (i = 0; i < MAX_STOP_WORDS && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
word_counter = i;
fclose(fp);
}
This diagnosed problem is definitely a plausible cause of your crash. I used i
(declared earlier in the code) in the loop because word_counter
makes the loop control line too long for SO.
Strictly, strdup()
is not a part of standard C, but it is a part of POSIX. If you don't have POSIX, you can write your own:
#include <stdlib.h>
#include <string.h>
char *strdup(const char *str)
{
size_t len = strlen(str) + 1;
char *result = malloc(len);
if (result != 0)
memmove(result, str, len);
return result;
}
You also have some other bad practices on display:
Problems in the amended code
There's one important and a couple of very minor problems in the amended code:
Your loop that prints the stop words depends on a null pointer (curiously spelled as '\0'
— it is a valid but unconventional spelling for a null pointer), but the initialization code doesn't set a null pointer.
There are (at least) two options for fixing that:
Add a null pointer:
for (i = 0; i < MAX_STOP_WORDS-1 && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
stopWords[i] = 0;
fclose(fp);
}
for (i = 0; stopWords[i] != '\0'; i++)
printf("%s\n", stopWords[i]);
Note that the upper bound is now MAX_STOP_WORDS - 1
.
Or you can use wordCount
instead of a condition:
for (i = 0; i < wordCount; i++)
printf("%s\n", stopWords[i]);
I'd choose the second option.
One reason for doing that is it avoids warnings about wordCount
being set and not used — a minor problem.
And doc_names
is also set but not used.
I worry about those because my default compiler options generate errors for unused variables — so the code doesn't compile until I fix it. That leads to:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
if (argc < 3)
{
fprintf(stderr, "Usage: %s stop-words docfile ...\n", argv[0]);
return 1;
}
char *stop_list_name = argv[1];
char *doc_names[argc - 2];
int i;
for (i = 2; i < argc; i++)
{
doc_names[i - 2] = argv[i];
}
int doc_count = argc - 2;
// create the array of stop words
enum { MAX_STOP_WORDS = 50 };
char *stopWords[MAX_STOP_WORDS];
int word_counter = 0;
FILE *fp = fopen(stop_list_name, "r");
if (fp != NULL)
{
char word[64];
int i;
for (i = 0; i < MAX_STOP_WORDS && fscanf(fp, "%63s", word) == 1; i++)
stopWords[i] = strdup(word);
word_counter = i;
fclose(fp);
}
for (i = 0; i < word_counter; i++)
printf("stop word %d: %s\n", i, stopWords[i]);
for (i = 0; i < doc_count; i++)
printf("document %d: %s\n", i, doc_names[i]);
return 0;
}
And, given a stop words file containing:
help
able
may
can
it
should
do
antonym
prozac
and compiling it (source file sw19.c
, program sw19
) with:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
> -Wold-style-definition -Werror sw19.c -o sw19
and running it as:
$ ./sw19 stopwords /dev/null
stop word 0: help
stop word 1: able
stop word 2: may
stop word 3: can
stop word 4: it
stop word 5: should
stop word 6: do
stop word 7: antonym
stop word 8: prozac
document 0: /dev/null
$