The problem
Your memory allocation is fundamentally wrong.
char **stop_words = (char**)malloc(1000*sizeof(char*));
only allocates a block of memory large enough to hold 1000 pointers.
The contents of stop_words[0] through stop_words[999] are indeterminate; they all hold garbage values after malloc() returns.
Sometimes writing through stop_words[i] appears to work, but that is only luck: the garbage value happens to point to mapped memory. It is still a bug, and you have probably corrupted memory by doing it.
The fix is to allocate a separate block of memory for each string you read from the file, and to store its address in stop_words[i].
Wrong target buffer
This part
fscanf(fp,"%s\n", &stop_words[i]);
writes into the array of pointers that you allocated with malloc(), not into a string buffer. The expression &stop_words[i] has type char **, which does not match the char * that %s expects. You should really turn on warning flags (for example -Wall -Wextra); a good compiler will warn you about this mismatch.
Potential buffer overflow
Your method of reading a line is dangerous: fscanf with %s does not know how big your buffer is, so your program is vulnerable to a buffer overflow.
The fix is to use fgets and pass it the size of your buffer.
You can then realloc() when a line is longer than the memory allocated for the buffer. To detect this, look at the last character that fgets stored. If it is a line feed, you have the whole line; otherwise you have hit either end of file or a line longer than the buffer (in which case you can realloc).
Fix for this
englishstopwords.txt (sample file for testing)
i
me
my
myself
we
our
test_long_line_123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123
ours
ourselves
test.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#define MAX_WORDS (1000u)
#define INIT_ALLOC (128u)

int main(void)
{
	size_t i, total_words;
	FILE *fp;
	char **stop_words = malloc(MAX_WORDS * sizeof(*stop_words));
	/* TODO: Handle `stop_words == NULL` */

	fp = fopen("englishstopwords.txt", "r");
	/* TODO: Handle `fp == NULL` */

	i = 0;
	while (true) {
		size_t len = 0;
		char *ret, *buf = malloc(INIT_ALLOC * sizeof(*buf));
		/* TODO: Handle `buf == NULL` */

		ret = buf;
re_fgets:
		ret = fgets(ret, INIT_ALLOC, fp);
		if (ret == NULL) {
			/* We've reached the end of file */
			if (len == 0) {
				/*
				 * Throw away the buffer, this is unused
				 */
				free(buf);
			} else {
				/* Last line buffer. */
				stop_words[i++] = buf;
			}
			break;
		}

		len = strlen(buf);
		if (buf[len - 1] != '\n') {
			/*
			 * We don't see an LF, this means this line
			 * has more than `INIT_ALLOC` characters or
			 * it may be the EOF.
			 */
			ret = realloc(buf, (len + 1 + INIT_ALLOC) * sizeof(*buf));
			/* TODO: Handle `ret == NULL` */
			buf = ret;

			/*
			 * Shift the pointer to the right (end of string).
			 * Because this line has not been fully read.
			 *
			 * We put the next `fgets` buffer to the end of this
			 * string.
			 */
			ret += len;
			goto re_fgets;
		}

		/* TODO: Trim CR on Windows platform */

		/* Trim the LF */
		buf[len - 1] = '\0';
		stop_words[i++] = buf;

		if (i >= MAX_WORDS) {
			/*
			 * TODO: You can do realloc(stop_words, ...) if you
			 * want to.
			 */
			break;
		}
	}
	fclose(fp);

	total_words = i;
	for (i = 0; i < total_words; i++)
		printf("%s\n", stop_words[i]);

	for (i = 0; i < total_words; i++)
		free(stop_words[i]);

	free(stop_words);
	return 0;
}
Compile and Run
ammarfaizi2@integral:/tmp$ cat englishstopwords.txt
i
me
my
myself
we
our
test_long_line_123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123
ours
ourselves
ammarfaizi2@integral:/tmp$ gcc -ggdb3 -Wall -Wextra -pedantic-errors test.c -o test
ammarfaizi2@integral:/tmp$ valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --track-fds=yes --error-exitcode=99 -s ./test
==503906== Memcheck, a memory error detector
==503906== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==503906== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==503906== Command: ./test
==503906==
i
me
my
myself
we
our
test_long_line_123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123123
ours
ourselves
==503906==
==503906== FILE DESCRIPTORS: 3 open (3 std) at exit.
==503906==
==503906== HEAP SUMMARY:
==503906== in use at exit: 0 bytes in 0 blocks
==503906== total heap usage: 22 allocs, 22 frees, 20,476 bytes allocated
==503906==
==503906== All heap blocks were freed -- no leaks are possible
==503906==
==503906== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
ammarfaizi2@integral:/tmp$