If you truly want to read an unknown number of characters from an unknown number of lines and store those lines in an array (or, more precisely, in a block of memory addressed through a pointer-to-pointer-to-char), then you have a number of options. POSIX getline is a line-oriented input function (like fgets) that reads one line of text from the given file each time it is called and allocates sufficient storage to hold the line regardless of its length. (As a bonus, getline returns the actual number of characters read, including the newline, which eliminates a subsequent call to strlen if the length is needed.)
getline eliminates the need for repeated checks on whether fgets actually read the whole line or just part of it. Further, if your lines are more than a few characters long, reading a line at a time with getline (or fgets) is quite a bit faster than character-oriented input (e.g. fgetc), which makes a function call for every character. Don't get me wrong, there is nothing wrong with fgetc, and if your files are small and your lines short, you are not going to notice any difference. However, if you are reading a million lines of 500,000 chars each, you will notice a significant difference.
As for an array, since you don't know how many lines you will read, you really need a pointer-to-pointer-to-char (e.g. a double-pointer, char **array). That lets you allocate some reasonable number of pointers to begin with and assign each line to its own pointer until that limit is reached. You then realloc array to increase the number of pointers available and keep on reading and storing lines.
As with any code that dynamically allocates memory, you must (1) preserve a pointer to each block of memory allocated, so (2) the memory can be freed when it is no longer in use. You should also validate each allocation (and reallocation) to ensure it succeeds. When using realloc, always assign the result to a temporary pointer so you can validate that realloc succeeded before assigning the new block to the original pointer. If you don't, and realloc fails, the NULL it returns overwrites your only pointer to the original block. realloc leaves that block untouched and does not free it, so you have just created a memory leak.
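Lastly, always verify your memory use with a memory error checker such as valgrind on Linux. There are a number of subtle ways to misuse a block of memory, such as reading past the end of an allocation, using uninitialized values, or leaking a block, and valgrind will report them.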
Putting all that together, you could do something like the following. The code reads all lines from the filename provided as the first argument (or from stdin if no filename is given):
```c
#define _POSIX_C_SOURCE 200809L /* for getline, strdup and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXA = 128 }; /* initial allocation size, MAXA must be >= 1 */

int main (int argc, char **argv) {

    char *line = NULL;
    char **arr = NULL;
    size_t i, maxa = MAXA, n = 0, ndx = 0;
    ssize_t nchr = 0;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) { /* validate file open for reading */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate MAXA pointers to char -- initially & validate */
    if (!(arr = calloc (maxa, sizeof *arr))) {
        fprintf (stderr, "error: virtual memory exhausted.\n");
        if (fp != stdin) fclose (fp);
        return 1;
    }

    while ((nchr = getline (&line, &n, fp)) != -1) { /* read each line */
        if (nchr > 0 && line[nchr-1] == '\n')   /* remove trailing '\n' */
            line[--nchr] = 0;                   /* (never index line[-1]) */
        if (!(arr[ndx] = strdup (line))) { /* allocate, copy, add to arr */
            fprintf (stderr, "error: virtual memory exhausted.\n");
            break; /* leave read loop, preserving existing arr */
        }
        if (++ndx == maxa) { /* if allocation limit reached, realloc arr */
            size_t asz = sizeof *arr;
            void *tmp = realloc (arr, (maxa + MAXA) * asz);
            if (!tmp) { /* validate realloc succeeded */
                fprintf (stderr, "error: realloc, memory exhausted.\n");
                break; /* preserving original arr */
            }
            arr = tmp; /* assign & zero (optional) new memory */
            memset (arr + maxa, 0, MAXA * asz); /* new pointers begin at arr[maxa] */
            maxa += MAXA; /* update current allocation limit */
        }
    }
    if (fp != stdin) fclose (fp); /* close file if not stdin */
    free (line); /* free mem allocated by getline (free (NULL) is a no-op) */

    for (i = 0; i < ndx; i++) /* output array */
        printf (" arr[%4zu] : %s\n", i, arr[i]);

    for (i = 0; i < ndx; i++) /* free allocated memory */
        free (arr[i]); /* free each line */
    free (arr); /* free pointers */

    return 0;
}
```
Example Use/Output
```
$ ./bin/getline_realloc_arr < dat/words_554.txt
 arr[   0] : Aam
 arr[   1] : Aard-vark
 arr[   2] : Aard-wolf
 arr[   3] : Aaronic
...
 arr[ 549] : Accompaniment
 arr[ 550] : Accompanist
 arr[ 551] : Accompany
 arr[ 552] : Accompletive
 arr[ 553] : Accomplice
```
Look things over and let me know if you have any questions.