There is nothing difficult about reading an unknown number of double values from a file or stdin and storing them in a simulated 2D array (pointer-to-pointer-to-type). Since you have to assume the number of columns may also differ per row, you need a similar way to allocate column storage, keep track of the number of values allocated/read, and reallocate the column storage if/when the maximum number of columns is reached. This allows handling a jagged array as easily as an array with a fixed number of columns.
There is one subtle trick that greatly helps in managing jagged arrays. Since you do not know beforehand how many column values may be present, you need a way to record, once the row is read, how many column elements it holds (for each row in the array). A simple and robust method is to store that per-row bookkeeping value in the first column of the row itself. (The code here stores the index one past the last value, i.e. the count plus one, which serves directly as the loop bound.) Then after the data is collected, you have the information as part of the array that provides a key to iterating over all rows and columns in the array.
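As a minimal sketch of that bookkeeping idea, a row can carry its own loop bound in element 0. The names make_row and row_sum are hypothetical and only for illustration; they are not part of the program in this answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n values into a new row whose element 0
 * holds the end index (n + 1), so values occupy row[1] .. row[n].
 * Returns NULL if the allocation fails. */
double *make_row (const double *vals, size_t n)
{
    double *row = malloc ((n + 1) * sizeof *row);
    if (!row)
        return NULL;
    row[0] = (double)(n + 1);            /* loop bound stored in slot 0 */
    if (n)
        memcpy (row + 1, vals, n * sizeof *row);
    return row;
}

/* Sum a row using only the bound stored in row[0]. */
double row_sum (const double *row)
{
    assert (row != NULL);
    double sum = 0.0;
    for (size_t col = 1; col < (size_t)row[0]; col++)
        sum += row[col];
    return sum;
}
```

Because the bound travels with the row, a function such as row_sum needs no separate length argument. The cost is that slot 0 is not data, so every loop over values starts at index 1.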
Included as part of this approach, I have created specialty functions xstrtod, xcalloc, xrealloc_sp (realloc of single-pointer array) and xrealloc_dp (realloc for double-pointer). These are nothing more than the standard functions with appropriate error-checking moved into the function so the myriad validation checks don't cloud the main body of the code.
A quick implementation that reads values from stdin could be coded as follows:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <math.h> /* for HUGE_VAL */
#define ROWS 32
#define COLS 32
#define MAXC 256
double xstrtod (char *str, char **ep);
void *xcalloc (size_t n, size_t s);
void *xrealloc_sp (void *p, size_t sz, size_t *n);
void *xrealloc_dp (void **p, size_t *n);
int main (void) {
char line[MAXC] = {0}; /* line buffer for fgets */
char *p, *ep; /* pointers for strtod */
double **array = NULL; /* array of values */
size_t row = 0, col = 0, nrows = 0; /* indexes, number of rows */
size_t rmax = ROWS, cmax = COLS; /* row/col allocation size */
/* allocate ROWS number of pointers to array of double */
array = xcalloc (ROWS, sizeof *array);
/* read each line in file */
while (fgets(line, MAXC, stdin))
{
p = ep = line; /* initialize pointer/end pointer */
col = 1; /* start col at 1, store ncols in 0 */
cmax = COLS; /* reset cmax for each row */
/* allocate COLS number of double for each row */
array[row] = xcalloc (COLS, sizeof **array);
/* convert each string of digits to number */
while (errno == 0)
{
array[row][col++] = xstrtod (p, &ep);
if (col == cmax) /* if cmax reached, realloc array[row] */
array[row] = xrealloc_sp (array[row], sizeof *array[row], &cmax);
/* skip delimiters/move pointer to next digit */
while (*ep && *ep != '-' && (*ep < '0' || *ep > '9')) ep++;
if (*ep)
p = ep;
else /* break if end of string */
break;
}
array[row++][0] = col; /* store end index (ncols + 1) in array[row][0] */
/* realloc rows if needed */
if (row == rmax) array = xrealloc_dp ((void **)array, &rmax);
}
nrows = row; /* set nrows to final number of rows */
printf ("\n the simulated 2D array elements are:\n\n");
for (row = 0; row < nrows; row++) {
for (col = 1; col < (size_t)array[row][0]; col++)
printf (" %8.2lf", array[row][col]);
putchar ('\n');
}
putchar ('\n');
/* free all allocated memory */
for (row = 0; row < nrows; row++)
free (array[row]);
free (array);
return 0;
}
/** string to double with error checking.
* #include <math.h> for HUGE_VAL
*/
double xstrtod (char *str, char **ep)
{
errno = 0;
double val = strtod (str, ep);
/* Check for overflow (+/-HUGE_VAL) and underflow (0) range errors */
if ((errno == ERANGE && (val == HUGE_VAL || val == -HUGE_VAL)) ||
(errno != 0 && val == 0)) {
perror ("strtod");
exit (EXIT_FAILURE);
}
if (*ep == str) {
fprintf (stderr, "No digits were found\n");
exit (EXIT_FAILURE);
}
return val;
}
/** xcalloc allocates memory using calloc and validates the return.
* xcalloc allocates memory and reports an error if the value is
* null, returning a memory address only if the value is nonzero
* freeing the caller of validating within the body of code.
*/
void *xcalloc (size_t n, size_t s)
{
register void *memptr = calloc (n, s);
if (memptr == 0)
{
fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
exit (EXIT_FAILURE);
}
return memptr;
}
/** reallocate array of type size 'sz', to 2 * 'n'.
* accepts any pointer p, with current allocation 'n',
* with the type size 'sz' and reallocates memory to
* 2 * 'n', updating the value of 'n' and returning a
* pointer to the newly allocated block of memory on
* success, exits otherwise. all new memory is
* initialized to '0' with memset.
*/
void *xrealloc_sp (void *p, size_t sz, size_t *n)
{
void *tmp = realloc (p, 2 * *n * sz);
#ifdef DEBUG
printf ("\n reallocating '%zu' to '%zu', size '%zu'\n", *n, *n * 2, sz);
#endif
if (!tmp) {
fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
exit (EXIT_FAILURE);
}
p = tmp;
memset ((char *)p + *n * sz, 0, *n * sz); /* zero new memory (cast: no arithmetic on void*) */
*n *= 2;
return p;
}
/** reallocate memory for array of pointers to 2 * 'n'.
* accepts any pointer 'p', with current allocation of,
* 'n' pointers and reallocates to 2 * 'n' pointers
* initializing the new pointers to NULL and returning
* a pointer to the newly allocated block of memory on
* success, exits otherwise.
*/
void *xrealloc_dp (void **p, size_t *n)
{
void *tmp = realloc (p, 2 * *n * sizeof tmp);
#ifdef DEBUG
printf ("\n reallocating %zu to %zu\n", *n, *n * 2);
#endif
if (!tmp) {
fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
exit (EXIT_FAILURE);
}
p = tmp;
memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
*n *= 2;
return p;
}
Compile
gcc -Wall -Wextra -Ofast -o bin/fgets_strtod_dyn fgets_strtod_dyn.c
Input
$ cat dat/float_4col.txt
2078.62 5.69982 -0.17815 -0.04732
5234.95 8.40361 0.04028 0.10852
2143.66 5.35245 0.10747 -0.11584
7216.99 2.93732 -0.18327 -0.20545
1687.24 3.37211 0.14195 -0.14865
2065.23 34.0188 0.1828 0.21199
2664.57 2.91035 0.19513 0.35112
7815.15 9.48227 -0.11522 0.19523
5166.16 5.12382 -0.29997 -0.40592
6777.11 5.53529 -0.37287 -0.43299
4596.48 1.51918 -0.33986 0.09597
6720.56 15.4161 -0.00158 -0.0433
2652.65 5.51849 0.41896 -0.61039
Output
$ ./bin/fgets_strtod_dyn <dat/float_4col.txt
the simulated 2D array elements are:
2078.62 5.70 -0.18 -0.05
5234.95 8.40 0.04 0.11
2143.66 5.35 0.11 -0.12
7216.99 2.94 -0.18 -0.21
1687.24 3.37 0.14 -0.15
2065.23 34.02 0.18 0.21
2664.57 2.91 0.20 0.35
7815.15 9.48 -0.12 0.20
5166.16 5.12 -0.30 -0.41
6777.11 5.54 -0.37 -0.43
4596.48 1.52 -0.34 0.10
6720.56 15.42 -0.00 -0.04
2652.65 5.52 0.42 -0.61
Memory Check
In any code you write that dynamically allocates memory, it is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory and to confirm that you have freed all the memory you have allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems, there is no excuse not to do it. There are similar memory checkers for every platform. They are all simple to use. Just run your program through it.
$ valgrind ./bin/fgets_strtod_dyn <dat/float_4col.txt
==28022== Memcheck, a memory error detector
==28022== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==28022== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==28022== Command: ./bin/fgets_strtod_dyn
==28022==
the simulated 2D array elements are:
2078.62 5.70 -0.18 -0.05
5234.95 8.40 0.04 0.11
2143.66 5.35 0.11 -0.12
7216.99 2.94 -0.18 -0.21
1687.24 3.37 0.14 -0.15
2065.23 34.02 0.18 0.21
2664.57 2.91 0.20 0.35
7815.15 9.48 -0.12 0.20
5166.16 5.12 -0.30 -0.41
6777.11 5.54 -0.37 -0.43
4596.48 1.52 -0.34 0.10
6720.56 15.42 -0.00 -0.04
2652.65 5.52 0.42 -0.61
==28022==
==28022== HEAP SUMMARY:
==28022== in use at exit: 0 bytes in 0 blocks
==28022== total heap usage: 14 allocs, 14 frees, 3,584 bytes allocated
==28022==
==28022== All heap blocks were freed -- no leaks are possible
==28022==
==28022== For counts of detected and suppressed errors, rerun with: -v
==28022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
There is nothing difficult about reading an unknown number of rows and unknown number of columns from a file in C, but you must pay particular attention to how you do it. While you can limit the array to a square (NxN) array, there is no reason every row can't have a different number of columns (a jagged array).
Your basic approach is to allocate memory for an array of pointers to type double for some reasonable anticipated number of rows (#define ROWS 32). You will then read each line. For every line you read, you then allocate a block of memory for an array of double for some reasonably anticipated number of doubles (#define COLS 32).
You then convert each string of digits encountered to a double value and store the number at array[row][col]. (We actually start storing values at col = 1 and save col = 0 to hold the final column bound for that row.) You keep track of the number you have added to the array, and if your number of columns reaches the number you allocated, you then realloc the array to hold additional doubles.
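The per-row steps above can be sketched as a single function. This is an illustration under stated assumptions, not the code from the program above: parse_row is a hypothetical name, the input is assumed to be whitespace-separated, the starting capacity is deliberately tiny so the growth path is exercised, and allocation failure is reported by returning NULL rather than by exiting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: convert a whitespace-separated line into a row.
 * row[0] receives the end index (number of values + 1).
 * Returns NULL if memory cannot be allocated. */
double *parse_row (const char *line)
{
    assert (line != NULL);
    size_t cmax = 2, col = 1;            /* tiny start size to force growth */
    double *row = calloc (cmax, sizeof *row);
    if (!row)
        return NULL;

    for (;;) {
        char *ep;
        double val = strtod (line, &ep);
        if (ep == line)                  /* no further conversion possible */
            break;
        if (col == cmax) {               /* row full: double its capacity */
            double *tmp = realloc (row, 2 * cmax * sizeof *row);
            if (!tmp) {
                free (row);
                return NULL;
            }
            row = tmp;
            cmax *= 2;
        }
        row[col++] = val;
        line = ep;                       /* continue after the parsed number */
    }
    row[0] = (double)col;                /* store the loop bound in slot 0 */
    return row;
}
```

Assigning the result of realloc to a temporary first keeps the original block reachable, so it can still be freed if the reallocation fails.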
You continue reading lines until you have read all the lines. If you reach your original limit on the number of rows, you simply realloc the array much like you did with cols.
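Growing the row pointers follows the same doubling pattern. A minimal sketch, assuming a hypothetical helper named grow_rows that returns NULL on failure instead of exiting, and that sets the new slots with an explicit loop rather than memset:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: double the capacity of an array of row pointers.
 * On success returns the new block and updates *n; on failure returns
 * NULL and leaves the original block and *n untouched. */
double **grow_rows (double **rows, size_t *n)
{
    assert (n != NULL && *n > 0);
    double **tmp = realloc (rows, 2 * *n * sizeof *tmp);
    if (!tmp)
        return NULL;
    for (size_t i = *n; i < 2 * *n; i++) /* new slots start out empty */
        tmp[i] = NULL;
    *n *= 2;
    return tmp;
}
```

Keeping unused slots NULL means a later cleanup loop can safely call free on every slot, since free (NULL) does nothing.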
You now have all your data stored and can do with it what you will. When you are done, do not forget to free all memory you have allocated. Let me know if you have questions.
Quick Brown Fox Separated File
There is one further bit of additional robustness that you can build into the code that will basically allow you to read any row of data no matter how much junk may be included in the file. It doesn't matter if the row-values are comma separated, semi-colon separated, space separated, or separated by the quick brown fox. With a little parsing help, you can prevent read failures by manually advancing to the beginning of the next number. A quick addition in context would be:
while (errno == 0)
{
/* skip any non-digit characters */
while (*p && ((*p != '-' && (*p < '0' || *p > '9')) ||
(*p == '-' && (*(p+1) < '0' || *(p+1) > '9')))) p++;
if (!*p) break;
array[row][col++] = xstrtod (p, &ep);
...
Skipping the non-digits will allow you to read almost any sane file with any type of delimiter without issue. Take for example, the same numbers used originally, but now formatted as follows in the data file:
$ cat dat/float_4colmess.txt
The, 2078.62 quick 5.69982 brown -0.17815 fox; -0.04732 jumps
5234.95 over 8.40361 the 0.04028 lazy 0.10852 dog
and the 2143.66 dish ran 5.35245 away 0.10747 with -0.11584
the spoon, 7216.99 2.93732 -0.18327 -0.20545
1687.24 3.37211 0.14195 -0.14865
2065.23 34.0188 0.1828 0.21199
2664.57 2.91035 0.19513 0.35112
7815.15 9.48227 -0.11522 0.19523
5166.16 5.12382 -0.29997 -0.40592
6777.11 5.53529 -0.37287 -0.43299
4596.48 1.51918 -0.33986 0.09597
6720.56 15.4161 -0.00158 -0.0433
2652.65 5.51849 0.41896 -0.61039
Even with this insane format, the code has no problem reading all numeric values into the array properly.
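For experimenting with the skipping logic on its own, here is a self-contained sketch. The function name extract_numbers and its fixed-size output buffer are assumptions made for illustration; it applies the same rule as the snippet above (a number begins at a digit, or at a '-' immediately followed by a digit) but uses isdigit from <ctype.h>.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: copy up to max numbers found in s into out,
 * skipping any characters that cannot begin a number.
 * Returns how many values were stored. */
size_t extract_numbers (const char *s, double *out, size_t max)
{
    assert (s != NULL && out != NULL);
    size_t n = 0;

    while (*s && n < max) {
        /* a number starts at a digit, or at '-' directly followed by a digit */
        int starts = isdigit ((unsigned char)*s) ||
                     (*s == '-' && isdigit ((unsigned char)s[1]));
        if (!starts) {
            s++;
            continue;
        }
        char *ep;
        out[n++] = strtod (s, &ep);
        s = ep;                          /* resume after the parsed number */
    }
    return n;
}
```

One limitation follows from the rule itself: a value written with a leading decimal point, such as .5, would be read as 5 because the dot is skipped as a delimiter. If your data can contain such values, the start test would need to be extended.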