
I'm implementing a dynamic matrix structure (that stores double values), and I've run into problems reading it from a file.

The idea is that the program doesn't know the number of rows and columns in advance. It has to scan the first line to find the number of columns.

The problem with simply using fscanf() to scan doubles is that (as far as I know) it can't distinguish newlines from spaces, so it would read the whole file as one line.

To work around this, I first read the line character by character with fscanf() in a separate function, storing the characters in a string that holds exactly one line.

Then I use sscanf() to scan the string for double values and store them in a double array. After the conversion I free the string. This is done in the chararray_to_doublearray function.

Now after a bit of testing I suspect that the chararray_to_doublearray function is not working as intended.

/* Converts a character array to a double array and returns a pointer to it. Frees the space of the character array, as it's no longer needed. */
double *chararray_to_doublearray(char **chararray)
{
    int i;
    int elements = 0;
    double *numbers=NULL;
    double newnumber;
    while (sscanf(*chararray, "%lf ", &newnumber) == 1) {
        double* newarray = (double*) malloc(sizeof(double) * (elements+1));
        for (i = 0; i < elements; ++i)
            newarray[i] = numbers[i];
        free(numbers);
        numbers = newarray;
        numbers[elements] = newnumber;
        ++elements;
    }
    free(*chararray);
    return numbers;
}

And the main() function calling only the chararray_to_doublearray function:

main ()
{
    int i;
    double *numbers;
    char string[50]="12.3 1.2 3.4 4 0.3";
    numbers=chararray_to_doublearray(&string);
    free(numbers);
    return 0;
}

To summarize: I couldn't find a good way to read double values from the user (or from a file) up to the end of a line, so I wrote this implementation. Do you have any idea what might be wrong with it?

Regards,

naroslife

Iharob Al Asimi
naroslife
  • Is your main like that or does it return `int`? Also, do not cast `void *`; specifically, you don't need `(double *) malloc()`. If you need the cast, you are using the wrong language or the wrong compiler. And `malloc()`ing and `free()`ing the array over and over is really bad: start with a predefined capacity, and use `realloc()` instead of copying and freeing manually. Did you try `fgets()`? – Iharob Al Asimi Dec 01 '15 at 04:16
  • you're freeing a memory block that was not allocated by `malloc`. – user253751 Dec 01 '15 at 04:23
  • **[Try this](https://stackoverflow.com/a/28335093/3386109).** – user3386109 Dec 01 '15 at 04:34
  • You need to read the line into a buffer, and then use `sscanf()` to iterate over the line. See [How to use `sscanf()` in loops?](https://stackoverflow.com/questions/3975236/) — probably just one amongst many on the topic. – Jonathan Leffler Dec 01 '15 at 04:51
  • 1
    It would also be nice if chararray_to_doublearray() provided a mechanism for passing the array size. Currently there is no way to tell how large the returned array is. There are several options there... You could figure out the array size by counting spaces in the first text line even before calling the function, or you could let the function figure out the array size and pass it back. Since this is a matrix, there is no need to rediscover the column count for each row. You could avoid repeated manual malloc()/free() or even avoid realloc(). – Anatoli P Dec 01 '15 at 05:02
  • Another observation: the way the sample code is written, an incompatible pointer type is passed to the function via chararray, but I'm sure that's a result of simplifying the code for the question. Obviously in your actual code you have to allocate "string" dynamically. – Anatoli P Dec 01 '15 at 05:21
  • 1
    Sorry, missed an obvious thing: that while() is an infinite loop. It always scans from the beginning of the chararray. – Anatoli P Dec 01 '15 at 05:33
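Taken together, the comments above point at three fixes: advance through the string instead of rescanning it from the start, report how many values were read, and grow the array with `realloc()` rather than copying it for every value. The following is a sketch under those assumptions, not the asker's code: it uses the `%n` conversion to advance (one common way to run `sscanf()` in a loop), adds a hypothetical `count` output parameter, and no longer frees its input, so it can be called on a stack buffer such as `string` in `main()`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses every double in 'str' into a newly allocated array and stores the
 * number of values in *count ('count' is a hypothetical extra parameter).
 * Returns NULL (with *count == 0) if no value was found or realloc failed. */
double *chararray_to_doublearray(const char *str, size_t *count)
{
    double *numbers = NULL;
    size_t elements = 0, capacity = 0;
    double newnumber;
    int used;                               /* characters consumed by sscanf() */

    *count = 0;
    while (sscanf(str, "%lf%n", &newnumber, &used) == 1) {
        if (elements == capacity) {         /* full: double the capacity */
            size_t newcap = capacity ? 2 * capacity : 8;
            double *tmp = realloc(numbers, newcap * sizeof *numbers);
            if (tmp == NULL) {              /* realloc failed: release old block */
                free(numbers);
                return NULL;
            }
            numbers = tmp;
            capacity = newcap;
        }
        numbers[elements++] = newnumber;
        str += used;                        /* continue after this value */
    }
    *count = elements;
    return numbers;
}
```

A caller would then write `size_t n; double *numbers = chararray_to_doublearray(string, &n);` and call `free(numbers)` when done. Doubling the capacity keeps the number of `realloc()` calls logarithmic in the number of values.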

2 Answers


This is an XY problem. Do you really need to "fscanf() the line character-by-character"? Has this caused you to ask too much of your question in the wrong direction?

Consider this: %lf denotes a conversion of characters to the double that you choose... It stops immediately when there are no more suitable characters to convert... and a newline is not a suitable character to convert... Is there a light bulb shining in your head, yet?

In your case, the space that follows %lf in the format string causes the useful information (whether the white-space is a newline or not) to be discarded. STOP! You've gone too far, and the consequence is that you now need an intermediate character array conversion function, which is unnecessary bloat.
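One way to see this for yourself is the `%n` conversion, which stores how many characters `sscanf()` has consumed so far. In the sketch below, `consumed()` is a hypothetical helper: with `"%lf%n"` the count stops just before the newline, while `"%lf %n"` also swallows the newline and any blanks after it.

```c
#include <stdio.h>

/* Hypothetical helper: reads one double from 'input' and returns how many
 * characters sscanf() consumed, with or without a space after %lf.
 * Returns -1 if no double could be read. */
int consumed(const char *input, int trailing_space)
{
    double d;
    int n = -1;
    int r = trailing_space ? sscanf(input, "%lf %n", &d, &n)
                           : sscanf(input, "%lf%n", &d, &n);
    return r == 1 ? n : -1;
}
```

For the input `"1.5\n2.5"`, `consumed(input, 0)` is 3 (the newline is still unread), while `consumed(input, 1)` is 4 (the newline has been discarded).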

With this new-found realisation that removing the white-space from the format string leaves a trailing newline on the stream, consider using fgetc (or getchar, which reads from stdin) to distinguish regular white-space from newlines.

e.g.

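double f;
int x = scanf("%lf", &f);    /* x is 1 if a double was read */
int c;
do {                         /* skip white-space, but stop at a newline */
    c = getchar();           /* needs <stdio.h>; isspace() needs <ctype.h> */
} while (isspace(c) && c != '\n');
if (c != '\n') {             /* not the end of the line: */
    ungetc(c, stdin);        /* push the character back for the next scanf */
}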

See how the loop above distinguishes between newline and non-newline white-space?
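Wrapped into a function, the same idea can read one row at a time from any stream. This is a sketch built on the answer's technique, with assumptions of its own: `read_row()` is a hypothetical name, it uses `fgetc()`/`fscanf()` on a `FILE *` instead of `getchar()`/`scanf()`, it skips blanks before each number rather than after, and it simply discards the rest of a line that contains something other than a number.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads the doubles on the current line of 'in' into
 * vals[0..max-1]. Returns the number stored (values beyond 'max' are
 * dropped), or -1 if the input ends before the line has any content. */
int read_row(FILE *in, double *vals, int max)
{
    int n = 0, c;
    double d;

    for (;;) {
        do {                                /* skip blanks, but not '\n' */
            c = fgetc(in);
        } while (isspace(c) && c != '\n');

        if (c == '\n')                      /* end of this row */
            return n;
        if (c == EOF)
            return n > 0 ? n : -1;
        ungetc(c, in);                      /* first char of the next token */

        if (fscanf(in, "%lf", &d) != 1) {   /* not a number: drop the line */
            while ((c = fgetc(in)) != '\n' && c != EOF)
                ;
            return n;
        }
        if (n < max)
            vals[n++] = d;
    }
}
```

Calling it repeatedly yields one row per line; a return value of -1 means the input is exhausted, and the value returned for the first line gives the number of columns.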

autistic
  • It is indeed an XY problem, my bad. As this implementation is the closest to my original idea and is actually much more resource-friendly than dealing with the character array, I'm currently working on implementing this method. Thanks! – naroslife Dec 02 '15 at 21:37

There is nothing difficult about reading an unknown number of double values from a file or stdin and storing them in a simulated 2D array (pointer-to-pointer-to-type). Since you have to assume the number of columns may also differ per row, you need a way to allocate column storage, keep track of the number of values allocated and read, and reallocate the column storage if and when the allocated number of columns is reached. This allows handling a jagged array as easily as an array with a fixed number of columns.

There is one subtle trick that greatly helps in managing jagged arrays. Since you do not know beforehand how many column values may be present, once they are read you need a way to store the number of column elements present in each row. A simple, robust method is to store the number of column elements per row as the first column value. Then, after the data is collected, the array itself carries the key to iterating over all rows and columns.

Included as part of this approach, I have created specialty functions xstrtod, xcalloc, xrealloc_sp (realloc of a single-pointer array) and xrealloc_dp (realloc of a double-pointer array). These are nothing more than the standard functions with the appropriate error checking moved into the function, so the myriad of validation checks don't cloud the main body of the code.

A quick implementation that reads values from stdin could be coded as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <math.h>   /* for HUGE_VAL */

#define ROWS 32
#define COLS 32
#define MAXC 256

double xstrtod (char *str, char **ep);
void *xcalloc (size_t n, size_t s);
void *xrealloc_sp (void *p, size_t sz, size_t *n);
void *xrealloc_dp (void **p, size_t *n);

int main (void) {

    char line[MAXC] = {0};              /* line buffer for fgets    */
    char *p, *ep;                       /* pointers for strtod      */
    double **array = NULL;              /* array of values          */
    size_t row = 0, col = 0, nrows = 0; /* indexes, number of rows  */
    size_t rmax = ROWS, cmax = COLS;    /* row/col allocation size  */

    /* allocate ROWS number of pointers to array of double */
    array = xcalloc (ROWS, sizeof *array);

    /* read each line in file */
    while (fgets(line, MAXC, stdin))
    {
        p = ep = line;  /* initialize pointer/end pointer   */
        col = 1;        /* start col at 1, store ncols in 0 */
        cmax = COLS;    /* reset cmax for each row          */

        /* allocate COLS number of double for each row */
        array[row] = xcalloc (COLS, sizeof **array);

        /* convert each string of digits to number */
        for (;;)        /* xstrtod() exits on error; break at end of string */
        {
            array[row][col++] = xstrtod (p, &ep);

            if (col == cmax) /* if cmax reached, realloc array[row] */
                array[row] = xrealloc_sp (array[row], sizeof *array[row], &cmax);

            /* skip delimiters/move pointer to next digit */
            while (*ep && *ep != '-' && (*ep < '0' || *ep > '9')) ep++;
            if (*ep)
                p = ep;
            else  /* break if end of string */
                break;
        }
        array[row++][0] = col; /* store end index (ncols + 1) in array[row][0] */

        /* realloc rows if needed */
        if (row == rmax) array = xrealloc_dp ((void **)array, &rmax);
    }
    nrows = row;  /* set nrows to final number of rows */

    printf ("\n the simulated 2D array elements are:\n\n");
    for (row = 0; row < nrows; row++) {
        for (col = 1; col < (size_t)array[row][0]; col++)
            printf ("  %8.2lf", array[row][col]);
        putchar ('\n');
    }
    putchar ('\n');

    /* free all allocated memory */
    for (row = 0; row < nrows; row++)
        free (array[row]);
    free (array);

    return 0;
}

/** string to double with error checking.
 *  #include <math.h> for HUGE_VAL
 */
double xstrtod (char *str, char **ep)
{
    errno = 0;

    double val = strtod (str, ep);

    /* Check for various possible errors */
    if ((errno == ERANGE && (val == HUGE_VAL || val == -HUGE_VAL)) ||
        (errno != 0 && val == 0)) {
        perror ("strtod");
        exit (EXIT_FAILURE);
    }

    if (*ep == str) {
        fprintf (stderr, "No digits were found\n");
        exit (EXIT_FAILURE);
    }

    return val;
}

/** xcalloc allocates memory using calloc and validates the return.
 *  It reports an error and exits if calloc returns NULL, so it
 *  returns a memory address only on success, freeing the caller
 *  from validating within the body of code.
 */
void *xcalloc (size_t n, size_t s)
{
    register void *memptr = calloc (n, s);
    if (memptr == 0)
    {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }

    return memptr;
}

/** reallocate array of type size 'sz', to 2 * 'n'.
 *  accepts any pointer p, with current allocation 'n',
 *  with the type size 'sz' and reallocates memory to
 *  2 * 'n', updating the value of 'n' and returning a
 *  pointer to the newly allocated block of memory on
 *  success, exits otherwise. all new memory is
 *  initialized to '0' with memset.
 */
void *xrealloc_sp (void *p, size_t sz, size_t *n)
{
    void *tmp = realloc (p, 2 * *n * sz);
#ifdef DEBUG
    printf ("\n  reallocating '%zu' to '%zu', size '%zu'\n", *n, *n * 2, sz);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset ((char *)p + *n * sz, 0, *n * sz); /* zero new memory */
    *n *= 2;

    return p;
}

/** reallocate memory for array of pointers to 2 * 'n'.
 *  accepts any pointer 'p', with current allocation of,
 *  'n' pointers and reallocates to 2 * 'n' pointers
 *  initializing the new pointers to NULL and returning
 *  a pointer to the newly allocated block of memory on
 *  success, exits otherwise.
 */
void *xrealloc_dp (void **p, size_t *n)
{
    void *tmp = realloc (p, 2 * *n * sizeof tmp);
#ifdef DEBUG
    printf ("\n  reallocating %zu to %zu\n", *n, *n * 2);
#endif
    if (!tmp) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
    *n *= 2;

    return p;
}

Compile

gcc -Wall -Wextra -Ofast -o bin/fgets_strtod_dyn fgets_strtod_dyn.c

Input

$ cat dat/float_4col.txt
 2078.62        5.69982       -0.17815       -0.04732
 5234.95        8.40361        0.04028        0.10852
 2143.66        5.35245        0.10747       -0.11584
 7216.99        2.93732       -0.18327       -0.20545
 1687.24        3.37211        0.14195       -0.14865
 2065.23        34.0188         0.1828        0.21199
 2664.57        2.91035        0.19513        0.35112
 7815.15        9.48227       -0.11522        0.19523
 5166.16        5.12382       -0.29997       -0.40592
 6777.11        5.53529       -0.37287       -0.43299
 4596.48        1.51918       -0.33986        0.09597
 6720.56        15.4161       -0.00158        -0.0433
 2652.65        5.51849        0.41896       -0.61039

Output

$ ./bin/fgets_strtod_dyn <dat/float_4col.txt

 the simulated 2D array elements are:

   2078.62      5.70     -0.18     -0.05
   5234.95      8.40      0.04      0.11
   2143.66      5.35      0.11     -0.12
   7216.99      2.94     -0.18     -0.21
   1687.24      3.37      0.14     -0.15
   2065.23     34.02      0.18      0.21
   2664.57      2.91      0.20      0.35
   7815.15      9.48     -0.12      0.20
   5166.16      5.12     -0.30     -0.41
   6777.11      5.54     -0.37     -0.43
   4596.48      1.52     -0.34      0.10
   6720.56     15.42     -0.00     -0.04
   2652.65      5.52      0.42     -0.61

Memory Check

In any code you write that dynamically allocates memory, it is imperative that you use a memory error checking program to ensure you haven't written beyond or outside your allocated blocks of memory and to confirm that you have freed all the memory you allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform, and they are all simple to use: just run your program through it.

$ valgrind ./bin/fgets_strtod_dyn <dat/float_4col.txt
==28022== Memcheck, a memory error detector
==28022== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==28022== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==28022== Command: ./bin/fgets_strtod_dyn
==28022==

 the simulated 2D array elements are:

   2078.62      5.70     -0.18     -0.05
   5234.95      8.40      0.04      0.11
   2143.66      5.35      0.11     -0.12
   7216.99      2.94     -0.18     -0.21
   1687.24      3.37      0.14     -0.15
   2065.23     34.02      0.18      0.21
   2664.57      2.91      0.20      0.35
   7815.15      9.48     -0.12      0.20
   5166.16      5.12     -0.30     -0.41
   6777.11      5.54     -0.37     -0.43
   4596.48      1.52     -0.34      0.10
   6720.56     15.42     -0.00     -0.04
   2652.65      5.52      0.42     -0.61

==28022==
==28022== HEAP SUMMARY:
==28022==     in use at exit: 0 bytes in 0 blocks
==28022==   total heap usage: 14 allocs, 14 frees, 3,584 bytes allocated
==28022==
==28022== All heap blocks were freed -- no leaks are possible
==28022==
==28022== For counts of detected and suppressed errors, rerun with: -v
==28022== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

There is nothing difficult about reading an unknown number of rows and unknown number of columns from a file in C, but you must pay particular attention to how you do it. While you can limit the array to a square (NxN) array, there is no reason every row can't have a different number of columns (a jagged-array).

Your basic approach is to allocate memory for an array of pointers to double for some reasonably anticipated number of rows (#define ROWS 32). You then read each line, and for every line you read, you allocate a block of memory for some reasonably anticipated number of doubles (#define COLS 32).

You then convert each string of digits encountered to a double value and store the number at array[row][col]. (Values are actually stored starting at col = 1; col = 0 is saved to record the number of columns for that row.) You keep track of how many values you have added, and if that number reaches the number of columns you allocated, you realloc the array to hold additional doubles.
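The per-line step can be isolated into one function for clarity. This is a sketch, not the answer's code: `parse_row()` is a hypothetical name, it returns NULL instead of exiting on failure, it does not check for range errors (`ERANGE`) the way `xstrtod()` does, and it stores the plain count (not one past the last index) in element 0. It stops at the first text that `strtod()` cannot convert, which the end pointer reveals by being equal to the start pointer.

```c
#include <stdlib.h>

#define COLS 32   /* initial column capacity, as in the answer */

/* Hypothetical helper: converts the numbers on 'line' into a new array.
 * Element 0 holds the count, elements 1..count hold the values.
 * Returns NULL if memory cannot be allocated. */
double *parse_row(const char *line)
{
    size_t cmax = COLS, col = 1;            /* col 0 is reserved for the count */
    double *row = malloc(cmax * sizeof *row);
    const char *p = line;
    char *ep;

    if (row == NULL)
        return NULL;

    for (;;) {
        double val = strtod(p, &ep);
        if (ep == p)                        /* nothing converted: stop */
            break;
        if (col == cmax) {                  /* full: double the capacity */
            double *tmp = realloc(row, 2 * cmax * sizeof *row);
            if (tmp == NULL) {
                free(row);
                return NULL;
            }
            row = tmp;
            cmax *= 2;
        }
        row[col++] = val;
        p = ep;                             /* resume after the number */
    }
    row[0] = (double)(col - 1);             /* number of values stored */
    return row;
}
```

Assigning `realloc()`'s result to a temporary first keeps the original block reachable (and freeable) if the reallocation fails.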

You continue reading lines until you have read all the lines. If you reach your original limit on the number of rows, you simply realloc the array much like you did with cols.

You now have all your data stored and can do with it what you will. When you are done, do not forget to free all memory you have allocated. Let me know if you have questions.

Quick Brown Fox Separated File

There is one further bit of robustness you can build into the code that will let you read any row of data no matter how much junk is included in the file. It doesn't matter whether the row values are comma separated, semicolon separated, space separated, or separated by the quick brown fox. With a little parsing help, you can prevent read failures by manually advancing to the beginning of the next number. A quick addition in context would be:

    for (;;)
    {
        /* skip any non-digit characters */
        while (*p && ((*p != '-' && (*p < '0' || *p > '9')) ||
            (*p == '-' && (*(p+1) < '0' || *(p+1) > '9')))) p++;
        if (!*p) break;

        array[row][col++] = xstrtod (p, &ep);
        ...

Skipping the non-digits will allow you to read almost any sane file with any type of delimiter without issue. Take for example, the same numbers used originally, but now formatted as follows in the data file:

$ cat dat/float_4colmess.txt
The, 2078.62 quick  5.69982 brown -0.17815 fox;  -0.04732 jumps
 5234.95 over   8.40361 the    0.04028 lazy   0.10852 dog
and the  2143.66  dish ran      5.35245 away   0.10747  with -0.11584
the spoon, 7216.99        2.93732       -0.18327       -0.20545
 1687.24        3.37211        0.14195       -0.14865
 2065.23        34.0188         0.1828        0.21199
 2664.57        2.91035        0.19513        0.35112
 7815.15        9.48227       -0.11522        0.19523
 5166.16        5.12382       -0.29997       -0.40592
 6777.11        5.53529       -0.37287       -0.43299
 4596.48        1.51918       -0.33986        0.09597
 6720.56        15.4161       -0.00158        -0.0433
 2652.65        5.51849        0.41896       -0.61039

Even with this insane format, the code reads all of the numeric values into the array properly.
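The skip condition from the snippet above can be pulled into a self-contained helper to check that claim against one of the messy lines. `next_number()` and `scan_messy()` are hypothetical names; the skip rule itself is the answer's (a digit, or a '-' immediately followed by a digit, starts a number), so numbers written with a leading '+' or '.' (such as `.5`) are still not recognised.

```c
#include <stdlib.h>

/* Advances past anything that cannot start a number: a number starts at a
 * digit, or at a '-' immediately followed by a digit (the answer's rule). */
static const char *next_number(const char *p)
{
    while (*p && ((*p != '-' && (*p < '0' || *p > '9')) ||
                  (*p == '-' && (p[1] < '0' || p[1] > '9'))))
        p++;
    return p;
}

/* Stores up to 'max' numbers found anywhere in 'line'; returns how many. */
size_t scan_messy(const char *line, double *vals, size_t max)
{
    size_t n = 0;
    char *ep;

    while (n < max) {
        line = next_number(line);
        if (*line == '\0')                  /* no more numbers on the line */
            break;
        vals[n++] = strtod(line, &ep);      /* always converts at least one digit */
        line = ep;
    }
    return n;
}
```

Because `next_number()` only stops at a position where `strtod()` is guaranteed to convert something, each iteration makes progress and the loop cannot spin on junk text.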

David C. Rankin
  • Thank you for the detailed answer. This is a truly robust implementation; however, I feel it's still so far above my skill level that I wouldn't be comfortable using it, as it's a homework assignment I'm working on. But I saved it for future reference! – naroslife Dec 02 '15 at 21:35
  • Glad I could at least help a little. It will not be long before you are using this type of implementation regularly. There is a bit of groundwork and basics to become familiar with before the C lightbulb winks on, but once it does, no other language offers more low-level control over computing than C (assembler excluded). Good luck. – David C. Rankin Dec 03 '15 at 07:24