
I'm having trouble reading a combination of integers, strings, and real numbers using fscanf. I admit that I am a novice C programmer, yet I don't see why my code is not working properly.

The contents of sourcefile.txt, the file used by fscanf:

222 MSLET[Pa] 0-MSL 200507011200 200507021200 101226.063
223 MSLET[Pa] 0-MSL 200507011200 200507021200 9999.000
224 MSLET[Pa] 0-MSL 200507011200 200507021200 101217.063
222 PRMSL[Pa] 0-MSL 200507011200 200507021200 101226.063
223 PRMSL[Pa] 0-MSL 200507011200 200507021200 9999.000

My C code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <math.h>

int main (void)
{
    FILE *input;
    input = fopen("C:/sourcefile.txt", "r"); 
    char var[30], level[30];
    int loc, datecycle, datevalid;
    float value;
                                                             
    while (fscanf(input,"%d %[^ ] %[^ ] %d %d %f", &loc, var, level,         
&datecycle, &datevalid, &value) == 6) {
        fscanf(input,"%d %[^ ] %[^ ] %d %d %f", &loc, var, level, &datecycle,  
&datevalid, &value);
        printf("%d %s %s %d %d %f\n", loc, var, level, datecycle,
datevalid,value);
    }                                                                                                       
                                              
    fclose(input);
    return 0;               
}

Output from C code:

223 MSLET[Pa] 0-MSL -1356451712 -1356441712 9999.000
222 PRMSL[Pa] 0-MSL -1356451712 -1356441712 101226.063
223 PRMSL[Pa] 0-MSL -1356451712 -1356441712 9999.000

Issue #1

  1. Only 3 of the 5 lines were read. I don't understand why.

  2. The printf output for datecycle and datevalid is not the same as the input. I don't understand why.

Issue #2

With respect to the string entries in column 2 (e.g. MSLET[Pa]), instead of using [^ ] to read in the string (read until I encounter a space), I may want to read until I encounter the "]" (e.g. the "]" in MSLET[Pa]). My understanding is that I would write [^]]. Is that correct?

Any help that can be provided would be greatly appreciated.

Robert
C Novice
  • You should explain "not working properly" (preferably showing the output you get and explaining how it differs to what you expected) – M.M Apr 25 '18 at 23:11
  • The values you read into `datecycle` and `datevalid` likely exceed the maximum size of an `int`. And you discard every second line. – M.M Apr 25 '18 at 23:12
  • To clarify, I was trying to output the input exactly, using fscanf and printf. By changing the while statement to "while (!feof(input)) {", I was able to read all lines. By replacing "float value;" with "double value;" and replacing specifier "%f" with "%lf", I was able to read and output the correct values (left of the decimal point) of datecycle and datevalid. Thanks M.M for your help. – C Novice Apr 27 '18 at 20:49
  • [while(!feof(f)) is always wrong](https://stackoverflow.com/q/5431941/6699433) – klutt Jun 22 '21 at 19:22
  • And please format the post properly. You are allowed to have more than one code block, and you must realize that having the issue descriptions in a code block doesn't make sense, especially not in the same block as the code. Also fix the indentation. – klutt Jun 22 '21 at 19:26
  • You will want to read `datecycle` and `datevalid` as strings, store them as strings, and parse the individual year, month, day and time from the stored strings when they are needed. Currently, the values exceed the size storable as `int`. If you increased the storage to an 8-byte value, storage as a number would be possible, but later conversion to year, month, day and time would be much more difficult. – David C. Rankin Jul 10 '21 at 00:54

1 Answer


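While an older question, it is deserving of an answer. The reason you are not obtaining valid data for datecycle and datevalid is that the input you are attempting to read with a conversion to int exceeds the largest value representable as an int. For example, 200507011200 is roughly two orders of magnitude larger than INT_MAX, which is 2147483647 on platforms with a 32-bit int. The missing lines have a separate cause, as noted in the comments: the second fscanf() call inside the loop body reads another record over the one just read by the loop condition, so every other line is discarded.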

However, simply using a larger storage type to accommodate the values will make it harder to separate the year, month, day and time components later. A better approach is to read both fields as strings (they are named datacycle and datevalid in the code below). You can later convert them to a numeric value if needed, or more likely separate them into year, month, day and time.

Additionally, reading with fscanf() directly is a fragile approach to reading your input file. A single error in the format of one line will corrupt the read of all data from that point forward.

Instead, reading each line into a buffer (character array) and then parsing it into separate values with sscanf() decouples the read from the conversion, allowing each line to be read completely. In that case, if a line contains an invalid character or is otherwise malformed, only the parse of values from that one line will fail and all remaining lines can still be read correctly.

Don't use magic numbers or hardcode filenames. Instead, either provide the filename as the first argument to your program (that's what the int argc, char **argv parameters of main() are for), or prompt the user and take the filename as input. You should not have to recompile your code simply to read data from a different file. The 30 in your code is a magic number. If you need a constant, #define one or use a global enum. For example:

#define MAXC 1024       /* if you need a constant, #define one (or more) */
#define MAXVL  32
#define NELEM  16

Here MAXC is the maximum number of characters to read from each line (that is, the size of the character array that holds the line) and MAXVL is the size of var and level. NELEM is the number of elements in the array that holds all values, and as a general constant of 16 it is also used as the string size for datacycle and datevalid.

To hold each line of data so that it is available for processing by your program, create a struct that holds the values for loc, var, level, datacycle, datevalid and value, and simply declare an array of struct. That way when you read and convert each line, you can store the values in an array for use throughout your program, e.g.

typedef struct {        /* store datacycle & datevalid as strings */
    char var[MAXVL], level[MAXVL], datacycle[NELEM], datevalid[NELEM];
    int loc;
    float value;
} mydata_type;

Taking the filename to read as the first argument to your program (or read from stdin by default if no value is given), you can do the following:

int main (int argc, char **argv) {
    
    char buf[MAXC];                             /* buffer to hold each line */
    size_t n = 0;                               /* array element counter */
    mydata_type arr[NELEM] = {{ .var = "" }};   /* array of struct */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

You can read and store each line until your array is full or you have reached the end of file by looping and reading each line into buf and then separating the values with sscanf() as discussed above, e.g.

    /* while array not full and line read */
    while (n < NELEM && fgets (buf, MAXC, fp)) {
        mydata_type tmp;    /* temporary structure to parse data into */
        /* parse data from buf into temporary struct and VALIDATE */
        /* field widths (MAXVL-1, NELEM-1) keep %s from overrunning the arrays */
        if (sscanf (buf, "%d %31s %31s %15s %15s %f", &tmp.loc, tmp.var,
                    tmp.level, tmp.datacycle, tmp.datevalid, &tmp.value) == 6) {
            arr[n] = tmp;   /* on success, assign to array element */
            n += 1;         /* increment element counter */
        }
        else {
            fputs ("error: invalid line format.\n", stderr);
        }
    }

Close the file when you are done and then use the data as needed in your program. By way of example, you can output each of the values read from each line while converting datacycle and datevalid into year, month, day and time using sscanf() as well. An example would be:

void prn_mydata (mydata_type *arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int mc, dc, yc, tc,     /* integer values for datacycle components */
            mv, dv, yv, tv;     /* integer values for datevalid components */
        
        /* parse string values for datacycle & datevalid into components */
        if (sscanf (arr[i].datacycle, "%4d%2d%2d%4d", &yc, &mc, &dc, &tc) != 4)
            return;
        if (sscanf (arr[i].datevalid, "%4d%2d%2d%4d", &yv, &mv, &dv, &tv) != 4)
            return;
        
        /* output results */
        /* %04d keeps leading zeros in times such as 0030 */
        printf ("\n%d\n%s\n%s\n%s  %d-%02d-%02d:%04d\n"
                "%s  %d-%02d-%02d:%04d\n%.3f\n",
                arr[i].loc, arr[i].var, arr[i].level,
                arr[i].datacycle, yc, mc, dc, tc, 
                arr[i].datevalid, yv, mv, dv, tv, 
                arr[i].value);
    }
}

Putting it all together into a sample program, you would have:

#include <stdio.h>
#include <string.h>

#define MAXC 1024       /* if you need a constant, #define one (or more) */
#define MAXVL  32
#define NELEM  16

typedef struct {        /* store datacycle & datevalid as strings */
    char var[MAXVL], level[MAXVL], datacycle[NELEM], datevalid[NELEM];
    int loc;
    float value;
} mydata_type;

void prn_mydata (mydata_type *arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int mc, dc, yc, tc,     /* integer values for datacycle components */
            mv, dv, yv, tv;     /* integer values for datevalid components */
        
        /* parse string values for datacycle & datevalid into components */
        if (sscanf (arr[i].datacycle, "%4d%2d%2d%4d", &yc, &mc, &dc, &tc) != 4)
            return;
        if (sscanf (arr[i].datevalid, "%4d%2d%2d%4d", &yv, &mv, &dv, &tv) != 4)
            return;
        
        /* output results */
        /* %04d keeps leading zeros in times such as 0030 */
        printf ("\n%d\n%s\n%s\n%s  %d-%02d-%02d:%04d\n"
                "%s  %d-%02d-%02d:%04d\n%.3f\n",
                arr[i].loc, arr[i].var, arr[i].level,
                arr[i].datacycle, yc, mc, dc, tc, 
                arr[i].datevalid, yv, mv, dv, tv, 
                arr[i].value);
    }
}

int main (int argc, char **argv) {
    
    char buf[MAXC];                             /* buffer to hold each line */
    size_t n = 0;                               /* array element counter */
    mydata_type arr[NELEM] = {{ .var = "" }};   /* array of struct */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    
    /* while array not full and line read */
    while (n < NELEM && fgets (buf, MAXC, fp)) {
        mydata_type tmp;    /* temporary structure to parse data into */
        /* parse data from buf into temporary struct and VALIDATE */
        /* field widths (MAXVL-1, NELEM-1) keep %s from overrunning the arrays */
        if (sscanf (buf, "%d %31s %31s %15s %15s %f", &tmp.loc, tmp.var,
                    tmp.level, tmp.datacycle, tmp.datevalid, &tmp.value) == 6) {
            arr[n] = tmp;   /* on success, assign to array element */
            n += 1;         /* increment element counter */
        }
        else {
            fputs ("error: invalid line format.\n", stderr);
        }
    }
    
    if (fp != stdin)        /* close file if not stdin */
        fclose (fp);
    
    prn_mydata (arr, n);    /* print the results */
}

Example Use/Output

With your example data in dat/sourcefile.txt, you would use the program and receive the following output where each component of the line is printed on a separate line as part of a group:

$ ./bin/read_sourcefile dat/sourcefile.txt

222
MSLET[Pa]
0-MSL
200507011200  2005-07-01:1200
200507021200  2005-07-02:1200
101226.062

223
MSLET[Pa]
0-MSL
200507011200  2005-07-01:1200
200507021200  2005-07-02:1200
9999.000

224
MSLET[Pa]
0-MSL
200507011200  2005-07-01:1200
200507021200  2005-07-02:1200
101217.062

222
PRMSL[Pa]
0-MSL
200507011200  2005-07-01:1200
200507021200  2005-07-02:1200
101226.062

223
PRMSL[Pa]
0-MSL
200507011200  2005-07-01:1200
200507021200  2005-07-02:1200
9999.000

Look things over and let me know if you have further questions.

David C. Rankin