I get an error when I try to read a CSV file and store the information in a struct in C

Question

If the program does not read 4 values, it should show the error message. But my file has 4 values per line, and even if I change the expected count in the comparison to 2, 3, or 5, I get the same output.

My output for this program is:

File format incorrect.

However, when I change read == 4 and read != 4 to read == 1 and read != 1, my output is:

8 records read.
Iskandar  0.000000 0.000000 0.000000
Kholmatov,100,100,100 0 -0.001162 0.000000 0.000000
George  0.000000 0.000000 20.625134
Washington,90,50,100  -0.001162 0.000000 0.000000
Dennis  0.000000 0.000000 0.000000
Ritchie,90,0,10  0.000000 0.000000 0.000000
Bill  0.000000 0.000000 0.000000
Gates,60,50,77  0.000000 0.000000 -0.001162

My data.csv file:

Iskandar Kholmatov,100,100,100
George Washington,90,50,100
Dennis Ritchie,90,0,10
Bill Gates,60,50,77

My program:

#include <stdio.h>

// struct to hold the name of a student
struct name
{
  char first[20]; // string to hold the first name
  char last[20]; // string to hold the last name
};

// struct to hold the grades of a student
struct student
{
  struct name Name; // name struct from above
  float grades[3]; // array to hold 3 grades
  float average; // float to hold the average of 3 grades above
};

int main(void)
{
  // file pointer variable for accessing the file
  FILE *file;

  // attempt to open data.csv in read mode to read the file contents
  file = fopen("data.csv", "r");

  // if the file failed to open, exit with an error message and status
  if (file == NULL)
  {
    printf("Error opening file.\n");
    return 1;
  }

  struct student students[5];

  int read = 0;

  // records will keep track of the number of Student records read from the file
  int records = 0;

  // read all records from the file and store them into the students array
  do
  {
    read = fscanf(file, "%s,%s,%f,%f,%f\n",
           students[records].Name.first,
           students[records].Name.last,
           &students[records].grades[0],
           &students[records].grades[1],
           &students[records].grades[2]);


    // if fscanf read 4 values from the file then we've successfully read
    // in another record
    if (read == 4)
      records++;

    // The only time that fscanf should NOT read 4 values from the file is
    // when we've reached the end of the file, so if fscanf did not read in
    // exactly 4 values and we're not at the end of the file, there has been
    // an error (likely due to an incorrect file format) and so we exit with
    // an error message and status.
    if (read != 4 && !feof(file))
    {
      printf("File format incorrect.\n");
      return 1;
    }

    // if there was an error reading from the file exit with an error message
    // and status
    if (ferror(file))
    {
      printf("Error reading file.\n");
      return 1;
    }

  } while (!feof(file));

  // close the file as we are done working with it
  fclose(file);

  // print out the number of records read
  printf("\n%d records read.\n\n", records);

  // print out each of the records that was read
  for (int i = 0; i < records; i++)
    printf("%s %s %f %f %f\n",
           students[i].Name.first,
           students[i].Name.last,
           students[i].grades[0],
           students[i].grades[1],
           students[i].grades[2]);
  printf("\n");

  return 0;
}

My expected output is just the information that is in the .csv file.

The 'error message' is your own message. What did you see when you stepped through the code in the debugger? What did read equal? — pm100, Dec 10 '22 at 00:27
There's no comma between first and last in your data file but there is in your format string. — Retired Ninja, Dec 10 '22 at 00:36
`scanf()` and relatives are deceptive (for example, `"%s"` will fail when starting at a space). You'll probably just have to sit with the description of the pattern language and try variations hardcoded in a test program until you get what you want (e.g. try `" %s"`, `"%[^,]"`, etc.). — John Bayko, Dec 10 '22 at 00:56
See [What is the effect of trailing white space in a `scanf()` format string?](https://stackoverflow.com/q/19499060/15168) Where you are reading from a file, as in your code, it isn't quite as serious as if you are reading from the user's typing at the terminal — but when the input is from the terminal, trailing white space in a format string is a catastrophic UI/UX blunder. — Jonathan Leffler, Dec 10 '22 at 01:16
suggestion - write a simpler program that just scanfs the first line and prints out the number read and the fields read. Do this till you get it to read cleanly — pm100, Dec 10 '22 at 02:13

score 0 · Answer 1 · answered Dec 12 '22 at 00:28

Reading CSV (Comma-Separated Value) files is hard for the general case, where fields can be embedded in double quotes and can then contain commas and doubled-up double quotes to embed a double quote, and where a single field can extend over multiple lines.

In your data, you don't have to worry about those special cases. Instead, you've imposed an inconsistency because you split the name field into two based on the space separating them. As long as you don't have "Alice Betty Clarke" as a name in the data, you can still do it.

You attempt to use:

    read = fscanf(file, "%s,%s,%f,%f,%f\n",
           students[records].Name.first,
           students[records].Name.last,
           &students[records].grades[0],
           &students[records].grades[1],
           &students[records].grades[2]);

This alone has multiple problems:

1. You attempt to read the names separated by a comma, but they're separated by a space. The first %s reads "Iskandar", the comma in the format then fails to match the space, and fscanf() returns 1. That is why your test against 1 appeared to work and each word was stored as a separate record.
2. You put a newline (white space) at the end of the format string.
3. The second %s will read up to white space, which means it will gobble up the comma and the numbers.
4. You don't prevent buffer overflows from overlong names.

The solutions to these problems are:

1. This is easily fixed: replace the first comma in the format string with a blank (or omit it altogether: "%s%s" reads two words separated by white space).
2. See What is the effect of trailing white space in a scanf() format string? Where you are reading from a file, as in your code, it isn't quite as serious as if you are reading from the user's typing at the terminal, but when the input is from the terminal, trailing white space in a format string is a catastrophic UI/UX blunder. The fix is trivial: omit the \n from the format string. The next call will skip leading white space, including newlines left over from the prior call.
3. Use a negated scan set: %[^,]. You could use that in place of the first field for simplicity and consistency.
4. Limit the length of the inputs: "%19[^, ] %19[^, ],%f,%f,%f". Note that there are three conversion specifiers that do not skip leading white space, and they are %c, %[…] (scan sets) and %n. When using the scan sets, it is necessary to include the white space between the conversion specifications.

You have experimented with various values for your:

    if (read == 4)
      records++;

Since you are attempting to read 5 values, you should test for 5; if you don't get 5, there is either EOF (return value EOF), some sort of encoding error (unlikely, but the return value would also be EOF), or a data format error (the return value is in the range 0..4). You should exit the loop on receiving EOF. With a data format error, if you want to continue, you should probably read and ignore data up to the next newline:

int c;
/* read from the data file, not from standard input */
while ((c = getc(file)) != EOF && c != '\n')
    ;

It may be more sensible to abandon ship immediately. Alternatively, count the number of erroneous records, read the rest of the file so further erroneous records can be reported, and probably abandon further processing after EOF is finally detected.

You should ensure that you don't try to read more records than will fit in the array.

You can improve the error reporting by reading whole lines using fgets() or POSIX getline() and then passing the line to sscanf(). Note that if you do this, you might want to check for garbage after the third number, probably using the %n conversion specification to identify where the conversions stopped and ensuring that there are no non-blank characters after the number. The scanf() family of functions do not count the %n conversions in the return value.

Note that error messages should be written to stderr, not to stdout. Also, you should not call a function that opens a file (such as fopen() or open()) with a string literal for the file name. You must check that the open succeeded, and if not, report the error (on standard error - stderr) and you should include the file name in the error message. To avoid repetition, you should pass a variable that points to the file name to the open function, and can then use that variable when formatting the error message too. You can use perror() to report the problem if you don't have a better mechanism. For example:

const char *filename = "data.csv";
FILE *fp = fopen(filename, "r");
if (fp == NULL)
{
    perror(filename);
    exit(EXIT_FAILURE);  /* exit() and EXIT_FAILURE come from <stdlib.h> */
}

Putting all these changes and refinements together, you might end up with code like this:

#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct Name
{
    char first[20];
    char last[20];
};

struct Student
{
    struct Name name;
    float grades[3];
    float average;
};

static int trailing_white_space_only(const char *buffer)
{
    const unsigned char *data = (const unsigned char *)buffer;
    while (*data != '\0' && isspace(*data))
        data++;
    return *data == '\0';
}

int main(void)
{
    const char *filename = "data.csv";
    FILE *fp = fopen(filename, "r");

    if (fp == NULL)
    {
        fprintf(stderr, "Error opening file '%s' for reading\n", filename);
        return 1;
    }

    enum { MAX_STUDENTS = 5 };
    struct Student students[MAX_STUDENTS];

    int n_fields = 0;
    int records = 0;
    int lineno = 0;
    int fail = 0;

    char buffer[2048];
    while (records < MAX_STUDENTS && fgets(buffer, sizeof(buffer), fp) != NULL)
    {
        buffer[strcspn(buffer, "\n")] = '\0';
        lineno++;
        int offset = 0;
        n_fields = sscanf(buffer, "%19[^, ] %19[^, ],%f,%f,%f%n",
                          students[records].name.first,
                          students[records].name.last,
                          &students[records].grades[0],
                          &students[records].grades[1],
                          &students[records].grades[2],
                          &offset);

        if (n_fields == 5)
        {
            if (trailing_white_space_only(&buffer[offset]))
                records++;
            else
            {
                fprintf(stderr, "Trailing junk on line %d\n    [%s]\n",
                        lineno, buffer);
                fail++;
            }
        }
        else
        {
            fail++;
            fprintf(stderr, "Format error on line %d (field %d)\n    [%s]\n",
                    lineno, n_fields + 1, buffer);
        }
    }

    fclose(fp);

    if (fail == 0)
        printf("\n%d records read successfully.\n\n", records);
    else
        printf("\n%d records read successfully (and %d invalid records "
               "were discarded).\n\n", records, fail);

    for (int i = 0; i < records; i++)
    {
        char name[sizeof(struct Name)];
        snprintf(name, sizeof(name), "%.19s %.19s",
                 students[i].name.first, students[i].name.last);
        printf("%-39s %6.2f %6.2f %6.2f\n", name,
               students[i].grades[0],
               students[i].grades[1],
               students[i].grades[2]);
    }
    printf("\n");

    return 0;
}

With the data file data.csv from the question, the output is:

4 records read successfully.

Iskandar Kholmatov                      100.00 100.00 100.00
George Washington                        90.00  50.00 100.00
Dennis Ritchie                           90.00   0.00  10.00
Bill Gates                               60.00  50.00  77.00

Now consider this variant data file, which has bad data on lines 3, 5 and 6:

Iskandar Kholmatov,100,100,100
George Washington,90,50,100
Garbage Disposal,read,me,a,riddle
Dennis Ritchie,90,0,10
Steve Jobs,60,70,80,
Betty Alice Clarke,94,95,97
Bill Gates,60,50,77

The output is:

Format error on line 3 (field 3)
    [Garbage Disposal,read,me,a,riddle]
Trailing junk on line 5
    [Steve Jobs,60,70,80,]
Format error on line 6 (field 3)
    [Betty Alice Clarke,94,95,97]

4 records read successfully (and 3 invalid records were discarded).

Iskandar Kholmatov                      100.00 100.00 100.00
George Washington                        90.00  50.00 100.00
Dennis Ritchie                           90.00   0.00  10.00
Bill Gates                               60.00  50.00  77.00

There are still many ways you might improve the program. For example, if there are more records in the file than fit in the array, you could read and diagnose the excess records (reporting errors too). Or you could revise the code to dynamically allocate the array of students and grow the array when necessary.
