
I need to split a list of students like this into ID, name and score. This is an exercise, so string is not allowed, only char.

0001 William Bob 8.5
0034 Howard Stark 9.5
0069 Natalia Long Young 8

Here's the code

int readFile(list& a) {
    char str[MAX];
    short i = 0;
    ifstream fi(IN);
    if (!fi)
        return 0;
    while (!fi.eof()) {
        fi.getline(str, MAX - 1);
        char* temp = strtok(str, " ");
        if (temp == NULL)
            continue;
        strcpy(a.sv[i].id, temp);
        temp = strtok(NULL, "0123456789");
        strcpy(a.sv[i].name, temp);
        temp = strtok(NULL, "\n");
        a.sv[i].grade = atof(temp);
        i++;
    }
    a.size = i;
    fi.close();
    return 1;
}

Using strtok() I have split the ID and name successfully, but the scores are

0.5
0.5
0

I know the problem is caused by temp = strtok(NULL, "0123456789"); but I don't know how to fix it. Are there any delimiters besides "0123456789", or can I move the pointer back?


This is my attempt to fix the while(!file.eof()) and solve my problem. Here are my headers and structs:

#include<iostream>
#include<fstream>
#include<string>
#define IN "D:\\Input.txt"
#define OUT "D:\\Output.txt"
#define MAX 40
using namespace std;
struct sv{
    char id[MAX], name[MAX] , sex[MAX];
    float grade;
};
struct dssv {
    sv sv[MAX];
    short size;
};  

And here's my function:

int readFile(dssv& a) {
    char str[MAX];
    short i = 0;
    ifstream fi(IN);
    if (!fi)
        return 0;
    while (fi>>a.sv[i].id && fi.getline(str, MAX)) {
        char* name = strchr(str, ' ');
        int pos = strrchr(name, ' ') - name;
        char* score = str + pos;
        strcpy(name + pos, "\0"); // null-terminate to remove the score.
        strcpy(a.sv[i].name, name + 1);
        a.sv[i].grade = atof(score + 1);
        i++;
    }
    a.size = i;
    fi.close();
    return 1;
}

I'm still figuring out how to fix the eof() and why I need two pointers, char* name and char* score, instead of reusing one.

Phineas
  • You definitely cannot move the pointer back. If you have to use `strtok`, I would split the string using `" "` as the delimiter. Use `strcat` to patch the name back together. – user3386109 Dec 29 '19 at 04:55
  • OTOH, if you aren't required to use `strtok`, then I would recommend `strcspn` to search for the first digit. `strcspn` doesn't modify the string like `strtok` does. So you can find the first digit, then back up, and insert the `'\0'` character yourself. – user3386109 Dec 29 '19 at 04:57
  • Recommended reading: [Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) – user4581301 Dec 29 '19 at 05:02
  • In particular, you can't use `while (!fi.eof())` to predict that a future read won't fail. Status-reporting functions report on the past, they do not predict the future. – David Schwartz Dec 29 '19 at 06:52
  • `int pos = strrchr(name, ' ') - name;` in your update doesn't really provide a position, but instead provides the number of characters between the first and last space in the line. To bracket name you want to advance `name` to the first character in `name`, e.g. `while (name && isspace(*name)) name++;` You want to save the pointer from `strrchr` and after you compute `score` backup to the last char name nul-terminating spaces as you go, `char *end = strrchr(name, ' ');` and `while (end > name && --end && isspace(*end)) *end = 0;` Now just `strcpy(a.sv[i].name, name);`, no + 1, etc.. – David C. Rankin Dec 31 '19 at 07:55
  • Also, the reason you want to open your file in the parent function and pass an open `std::ifstream` reference as a parameter -- is -- if the file cannot be successfully opened for reading in the caller, there is no need to make the function call to begin with `:)` – David C. Rankin Dec 31 '19 at 07:58
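
The `strcspn` suggestion from the comments above could look roughly like the sketch below. It assumes the ID has already been split off the line and that names never contain digits. The helper name `split_name_score` is made up for illustration and does not appear in the question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: `rest` is the part of the line after the ID,
 * e.g. "William Bob 8.5". Assumes the name contains no digits.
 * Cuts `rest` in place so it holds only the name, stores the score in
 * *score, and returns 1 on success or 0 if no digit was found. */
int split_name_score(char *rest, double *score)
{
    size_t len = strcspn(rest, "0123456789");   /* chars before first digit */

    if (rest[len] == '\0')      /* no digit at all: no score on this line */
        return 0;

    *score = atof(rest + len);  /* convert before cutting the string */

    /* back up over the spaces between name and score */
    while (len > 0 && rest[len - 1] == ' ')
        len--;
    rest[len] = '\0';           /* insert the terminator ourselves */

    return 1;
}
```

Because `strcspn` only measures and never writes, the score is still intact when it is converted, which is exactly what the `strtok(NULL, "0123456789")` call in the question destroys.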

1 Answer


You have started off on the wrong foot. See "Why is while ( !feof (file) ) always wrong?". While there are a number of ways to separate the information into id, name, score, probably the most basic is to read an entire line of data into a temporary buffer (character array) and then use sscanf to separate id, name & score.

The parsing with sscanf is not difficult, the only caveat being that your name can contain whitespace, so you cannot simply use "%s" as the format specifier to extract the name. This is mitigated by your score field always starting with a digit and digits not occurring in names. (There are always exceptions to the rule, and they can be handled with a simple parse using a pair of pointers, but for the basic example we will make this formatting assumption.)

To make data handling simpler and to coordinate all the information for one student as a single object (allowing you to create an array of them to hold all student information), you can use a simple struct. Declaring a few constants to set the sizes for everything avoids using magic numbers throughout your code (though for the sscanf field-width modifiers, the numbers must appear literally in the format string, as the width cannot be supplied from a variable). For example, your struct could be:

#define MAXID    8      /* if you need a constant, #define one (or more) */
#define MAXNM   64
#define MAXSTD 128
#define MAXLN MAXSTD

typedef struct {        /* simple struct to hold student data */
    char id[MAXID];
    char name[MAXNM];
    double score;
} student_t;

(and POSIX reserves the "_t" suffix for extension of types, but there won't be a "student_t" type -- but in general be aware of the restriction though you will see the "_t" suffix frequently)
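
On the field-width point: the width has to end up as digits inside the format string, but those digits can be produced from a macro by preprocessor stringification, so the number is not typed twice. The following is only a sketch of that idea. The names `ID_W`, `NM_W`, `STR`, `XSTR` and `scan_student` are assumptions for illustration and are not part of the code in this answer. Note that the width is the maximum number of characters stored, so each array must be one element larger.

```c
#include <stdio.h>

#define ID_W   7                /* max id chars; array needs ID_W + 1 */
#define NM_W  63                /* max name chars; array needs NM_W + 1 */

#define STR(x)  #x              /* turn the argument into a string literal */
#define XSTR(x) STR(x)          /* expand the macro first, then stringify */

/* Hypothetical helper: same conversion as "%7s %63[^0-9] %lf", with the
 * widths taken from ID_W and NM_W. Returns the sscanf conversion count. */
int scan_student(const char *line, char id[ID_W + 1],
                 char name[NM_W + 1], double *score)
{
    return sscanf(line,
                  "%" XSTR(ID_W) "s %" XSTR(NM_W) "[^0-9] %lf",
                  id, name, score);
}
```

Adjacent string literals are concatenated by the compiler, so the format above is the single string "%7s %63[^0-9] %lf". As with that format, the name comes back with the spaces before the score still attached.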

The basic approach is to read a line from your file into a buffer (with either fgets or POSIX getline) and then pass the line to sscanf. You condition your read loop on the successful read of each line so your read stops when EOF is reached. For separating the values with sscanf, it is convenient to use a temporary struct to hold the separated values. That way if the separation is successful, you simply add the temporary struct to your array. To read the students into an array of student_t you could do:

size_t readstudents (FILE *fp, student_t *s)
{
    char buf[MAXLN];    /* temporary array (buffer) to hold line */
    size_t n = 0;       /* number of students read from file */

    /* read each line in file until file read or array full */
    while (n < MAXSTD && fgets (buf, MAXLN, fp)) {
        student_t tmp = { .id = "" };   /* temporary struct to fill */
        /* extract id, name and score from line, validate */
        if (sscanf (buf, "%7s %63[^0-9] %lf", tmp.id, tmp.name, &tmp.score) == 3) {
            char *p = strrchr (tmp.name, 0);    /* pointer to end of name */
            /* backup overwriting trailing spaces with nul-terminating char */
            while (p && --p >= tmp.name && *p == ' ')
                *p = 0;
            s[n++] = tmp;   /* add temp struct to array, increment count */
        }
    }

    return n;   /* return number of students read from file */
}

Now let's take a minute and look at the sscanf format string used:

    sscanf (buf, "%7s %63[^0-9] %lf", tmp.id, tmp.name, &tmp.score)

Above, with the line in buf, the format string used is "%7s %63[^0-9] %lf". Each character array type uses a field-width modifier to limit the number of characters stored in the associated array to one-less-than the number of characters available. This protects the array bounds and ensures that each string stored is nul-terminated. The "%7s" is self-explanatory - read at most 7-characters into what will be the id.

The next conversion specifier for the name is "%63[^0-9]", which is a bit more involved as it uses the "%[...]" character class conversion specifier with the match inverted by use of '^' as the first character. The characters in the class being the digits 0-9, the conversion specifier reads up to 63 characters that do not include digits. This has the side effect of including the spaces between name and score in name. Thankfully they are simple enough to remove by getting a pointer to the end of the string with strrchr (tmp.name, 0); and then backing up, checking whether each character is a ' ' (space) and overwriting it with a nul-terminating character (e.g. '\0' or its numeric equivalent 0).

The last part of the sscanf conversion, "%lf" is simply the conversion specifier for the double value for score.

Note: most importantly, the conversion is validated by checking the return of the call to sscanf is 3 -- the number of conversions requested. If all conversions succeed into the temporary struct tmp, then tmp is simply added to your array of struct.

To call the function from main() and read the student information, you simply declare an array of student_t to hold the information, open and validate your data file is open for reading, and make a call to readstudents capturing the return to validate that student information was actually read from the file. Then you can make use of the data as you wish (it is simply output below):

int main (int argc, char **argv) {

    student_t students[MAXSTD] = {{ .id = "" }};    /* array of students */
    size_t nstudents = 0;                           /* count of students */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    /* read students from file, validate return, if zero, handle error */
    if ((nstudents = readstudents (fp, students)) == 0) {
        fputs ("error: no students read from file.\n", stderr);
        return 1;
    }

    if (fp != stdin)   /* close file if not stdin */
        fclose (fp);

    for (size_t i = 0; i < nstudents; i++)  /* output each student data */
        printf ("%-8s  %-24s  %g\n", 
                students[i].id, students[i].name, students[i].score);

    return 0;
}

All that remains is including the required headers, stdio.h and string.h and testing:

Example Use/Output

$ ./bin/read_stud_id_name_score dat/stud_id_name_no.txt
0001      William Bob               8.5
0034      Howard Stark              9.5
0069      Natalia Long Young        8

It works as needed.

Note, this is the most basic way of separating the values and only works based on the assumption that your score field starts with a digit.

You can eliminate that assumption by manually parsing the information you need by reading each line in the same manner, but instead of using sscanf, simply declare a pair of pointers to isolate id, name & score manually. The basic approach being to advance a pointer to the first whitespace and read id, skip the following whitespace and position the pointer at the beginning of name. Start from the end of the line with the other and backup to the first whitespace at the end and read score, then continue backing up positioning the pointer in the first space after name. Then just copy the characters between your start and end pointer to name and nul-terminate. It is more involved from a pointer-arithmetic standpoint, but just as simple. (that is left to you)
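
One possible shape for that pair-of-pointers parse is sketched below. It is not the only way to do it, and the function name `parse_student` and its parameters are made up for this illustration. It treats the last whitespace-separated field as the score, so a name such as "Agent 47" is accepted.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "id name... score" using a start pointer (sp)
 * and an end pointer (ep). Returns 1 on success, 0 on any failure. */
int parse_student(const char *line, char *id, size_t idsz,
                  char *name, size_t nmsz, double *score)
{
    const char *sp = line, *ep, *score_end;
    char *conv_end;
    double val;
    size_t len;

    while (isspace((unsigned char)*sp))             /* skip leading whitespace */
        sp++;
    ep = sp;
    while (*ep && !isspace((unsigned char)*ep))     /* advance to end of id */
        ep++;
    len = (size_t)(ep - sp);
    if (len == 0 || len >= idsz)                    /* id missing or too long */
        return 0;
    memcpy(id, sp, len);
    id[len] = '\0';

    sp = ep;
    while (isspace((unsigned char)*sp))             /* skip to start of name */
        sp++;

    ep = sp + strlen(sp);                           /* start from end of line */
    while (ep > sp && isspace((unsigned char)ep[-1]))   /* drop '\n', spaces */
        ep--;
    score_end = ep;
    while (ep > sp && !isspace((unsigned char)ep[-1]))  /* back up over score */
        ep--;
    if (ep == sp)                                   /* no name before score */
        return 0;

    val = strtod(ep, &conv_end);
    if (conv_end != score_end)                      /* whole field must convert */
        return 0;

    while (ep > sp && isspace((unsigned char)ep[-1]))   /* spaces after name */
        ep--;
    len = (size_t)(ep - sp);
    if (len == 0 || len >= nmsz)                    /* name missing or too long */
        return 0;
    memcpy(name, sp, len);                          /* copy between sp and ep */
    name[len] = '\0';

    *score = val;
    return 1;
}
```

In a read loop this would be called once per line read with fgets, for example `parse_student (buf, tmp.id, sizeof tmp.id, tmp.name, sizeof tmp.name, &tmp.score)`, adding tmp to the array only when it returns 1.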

Look things over and let me know if you have further questions. Normally, you would dynamically allocate your array of students and reallocate as needed to handle any number of students from the file (or, from an actual C++ standpoint, use the vector and string types that the standard template library provides and let the containers handle the memory allocation for you). That too is just one additional layer that you can add to make your code more flexible.
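
As a rough idea of what that extra layer might look like in C, here is a minimal sketch of an array that grows with realloc. The struct repeats the student_t layout from above so the block stands alone. `student_list`, `list_push` and `list_free` are hypothetical names, and the starting capacity of 8 with doubling is just one common choice.

```c
#include <stdlib.h>

typedef struct {            /* same layout as student_t above */
    char id[8];
    char name[64];
    double score;
} student_t;

typedef struct {            /* hypothetical growable array of students */
    student_t *data;        /* allocated block, NULL while empty */
    size_t used;            /* number of students stored */
    size_t cap;             /* number of students allocated */
} student_list;

/* Append a copy of *s, doubling the capacity when the block is full.
 * Returns 1 on success, 0 if allocation fails (the list is unchanged). */
int list_push(student_list *l, const student_t *s)
{
    if (l->used == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 8;
        /* realloc into a temporary so the old block is not lost on failure */
        student_t *tmp = realloc(l->data, ncap * sizeof *tmp);
        if (!tmp)
            return 0;
        l->data = tmp;
        l->cap = ncap;
    }
    l->data[l->used++] = *s;
    return 1;
}

/* Release the block and reset the list to its empty state. */
void list_free(student_list *l)
{
    free(l->data);
    l->data = NULL;
    l->used = l->cap = 0;
}
```

A list starts out as `student_list l = {0};`, and the read loop would call `list_push (&l, &tmp)` in place of `s[n++] = tmp;`, dropping the `n < MAXSTD` test.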


C++ Implementation

I apologize for glossing over a C++ solution, but given your use of C string functions in your posted code, I provided a C solution in return. A C++ solution making use of std::string and std::vector is not that much different other than from a storage standpoint. The parsing of the three values is slightly different: the id is read first, the rest of the line is read into name, the score is then obtained from the portion of the line held in name, and those characters are erased from name.

Changing the C FILE* to std::ifstream and the array of student_t to a std::vector<student_t>, your readstudents() function could be written as:

void readstudents (std::ifstream& fp, std::vector<student_t>& s)
{
    student_t tmp;      /* temporary struct to fill */

    /* read id, then rest of line into name, until a read fails */
    while (fp >> tmp.id && getline(fp, tmp.name)) {
        /* get offset to first digit within tmp.name */
        size_t  offset = tmp.name.find_first_of("0123456789"),
                nchr;   /* no. of chars converted with stod */
        if (offset == std::string::npos)    /* validate digit found */
            continue;
        /* convert to double, save in tmp.score */
        tmp.score = std::stod(tmp.name.substr(offset), &nchr);
        if (!nchr)      /* validate digits converted */
            continue;
        /* erase the score, then the spaces left after name */
        tmp.name.erase(offset);
        while (!tmp.name.empty() && tmp.name.back() == ' ')
            tmp.name.pop_back();
        /* erase the spaces left between id and name */
        tmp.name.erase(0, tmp.name.find_first_not_of(' '));

        s.push_back(tmp);   /* add temporary struct to vector */
    }
}

(note: the return type is changed to void as the .size() of the student vector can be validated on return).

The complete example would be:

#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <vector>

struct student_t {      /* simple struct to hold student data */
    std::string id;
    std::string name;
    double score;
};

void readstudents (std::ifstream& fp, std::vector<student_t>& s)
{
    student_t tmp;      /* temporary struct to fill */

    /* read id, then rest of line into name, until a read fails */
    while (fp >> tmp.id && getline(fp, tmp.name)) {
        /* get offset to first digit within tmp.name */
        size_t  offset = tmp.name.find_first_of("0123456789"),
                nchr;   /* no. of chars converted with stod */
        if (offset == std::string::npos)    /* validate digit found */
            continue;
        /* convert to double, save in tmp.score */
        tmp.score = std::stod(tmp.name.substr(offset), &nchr);
        if (!nchr)      /* validate digits converted */
            continue;
        /* erase the score, then the spaces left after name */
        tmp.name.erase(offset);
        while (!tmp.name.empty() && tmp.name.back() == ' ')
            tmp.name.pop_back();
        /* erase the spaces left between id and name */
        tmp.name.erase(0, tmp.name.find_first_not_of(' '));

        s.push_back(tmp);   /* add temporary struct to vector */
    }
}

int main (int argc, char **argv) {

    std::vector<student_t> students {};     /* array of students */

    if (argc < 2) { /* validate one argument given for filename */
        std::cerr << "error: filename required as 1st argument.\n";
        return 1;
    }

    std::ifstream fp (argv[1]);  /* use filename provided as 1st argument */

    if (!fp.good()) {   /* validate file open for reading */
        std::cerr << "file open failed";
        return 1;
    }
    /* read students from file, validate return, if zero, handle error */
    readstudents (fp, students);
    if (students.size() == 0) {
        std::cerr << "error: no students read from file.\n";
        return 1;
    }

    for (auto s : students)  /* output each student data */
        std::cout << std::left << std::setw(8) << s.id
                << std::left << std::setw(24) << s.name
                << s.score << '\n';
}

(the output is the same -- aside from 2-spaces omitted between the values)

Look things over and let me know if you have questions.

David C. Rankin
  • Thank you so much for your answer! Also, my apologies for the slow reply. There are so many new things that I don't even know where to ask. Below is my attempt to fix the `while(!file.eof())` (still not quite know how to fix it), and try to "copy the characters between your start and end pointer to name and nul-terminate". Thank you again for your reply. – Phineas Dec 31 '19 at 07:09
  • The `while(!fi.eof())` problem is that after you read the last line of data, `.eof()` will NOT be set even though the *file-position-indicator* is positioned just prior to `EOF`, so you check `while(!fi.eof())` and enter another iteration. Your call to `fi.getline(str, MAX - 1);` fails and there is no valid string in `str`, but since you don't check the return of `getline`, you go on to call `char* temp = strtok(str, " ");`, invoking *Undefined Behavior* by calling `strtok` on `str`, which is *indeterminate* at that point. I'll find a pointer parse example for you. – David C. Rankin Dec 31 '19 at 07:22
  • Here is a decent example of using a start-pointer and end-pointer to locate the longest word in a string: [Finding Longest Word in a String](https://stackoverflow.com/questions/57212190/finding-longest-word-in-a-string/57212725?r=SearchResults&s=2|52.6046#57212725). The process of manually working down a string is the same regardless of whether it is C/C++. The only difference is C++ STL provides the `.substr()` member function that can replace the manual `memcpy` and *nul-termination*. – David C. Rankin Dec 31 '19 at 07:44
  • Here is an example of your function using a pair of pointers to parse the information: [Read Student ID Name Score](https://susepaste.org/77547902) – David C. Rankin Dec 31 '19 at 08:31
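
The end-of-file point made in the comments applies equally to `feof` in C. The sketch below puts the two loop styles side by side for a file whose last line ends with a newline. Both function names are invented for the illustration, and lines are assumed to be shorter than the buffer.

```c
#include <stdio.h>

/* Loop conditioned on the read itself: the body runs only when fgets
 * actually stored a line in buf. */
size_t count_lines(FILE *fp)
{
    char buf[128];
    size_t n = 0;

    while (fgets(buf, sizeof buf, fp))
        n++;
    return n;
}

/* The problematic pattern, shown only for comparison: feof() is still
 * false after the last successful read, so the body runs one more time
 * after fgets has already failed. */
size_t count_lines_feof(FILE *fp)
{
    char buf[128];
    size_t n = 0;

    while (!feof(fp)) {
        fgets(buf, sizeof buf, fp);     /* return value ignored: the bug */
        n++;
    }
    return n;
}
```

On a three-line file the first function counts three lines and the second counts four. In the question's loop that extra iteration is the one that hands an unread buffer to strtok.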