
I want to read data from a text file and write it to a new file with some additional labels. All the data is transferred, but it ends up out of order. When I use the >> operators (extractors), everything is in the correct order, but the commas are left in. When I use getline, the commas are gone, but the data is not in the right order.

Here's what's in the text file:

Astronomy, 34684, MoWed, 7:15pm-9:55pm, JC16
ComputerScience, 36822, MoWed, 9:00am-10:40am, E137
Calculus, 32700, MoTuTh, 11:00am-12:15am, CW134
ComputerOrganization, 45665, Th, 7:20pm-9:55pm, E137

Here's what I get when using getline:

 Class: Astronomy
       -Class ID:  34684
       -Meeting Days:  MoWed
       -Class Time:  7:15pm-9:55pm
       -Class Location:  JC16
    ComputerScience
    Class:  36822
       -Class ID:  MoWed
       -Meeting Days:  9:00am-10:40am
       -Class Time:  E137
    Calculus
       -Class Location:  32700
    Class:  MoTuTh
       -Class ID:  11:00am-12:15am
       -Meeting Days:  CW134
    ComputerOrganization
       -Class Time:  45665
       -Class Location:  Th

Here's what I get when I use the extractors:

 Class: Astronomy,
       -Class ID: 34684,
       -Meeting Days: MoWed,
       -Class Time: 7:15pm-9:55pm,
       -Class Location: JC16
    Class: ComputerScience,
       -Class ID: 36822,
       -Meeting Days: MoWed,
       -Class Time: 9:00am-10:40am,
       -Class Location: E137
    Class: Calculus,
       -Class ID: 32700,
       -Meeting Days: MoTuTh,
       -Class Time: 11:00am-12:15am,
       -Class Location: CW134
    Class: ComputerOrganization,
       -Class ID: 45665,
       -Meeting Days: Th,
       -Class Time: 7:20pm-9:55pm,
       -Class Location: E137

Any tips? Here's the code:

#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>

using namespace std;

//Holds all the class information
struct Course{

    string courseName;
    string courseNum;
    string courseDay;
    string courseTime;
    string courseLoc;

};

//Extracts data from the file with the course information
Course getInfo(ifstream &inFile);

//Creates a file with the data from 'getInfo'
void writeInfo(ofstream &outFile, Course);

int main(){

    ifstream inFile; //link to input file
    ofstream outFile; //link to output file
    Course course; //holds all course info

    inFile.open("Courses.txt"); //opens textfile
    outFile.open("Courses.dat"); //creates new file

    course = getInfo(inFile); //priming read

   while (inFile) {

        writeInfo(outFile, course); //write info to output file

        course = getInfo(inFile); //get info from input file

    }

    inFile.close();
    outFile.close();

}

Course getInfo(ifstream &inFile){

    Course course;

    getline(inFile, course.courseName, ',');
    getline(inFile, course.courseNum, ',');
    getline(inFile, course.courseDay, ',');
    getline(inFile, course.courseTime, ',');
    getline(inFile, course.courseLoc, ',');
  // inFile >> course.courseName >> course.courseNum >> course.courseDay;
 //   inFile >> course.courseTime  >> course.courseLoc;

    return course;

}

void writeInfo(ofstream &outFile, Course course){

    outFile << "Class: " << course.courseName << endl;
    outFile << "   -Class ID: " << course.courseNum << endl;
           outFile << "   -Meeting Days: " << course.courseDay << endl;
         outFile   << "   -Class Time: " << course.courseTime << endl;
     outFile       << "   -Class Location: " << course.courseLoc << endl;


}
Ted Lyngmo
Matt M
  • `getline` will read up until it hits the delimiter. That means it sees the first class location not as `JC16`, but `JC16\nComputerScience`. – JohnFilleau Mar 26 '20 at 02:28
  • The usual approach on problems where your input file contains line-separated records, and delimiter-separated tokens, is to call `getline` on the entire line, create a `stringstream` out of the result, and then parse THAT to extract tokens (using getline or formatted input extractors) – JohnFilleau Mar 26 '20 at 02:30
  • @Ted is it `yaml`? I didn't know. If that works, then it works! Although that probably wouldn't be in the spirit of this homework assignment. – JohnFilleau Mar 26 '20 at 02:59
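
The approach described in the comments above could look roughly like the following. This is only a sketch: the function name `readCourse` and its `bool` return value are assumptions made for illustration, and the `Course` struct is repeated from the question so that the snippet stands on its own. It assumes every line holds exactly five comma-separated fields.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Same members as the struct in the question.
struct Course {
    std::string courseName;
    std::string courseNum;
    std::string courseDay;
    std::string courseTime;
    std::string courseLoc;
};

// Hypothetical helper: reads one record (one line) from `in`.
// Returns false when no further line could be read.
bool readCourse(std::istream& in, Course& course) {
    std::string line;
    if (!std::getline(in, line)) {
        return false;  // end of file or read error
    }

    // Parse the single line in isolation, so a field can never
    // run over into the next record.
    std::istringstream fields(line);

    // std::ws skips the blank that follows each comma in the sample file.
    std::getline(fields >> std::ws, course.courseName, ',');
    std::getline(fields >> std::ws, course.courseNum, ',');
    std::getline(fields >> std::ws, course.courseDay, ',');
    std::getline(fields >> std::ws, course.courseTime, ',');
    std::getline(fields >> std::ws, course.courseLoc);  // last field: rest of the line
    return true;
}
```

A loop such as `while (readCourse(inFile, course)) writeInfo(outFile, course);` would then replace the priming read in `main`.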

1 Answer


What do people expect from a function when they read the name `getline`?

Most people would say: Hm, I guess it will read a complete line from somewhere. And guess what, that was the basic intention for this function: read a line from a stream and put it into a string. But std::getline has some additional functionality: it accepts an optional delimiter character and then reads up to that delimiter instead of up to the end of the line.

And this led to widespread misuse of this function for splitting std::strings into tokens.

Splitting strings into tokens is a very old task. Very early C already had the function strtok, which still exists, even in C++, as std::strtok.

But because of the additional functionality of std::getline, it has been heavily misused for tokenizing strings. If you look at the top question/answer on how to parse a CSV file, you will see what I mean.

People use std::getline to read a text line, a string, from the original stream, then stuff it into a std::istringstream and use std::getline with a delimiter again to parse the string into tokens. Weird.

But for many years now (since C++11) we have had a dedicated facility for tokenizing strings, designed explicitly for that purpose. It is the

std::sregex_token_iterator

And since we have such a dedicated function, we should simply use it.

This thing is an iterator for iterating over a std::string, hence the name starts with an s. The first two arguments define the range of input to operate on. Then comes a std::regex that describes what should be matched, or what should not be matched, in the input string. The last parameter selects the matching strategy:

  • 0 --> give me the stuff that matches the regex (this is the default if the parameter is omitted)
  • -1 --> give me the stuff that is NOT matched by the regex, i.e. the text between the matches
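
To make the two strategies concrete, here is a small sketch that tokenizes the same line both ways. The function names `splitOnSeparator` and `matchTokens` and both regular expressions are assumptions chosen for this illustration. The results are stored as std::string explicitly, so they stay valid independently of the input string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Strategy -1: the regex describes the SEPARATOR (a comma plus any
// following whitespace); the iterator yields the text between matches.
std::vector<std::string> splitOnSeparator(const std::string& s) {
    static const std::regex separator{R"(,\s*)"};
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), separator, -1),
        std::sregex_token_iterator());
}

// Strategy 0: the regex describes a TOKEN (starts with a character that is
// neither a comma nor whitespace, then runs up to the next comma);
// the iterator yields the matches themselves.
std::vector<std::string> matchTokens(const std::string& s) {
    static const std::regex token{R"([^,\s][^,]*)"};
    return std::vector<std::string>(
        std::sregex_token_iterator(s.begin(), s.end(), token, 0),
        std::sregex_token_iterator());
}
```

For a line such as `Astronomy, 34684, MoWed, 7:15pm-9:55pm, JC16`, both functions should return the same five fields without the blanks that follow the commas.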

We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the second iterator into the std::vector. The statement

std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});

defines a variable “tokens” as a std::vector and uses the so-called range constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without a template argument. The compiler deduces the element type from the given constructor arguments (here std::ssub_match, which converts to std::string). This feature is called CTAD ("class template argument deduction").

Additionally, you can see that I do not use the "end()"-iterator explicitly.

The empty braces {} value-initialize a second iterator of the same type as the first argument, because the range constructor of std::vector requires both parameters to have the same type. A default-constructed std::sregex_token_iterator is the end-of-sequence iterator.

You can read any number of tokens in a line and put them into the std::vector.

But you can do even more. You can validate your input. If you use 0 as the last parameter, you define a std::regex that describes what a valid token looks like, and you get only the valid tokens.

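Additionally, it helps you to avoid the error that you made with the last getline statement: it also uses ',' as the delimiter, so the last field of each line swallows the newline and the first field of the next line.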

Overall, using the dedicated functionality is superior to the misused std::getline, and people should simply use it.

Some people may complain about the function overhead, but how many of them are processing big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
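
As an illustration of that regex-free alternative, a splitter based on find and substr over a std::string_view could be sketched as follows. The function name `splitFields` and the decision to trim leading blanks and tabs are assumptions made for this example. It requires C++17.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `line` at every `delim` and drops
// leading blanks/tabs from each field. Empty fields are kept.
std::vector<std::string> splitFields(std::string_view line, char delim = ',') {
    std::vector<std::string> fields;
    while (true) {
        const std::size_t pos = line.find(delim);
        // substr(0, npos) yields the rest of the line for the last field.
        std::string_view field = line.substr(0, pos);

        // Drop leading whitespace such as the blank after each comma.
        const std::size_t first = field.find_first_not_of(" \t");
        field.remove_prefix(first == std::string_view::npos ? field.size() : first);

        fields.emplace_back(field);  // copies the characters into a std::string

        if (pos == std::string_view::npos) {
            break;  // no further delimiter: this was the last field
        }
        line.remove_prefix(pos + 1);  // continue after the delimiter
    }
    return fields;
}
```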

So, now to further topics.

You should not use: using namespace std;. You will find thousands of hints here on SO explaining why not.

You should start using object-oriented features in C++. In C++ you can put data and the methods that operate on this data into one object. The reason is that the outside world should not care about an object's internals. For example, your writeInfo and getInfo functions should be part of your struct (or class).

And as the next step, we will not use your “get” and “write” functions. We will use the dedicated functions for stream IO, the extractor operator >> and the inserter operator <<. And we will overload these standard IO operators for your struct.

In function main we will open the 2 files and check whether opening them was successful. BTW: all input/output operations should be checked for success.

Then we use the next iterator, the std::istream_iterator, together with our “Course” type, on the input file stream. The std::istream_iterator repeatedly calls the Course extractor operator until all lines of the source file are read.

Your program could then look like this:

#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <fstream>

// Separator between two fields: a comma plus any whitespace that follows it
std::regex re{ ",\\s*" };

struct Course {

    std::string name;
    std::string num;
    std::string day;
    std::string time;
    std::string loc;

    friend std::istream& operator >> (std::istream& is, Course& c) {

        // Read a complete line from a stream and check, if that worked
        if (std::string s{}; std::getline(is, s)) {

            // Split it into tokens and validate the input
            std::vector token(std::sregex_token_iterator(s.begin(), s.end(), re,-1), {});

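            // Sanity check: We should have 5 entries
            if (5 == token.size()) {
                // Then copy the tokens to our internal data members
                c.name = token[0]; c.num = token[1]; c.day = token[2]; c.time = token[3]; c.loc = token[4];
            }
            else {
                // Malformed line: report it through the stream state, so that callers stop reading
                is.setstate(std::ios_base::failbit);
            }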
        }
        return is;
    }
    friend std::ostream& operator << (std::ostream& os, const Course& c) {
        return os << "\nClass: " << c.name << "\n   -Class ID: " << c.num << "\n   -Meeting Days: " << c.day
            << "\n   -Class Time: " << c.time << "\n   -Class Location: " << c.loc;
    }
};

int main() {
    // Open input file and check, if it is open
    if (std::ifstream inFile("r:\\Courses.txt"); inFile) {

        // Open the output file 
        if (std::ofstream outFile("r:\\Courses.dat"); outFile) {

            // Read all lines with course data, split them and put them into the below vector
            std::vector courses(std::istream_iterator<Course>(inFile), {});

            // Now that we have read the complete input data, we write the result to the output file
            for (const Course& c : courses) outFile << c << "\n";
        }
        else {
            std::cerr << "\n*** Error: Could not open output file 'Courses.dat'\n";
        }
    }
    else {
        std::cerr << "\n*** Error: Could not open input file 'Courses.txt'\n";
    }
    return 0;
}

Of course, there are one million other possibilities.

And at the end:

Everybody can do what he wants.

A M