I will show you three solutions: first easy-to-understand C-style code, then more modern C++ code using standard library algorithms and iterators, and finally an object-oriented C++ solution.
I will also explain why std::getline can be, but should not be, used for splitting strings into tokens.
I saw from your question that you had difficulty understanding that, and I understand your concern.
But let's start with an easy solution. I will show the code and then explain it:
#include <iostream>
#include <fstream>
#include <string>
int main() {
    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
        size_t tripletCounter{ 0 };
        // Now, read all triplets from the file in a simple for loop
        for (std::string triplet{}; fCode >> triplet; ) {
            // Prepare output
            std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";
            // Go through the triplet, search for comma, then output the parts
            for (size_t i{ 0U }, startpos{ 0U }; i <= triplet.size(); ++i) {
                // So, if there is a comma or the end of the string
                if ((triplet[i] == ',') || (i == (triplet.size()))) {
                    // Print substring
                    std::cout << (triplet.substr(startpos, i - startpos)) << ' ';
                    startpos = i + 1;
                }
            }
        }
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}
As you can see, we need just a few lines of easy-to-understand code to fulfill your requirements and produce the desired output.
Some features that may be new to you:
The if statement with initializer. This has been available since C++17. In addition to the condition, you can define and initialize a variable. So, in
if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
we first define a variable named "fCode" of type std::ifstream. We use uniform initialization with "{}" to initialize it with the input file name.
This calls the constructor for "fCode", which opens the file (this is what this constructor does). After the closing "}" of the if statement (including its else branch), "fCode" goes out of scope and the destructor of the std::ifstream is called. This closes the file automatically.
This form of the if statement was introduced to help prevent namespace pollution. A variable should only be visible in the scope where it is used. Without it, you would have to define the std::ifstream outside (before) the if; it would then be visible in the outer context and the file would be closed much later. So, please get acquainted with it.
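To make the scoping visible without touching a real file, here is a minimal sketch. "Tracer" and "demoScope" are hypothetical names that I made up for this illustration; the Tracer simply records when it is constructed and destroyed, standing in for the std::ifstream that opens and closes a file.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: logs its own construction ("open") and destruction ("close")
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("open"); }
    ~Tracer() { log.push_back("close"); }
    // Allows the object to be used as the condition, like a stream
    explicit operator bool() const { return true; }
};

// Returns the order of events around an if statement with initializer
std::vector<std::string> demoScope() {
    std::vector<std::string> log{};
    if (Tracer t{ log }; t) {
        log.push_back("use");
    }   // t goes out of scope here, its destructor runs
    log.push_back("after if");
    return log;
}
```

Calling demoScope() should show that the destructor has already run before the first statement after the if is reached.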
Next, we define a "tripletCounter". It is needed only for the output; it has no other use.
Then comes a for loop whose init-statement works in the same way. We first define an empty std::string "triplet" and then use the extraction operator (>>) to read text up to the next white space. This is how the extractor works. We use the whole expression as the loop condition to check whether the extraction worked or whether we hit the end of the file (or some other error). This works because the extraction operator returns the stream it was working on, that is, a reference to "fCode", and the stream has an overloaded explicit operator bool (as well as an operator !) that reports the state of the stream. Please see here.
You should always check, for every I/O operation, whether it worked or not.
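Here is a small sketch of that read-and-check loop in isolation. I assume a std::istringstream as a stand-in for the file stream, and "readTokens" is just a name chosen for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reads white-space separated tokens until extraction fails (end of input or error)
std::vector<std::string> readTokens(const std::string& text) {
    std::istringstream in{ text };      // stands in for the std::ifstream
    std::vector<std::string> tokens{};
    // "in >> token" returns the stream; the loop ends as soon as the stream reports failure
    for (std::string token{}; in >> token; ) {
        tokens.push_back(token);
    }
    return tokens;
}
```

The loop body only runs for tokens that were really extracted, so no extra end-of-file check is needed.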
Next, we split the triplet (e.g. "0,1,0") into its substrings with a very simple for loop. We go through all characters in the string and check whether the current character is a comma or whether we have reached the end of the string. In that case, we output the characters before the delimiter. (Reading triplet[triplet.size()] is allowed for a std::string since C++11 and yields the terminating null character.)
Very simple and easy to understand. std::getline is not needed here.
So, the next solution is more advanced:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
std::regex re(",");
int main() {
    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
        size_t tripletCounter{ 0 };
        // Now, read all triplets from the file into a vector
        std::vector triplets(std::istream_iterator<std::string>(fCode), {});
        // Next, go through all triplets
        for (const std::string& triplet : triplets) {
            // Prepare output
            std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";
            // Split triplet into code columns. All codes are in vector codeColumns
            std::vector codeColumns(std::sregex_token_iterator(triplet.begin(), triplet.end(), re, -1), {});
            // Show codes
            for (const std::string& code : codeColumns) std::cout << code << ' ';
        }
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}
The beginning is the same. But then:
// Now, read all triplets from the file into a vector
std::vector triplets(std::istream_iterator<std::string>(fCode), {});
Uh-oh, what's that? Let's start with the std::istream_iterator. If you read its description, you will find that it basically calls the extraction operator >> for the specified type. And since it is an iterator, it calls it again each time the iterator is incremented. OK, understandable, but then:
We define the variable "triplets" as a std::vector and call its constructor with 2 arguments. That constructor is the so-called range constructor of the std::vector. Please see the description of constructor 5. Aha, it takes a "begin()" iterator and an "end()" iterator. But what is this strange {} in place of the "end()" iterator? This is the default initializer (please see here and here). And if we look at the description of the std::istream_iterator, we can see that a default-constructed one is the end-of-stream iterator. OK, understood.
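To show what the {} stands for, here is a sketch that spells out both iterators explicitly. I assume a std::istringstream in place of the file, and "readAll" is a name made up for this example.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reads all white-space separated words using the range constructor of std::vector
std::vector<std::string> readAll(const std::string& text) {
    std::istringstream in{ text };                      // stands in for the std::ifstream
    std::istream_iterator<std::string> first{ in };     // each increment calls "in >> word"
    std::istream_iterator<std::string> last{};          // default-constructed: end-of-stream
    return std::vector<std::string>(first, last);       // range constructor
}
```

Writing {} as the second constructor argument is meant to produce the same "last" iterator, just without repeating the type.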
I assume that you know about the range-based for loop, which comes next. Good. But now we come to the most difficult point: splitting a string with delimiters. People use std::getline for that. But why? Why are people doing such strange stuff?
What do people expect from a function when they read the name getline?
Most people would say: "Hm, I guess it reads a complete line from somewhere." And guess what, that was the basic intention of this function: read a line from a stream and put it into a string.
As you can see here, std::getline has some additional functionality: it accepts an optional delimiter character.
And this has led to widespread misuse of this function for splitting std::strings into tokens.
Splitting strings into tokens is a very old task. Very early C already had the function strtok, which still exists, even in C++. Please see std::strtok.
But because of the additional functionality of std::getline, it has been heavily misused for tokenizing strings. If you look at the top question/answer on how to parse a CSV file (please see here), you will see what I mean.
People use std::getline to read a text line, a string, from the original stream, then stuff it into a std::istringstream and use std::getline with a delimiter again to parse the string into tokens.
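For comparison only, the pattern I am talking about typically looks like the following sketch. "splitWithGetline" is a hypothetical name; this is not taken from any particular answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// The commonly seen approach: wrap the line in a stream again, then "getline" with a delimiter
std::vector<std::string> splitWithGetline(const std::string& line, char delimiter) {
    std::istringstream iss{ line };
    std::vector<std::string> tokens{};
    for (std::string token{}; std::getline(iss, token, delimiter); ) {
        tokens.push_back(token);
    }
    return tokens;
}
```

It works, but a function named "getline" is used here for something that has nothing to do with lines.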
Weird.
Because for many, many years we have had a dedicated facility for tokenizing strings, designed especially and explicitly for that purpose. It is the
std::sregex_token_iterator
And since we have such a dedicated facility, we should simply use it.
This thing is an iterator for iterating over a std::string, hence the name starts with an s. The first two arguments define the range of input to operate on (begin(), end()). Then comes a std::regex describing what should be matched, or what should not be matched, in the input string. The matching strategy is given by the last parameter:
- 0 --> give me the stuff that I defined in the regex and
- -1 --> give me what is NOT matched by the regex.
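The difference between the two values can be sketched with one small helper. "tokenize" is a name chosen for this illustration; with the regex "," the value -1 should give the fields between the commas, and 0 should give the commas themselves.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects the tokens produced by std::sregex_token_iterator for a given submatch selector
// submatch == -1: the parts NOT matched by the regex; submatch == 0: the matches themselves
std::vector<std::string> tokenize(const std::string& s, const std::regex& re, int submatch) {
    std::sregex_token_iterator first(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator last{};      // default-constructed: end-of-sequence
    return std::vector<std::string>(first, last);
}
```

So tokenize("0,1,0", std::regex(","), -1) is the splitting case used in this answer.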
We can use this iterator to store the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the second iterator into the std::vector. The statement
std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});
defines a variable “tokens” as a std::vector and again uses the range constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without a template argument. The compiler can deduce the argument from the given constructor arguments. This feature is called CTAD ("class template argument deduction"). I also used it for the vector above.
Additionally, you can see that I do not use the "end()" iterator explicitly.
This iterator is constructed from the empty brace-enclosed default initializer with the correct type, because it is deduced to be the same type as the first argument, as the std::vector constructor requires, as already described.
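One detail that is easy to miss: with CTAD the element type is taken from the iterator, so the vector holds sub-matches (std::ssub_match), not std::strings. They convert to std::string when needed. A small sketch, assuming C++17; "countTokens" is a made-up name:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <type_traits>
#include <vector>

// Counts comma separated tokens and checks at compile time what CTAD deduced
std::size_t countTokens(const std::string& s) {
    static const std::regex comma{ "," };
    std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), comma, -1), {});
    // The deduced element type is the value type of the iterator
    static_assert(std::is_same_v<decltype(tokens), std::vector<std::ssub_match>>);
    return tokens.size();
}
```

The sub-matches refer to the original string, so that string has to stay alive as long as the tokens are used.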
You can read any number of tokens in a line and put them into the std::vector.
But you can do even more. You can validate your input. If you use 0 as the last parameter, you define a std::regex that describes what a valid token looks like. Then you get only the valid tokens.
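As a sketch of that validation idea, assume that a valid code is a single 0 or 1 standing on its own. Both this rule and the name "validCodes" are assumptions for the example, not something from your question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns only the tokens that look like a valid code: a lone '0' or '1'
std::vector<std::string> validCodes(const std::string& s) {
    static const std::regex validCode{ R"(\b[01]\b)" };     // assumed validity rule
    std::sregex_token_iterator first(s.begin(), s.end(), validCode, 0);
    return std::vector<std::string>(first, std::sregex_token_iterator{});
}
```

With an input like "0,x,1,10" the "x" and the "10" are simply skipped.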
Overall, using dedicated functionality is superior to misusing std::getline, and people should simply use it.
Some people may complain about the function overhead, but how many of them are working with big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
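If performance really matters, a lightweight alternative could look like this sketch with find and std::string_view (C++17). "splitView" is a hypothetical name; the returned views point into the original text, so that text must outlive them.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits text at every delimiter without copying any characters
std::vector<std::string_view> splitView(std::string_view text, char delimiter) {
    std::vector<std::string_view> parts{};
    std::size_t start{ 0 };
    while (true) {
        const std::size_t pos = text.find(delimiter, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));        // last (or only) part
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}
```

Empty fields are kept here, so "a,,b" gives three parts.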
So, somewhat advanced, but you will eventually learn it.
And now we will use an object-oriented approach. As you know, C++ supports object-oriented programming.
We can put data, and the methods working on that data, in a class (struct). The functionality is encapsulated. Only the class should know how to operate on its data. So, we will define a class "Code". It contains a std::array consisting of 3 std::strings, and the associated functions. For the array we made a type alias for easier writing. The functions that we need are input and output. So, we will overload the extractor and the inserter operators.
In these operators, we use the functionality described above.
And as a result of all this work, we get an elegant main function, where all the work is done in 3 lines of code.
Please see:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <array>
#include <algorithm>
using Triplet = std::array<std::string, 3>;
std::regex re(",");
struct Code {
    // Our Data
    Triplet triplet{};
    // Overload extractor operator for easier input
    friend std::istream& operator >> (std::istream& is, Code& c) {
        // Read a triplet with commas
        if (std::string s{}; is >> s) {
            // Copy the single columns of the triplet into our internal data structure
            // Note: this assumes that the input contains no more than 3 columns per triplet
            std::copy(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {}, c.triplet.begin());
        }
        return is;
    }
    // Overload inserter for easier output
    friend std::ostream& operator << (std::ostream& os, const Code& c) {
        return os << c.triplet[0] << ' ' << c.triplet[1] << ' ' << c.triplet[2];
    }
};
int main() {
    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {
        // Now, read all triplets from the file, split them and put the Codes into a vector
        std::vector code(std::istream_iterator<Code>(fCode), {});
        // Show output
        for (size_t tripletCounter{ 0U }; tripletCounter < code.size(); tripletCounter++)
            std::cout << "\ni:\t" << tripletCounter << "\tcodeColumn:\t" << code[tripletCounter];
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}