
I have an issue where I cannot get my C++ program to read double-digit integers. My idea is to read the input as a string, then somehow parse it into separate integers and insert them into an array, but I am stuck on getting the code to read the digits properly.

Sample Output:

 i: 0 codeColumn 0

 i: 1 codeColumn 1

 i: 2 codeColumn 0 0

 i: 3 codeColumn 0

 i: 4 codeColumn 31 0

 i: 5 codeColumn 1

 i: 6 codeColumn 43 0

 i: 7 codeColumn 3

 i: 8 codeColumn 9 0

So the file is basically a line of space-separated triplets, with the values inside each triplet delimited by commas:

0,1,0 0,0,31 0,0,18 0,0,8 0,11,0

My question is how do you get the trailing zeroes (see above) to move to a new line? I tried using "char" and a bunch of if statements to concatenate the single digits into double digits, but I feel like that's not really efficient or ideal. Any ideas?

My code:

#include <iostream>     // Basic I/O
#include <string>       // string classes
#include <fstream>      // file stream classes
#include <sstream>
#include <vector>

int main()
{
    std::ifstream fCode;
    fCode.open("code.txt");
    std::vector<std::string> codeColumn;
    std::string codeLine;

    // Splits on ',' only, so the space between two triplets ends up inside tokens like "31 0"
    while (std::getline(fCode, codeLine, ',')) {
        codeColumn.push_back(codeLine);
    }

    for (std::size_t i = 0; i < codeColumn.size(); ++i) {
        std::cout << " i: " << i << " codeColumn " << codeColumn[i] << std::endl;
    }

    fCode.close();
}
J J
  • getline(fCode, codeLine, ' ') space instead of comma – QuentinUK Mar 28 '20 at 16:56
    This doesn't address the question, but get in the habit of initializing objects with meaningful values rather than creating them with default values and immediately overriding them. In this case, that means changing `ifstream fCode; fCode.open("code.txt");` to `ifstream fCode("code.txt");`. And you don't need to call `fCode.close();`. The destructor will do that. – Pete Becker Mar 28 '20 at 16:58
    You have **Comma Separated Values** or CSV. Search the internet for "C++ Read CSV". Always search first, as there are a plethora of CSV questions on StackOverflow and the internet. – Thomas Matthews Mar 28 '20 at 17:45
  • @ThomasMatthews thanks for the feedback. I understand that, but there are so many and I was really confused about how to deal with two separators. Kind of hard to search for that. But now I know, as I didn't know the ',' and ' ' in this case are called delimiters. I was just lacking the vocab to search for it – J J Mar 28 '20 at 18:02
  • Most of the linked answers use "std::getline" for tokenizing strings. I never do that. I think it is somewhat strange to use "getline" for "tokenizing". Please see my answer below. But before starting religious discussions: everybody can do what they want. – A M Mar 28 '20 at 22:13

2 Answers

getline(fCode, codeLine, ',')

is going to read between commas, so 0,1,0 0,0,31 will split up exactly as you have seen.

0,1,0 0,0,31
 ^ ^   ^ ^

The tokens collected are everything between the ^s

You have two delimiters you need to take into account: comma and space. The easiest way to handle the space is with dumb old >>.

std::string triplet;
while (fCode >> triplet)
{
    // do stuff with triplet. Maybe something like
    std::istringstream strm(triplet); // make a stream out of the triplet
    int a;
    int b;
    int c;
    char sep1;
    char sep2;
    if (strm >> a >> sep1 >> b >> sep2 >> c // read all the tokens we want from triplet
        && sep1 == ',' && sep2 == ',')      // and both separators are commas. Triplet is valid
    {
       // do something with a, b, and c
    }
}

Documentation for std::istringstream.

user4581301
  • isn't strm a string? Does it automatically convert whatever is read to integer if you pass it to an int? Sorry I am very new to coding C++ – J J Mar 28 '20 at 17:20
  • @JJ [`strm` is an `istringstream`](https://en.cppreference.com/w/cpp/io/basic_istringstream), an input stream just like an `ifstream` or `cin`, except instead of reading from a file or the console it reads from the string you fed it. – user4581301 Mar 28 '20 at 19:38

So, I will show you 3 solutions: first easy-to-understand C-style code, then more modern C++ code using standard library algorithms and iterators, and at the end an object-oriented C++ solution.

I will also explain why std::getline can be used, but in my opinion should not be used, for splitting strings into tokens.

I saw from your question that you had difficulty understanding that, and I understand your concern.

But let's start with an easy solution. I will show the code first and then explain it:

#include <iostream>
#include <fstream>
#include <string>

int main() {

    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {

        size_t tripletCounter{ 0 };

        // Now, read all triplets from the file in a simple for loop
        for (std::string triplet{}; fCode >> triplet; ) {

            // Prepare output
            std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";

            // Go through the triplet, search for comma, then output the parts
            for (size_t i{ 0U }, startpos{ 0U }; i <= triplet.size(); ++i) {

                // So, if there is a comma or the end of the string
                if ((triplet[i] == ',') || (i == (triplet.size()))) {

                    // Print substring
                    std::cout  << (triplet.substr(startpos, i - startpos)) << ' ';
                    startpos = i + 1;
                }
            }
        }
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}

You see, we need just a few lines of easy-to-understand code to fulfill your requirements and produce the desired output.

Some features that may be new to you:

The if statement with initializer. This is available since C++17. In addition to the condition, you can define a variable and initialize it. So, in

if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {

we first define a variable named "fCode" of type std::ifstream. We use the uniform (brace) initializer "{}" to initialize it with the input file name.

This calls the constructor for the variable "fCode", which opens the file (this is what this constructor does). After the closing "}" of the if statement, the variable "fCode" goes out of scope and the destructor of std::ifstream is called. This closes the file automatically.

This type of if statement was introduced to help prevent namespace pollution: the variable is only visible in the scope where it is used. Without it, you would have to define the std::ifstream outside (before) the if, where it would be visible to the outer context and the file would be closed much later. So, please get acquainted with it.

Next, we define a "tripletCounter". It is only needed for the output; there is no other use.

Then comes a for loop whose init-statement works just like the initializer of the if statement: we first define an empty std::string "triplet" and then use the extraction operator to read text up to the next whitespace. This is how the "extractor" (>>) works. We use the whole expression as the condition, to check whether the extraction worked or whether we hit the end of file (or some other error). This works because the extraction operator returns the stream it was working on, so a reference to "fCode", and the stream overloads its boolean conversion (explicit operator bool, together with operator !) to report the state of the stream.

You should always check, for every I/O operation, whether it worked or not.

So, next we split the triplet (e.g. "0,1,0") into its substrings with a very simple for loop. We go through all characters in the string and check whether the current character is a comma or the end of the string. In that case, we output the characters before the delimiter.

Very simple and easy to understand. std::getline is not needed here.


So, next solution, more advanced:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>

std::regex re(",");

int main() {

    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {

        size_t tripletCounter{ 0 };

        // Now, read all triplets from the file into a vector
        std::vector triplets(std::istream_iterator<std::string>(fCode), {});

        // Next, go through all triplets 
        for (const std::string &triplet : triplets) {

            // Prepare output
            std::cout << "\ni:\t" << tripletCounter++ << "\tcodeColumn:\t";

            // Split the triplet into code columns. All codes end up in the vector codeColumns
            std::vector codeColumns(std::sregex_token_iterator(triplet.begin(), triplet.end(), re, -1), {});

            //Show codes
            for (const std::string& code : codeColumns) std::cout << code << ' ';
        }
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}

The beginning is the same. But then:

// Now, read all triplets from the file into a vector
std::vector triplets(std::istream_iterator<std::string>(fCode), {});

Uh-oh. What's that? Let's start with the std::istream_iterator. If you read the linked description, you will find out that it basically calls the extraction operator >> for the specified type. And since it is an iterator, it calls it again each time the iterator is incremented. OK, understandable, but then

We define the variable triplets as a std::vector and call its constructor with 2 arguments. That constructor is the so-called range constructor of std::vector. Please see the description of constructor 5. Aha, it takes a "begin()" iterator and an "end()" iterator. Aha, but what is this strange {} instead of the "end()" iterator? It creates a default-constructed (value-initialized) object of the iterator type (please see here and here). And if we look at the description of std::istream_iterator, we can see that a default-constructed one is the end-of-stream iterator. OK, understood.

I assume that you know about the range-based for loop, which comes next. Good. But now we come to the most difficult point: splitting a string with delimiters. People are using std::getline for that. But why? Why are people doing such strange stuff?


What do people expect from the function, when they read

getline ?

Most people would say, Hm, I guess it will read a complete line from somewhere. And guess what, that was the basic intention for this function. Read a line from a stream and put it into a string.

As you can see here std::getline has some additional functionality.

And this led to a major misuse of this function for splitting std::strings into tokens.

Splitting strings into tokens is a very old task. In very early C there was the function strtok, which still exists, even in C++. Please see std::strtok.

But because of the additional functionality of std::getline, it has been heavily misused for tokenizing strings. If you look at the top question/answer on how to parse a CSV file (please see here), you will see what I mean.

People are using std::getline to read a text line, a string, from the original stream, then stuffing it into an std::istringstream again and use std::getline with delimiter again to parse the string into tokens.

Weird.

Because for many years (since C++11) we have had a dedicated facility for tokenizing strings, especially and explicitly designed for that purpose. It is the

std::sregex_token_iterator

And since we have such a dedicated function, we should simply use it.

This thing is an iterator for iterating over a std::string, hence the name starts with an s. The first two arguments, (begin(), end()), define the range of input to operate on; then comes a std::regex that describes what should be matched (or what should not be matched) in the input string. The matching strategy is given by the last parameter:

  • 0 --> give me the stuff that I defined in the regex and
  • -1 --> give me that what is NOT matched based on the regex.

We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first iterator and the 2nd iterator into the std::vector. The statement

std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});

defines a variable “tokens” as a std::vector and uses again the range-constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction"). I also used that for the vector above.

Additionally, you can see that I do not use the "end()"-iterator explicitly.

This iterator will be constructed from the empty brace-enclosed default initializer with the correct type, because it will be deduced to be the same as the type of the first argument due to the std::vector constructor requiring that, as already described.

You can read any number of tokens in a line and put them into the std::vector.

But you can do even more. You can validate your input. If you use 0 as last parameter, you define a std::regex that even validates your input. And you get only valid tokens.

Overall, the usage of a dedicated functionality is superior over the misused std::getline and people should simply use it.

Some people may complain about the overhead of std::regex, but how many of them are processing big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.

So, this is somewhat advanced, but you will eventually learn it.


And now we will use an object oriented approach. As you know, C++ is an object oriented language.

We can put data, and the methods working with that data, in a class (struct). The functionality is encapsulated: only the class should know how to operate on its data. So, we will define a struct "Code". It contains a std::array of 3 std::strings and the associated functions. For the array we made a type alias (using Triplet) for easier writing. The functions that we need are input and output, so we will overload the extractor and the inserter operators.

In these operators, we use the functions described above.

And as a result of all this work, we get an elegant main function, where all the work is done in 3 lines of code.

Please see:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <array>
#include <algorithm>

using Triplet = std::array<std::string, 3>;
std::regex re(",");

struct Code {
    // Our Data
    Triplet triplet{};

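    // Overload the extractor operator for easier input
    friend std::istream& operator >> (std::istream& is, Code& c) {

        // Read a triplet with commas
        if (std::string s{}; is >> s) {

            // Start from empty columns, so a short triplet does not keep values from an earlier read
            c.triplet = {};

            // Copy the single columns of the triplet into our internal data structure,
            // but never more than 3, so extra commas cannot write past the end of the array
            std::size_t column{ 0U };
            for (std::sregex_token_iterator it(s.begin(), s.end(), re, -1), end; it != end && column < c.triplet.size(); ++it)
                c.triplet[column++] = it->str();
        }
        return is;
    }
    // Overload the inserter for easier output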
    friend std::ostream& operator << (std::ostream& os, const Code& c) {
        return os << c.triplet[0] << ' ' << c.triplet[1] << ' ' << c.triplet[2];
    }
};

int main() {

    // Open the source text file, and check, if there was no failure
    if (std::ifstream fCode{ "r:\\code.txt" }; fCode) {


        // Now, read all triplets from the file, split it and put the Codes into a vector
        std::vector code(std::istream_iterator<Code>(fCode), {});

        // Show output
        for (size_t tripletCounter{ 0U }; tripletCounter < code.size(); tripletCounter++)
            std::cout << "\ni:\t" << tripletCounter << "\tcodeColumn:\t" << code[tripletCounter];
    }
    else {
        std::cerr << "\n*** Error, Could not open source file\n";
    }
    return 0;
}
A M