
I want to parse a file with the following content:

2 300
abc12 130
bcd22 456
3 400
abfg12 230
bcpd22 46
abfrg2 13

Here, 2 is the number of lines that follow and 300 is the weight.

Each of those lines has a string and a number (the price). The same applies to 3 and 400.

I need to store 130, 456 in an array.

Currently, I am reading the file and each line is processed as std::string. I need help to progress further.

Code:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

//void processString(string line);
void process2(string line);

int main(int argc, char ** argv) {
    cout << "You have entered " << argc <<
        " arguments:" << "\n";

    for (int i = 1; i < argc; ++i)
        cout << argv[i] << "\n";

    //2, 4 are the file names

    //Reading file - market price file
    string line;
    ifstream myfile(argv[2]);
    if (myfile.is_open()) {
        while (getline(myfile, line)) {
            //  cout << line << '\n';
        }
        myfile.close();
    } else cout << "Unable to open market price file";

    //Reading file - price list file
    string line_;
    ifstream myfile2(argv[4]);
    int c = 1;
    if (myfile2.is_open()) {
        while (getline(myfile2, line_)) {
            // processString(line_);
            process2(line_);
        }
        myfile2.close();
    } else cout << "Unable to open price lists file";

    //processString(line_);
    return 0;
}

void process2(string line) {

    string word = "";

    for (auto x: line) {
        if (x == ' ') {
            word += " ";
        } else {
            word = word + x;
        }
    }
    cout << word << endl;
}

Is there a split function like in Java, so I can split and store everything as tokens?

Azeem
Learner
    You can wrap a line (in a `std::string`) into a [`std::istringstream`](https://en.cppreference.com/w/cpp/io/basic_istringstream) to make further parsing with formatted input operators (`>>`). FYI: [SO: Using istringstream in C++](https://stackoverflow.com/a/53491679/7478597), [SO: Splitting a string into integers using istringstream in C++](https://stackoverflow.com/a/5168710/7478597) and more with [google "site:stackoverflow.com c++ istringstream"](https://www.google.com/search?q=site%3Astackoverflow.com+c%2B%2B+istringstream) – Scheff's Cat Feb 16 '20 at 06:29

1 Answer


You have 2 questions in your post:

  1. How do I parse this file in cpp?
  2. Is there a split function like in Java, so I can split and store everything as tokens?

I will answer both questions and show a demo example.

Let's start with splitting a string into tokens. There are several possibilities. We start with the easy ones.

Since the tokens in your string are delimited by whitespace, we can take advantage of the extractor operator (>>). It reads data from an input stream up to the next whitespace and converts what it has read into the type of the target variable. These operations can be chained.

Then for the example string

    const std::string line{ "Token1 Token2 Token3 Token4" };

you can simply put that into a std::istringstream and then extract the variables from the stream:

    std::istringstream iss1(line);
    iss1 >> subString1 >> subString2 >> subString3 >> subString4;

The disadvantage is that this is verbose and that you have to know the number of elements in the string in advance.

We can overcome this problem by using a vector as the target data store and filling it with its range constructor. The vector's range constructor takes a begin and an end iterator and copies the data between them into the vector.

As the iterator we use std::istream_iterator. In simple terms, it calls the extractor operator (>>) until all data is consumed, however many items there are.

This will then look like the below:

    std::istringstream iss2(line);
    std::vector token(std::istream_iterator<std::string>(iss2), {});

This may look complicated, but it is not. We define a variable "token" of type std::vector and use its range constructor.

We can also define the std::vector without a template argument. The compiler can deduce the argument from the given constructor arguments. This feature is called CTAD ("class template argument deduction", C++17 required).

Additionally, you can see that I do not use the "end()" iterator explicitly.

This iterator is constructed from the empty brace-enclosed default initializer {}. Its type is deduced to be the same as that of the first argument, because the std::vector range constructor requires both iterators to have the same type.


There is an additional solution. It is the most powerful one and hence may seem a little too complicated in the beginning.

With it, we can avoid the usage of std::istringstream and directly convert the string into tokens using std::sregex_token_iterator. It is very simple to use, and the result is a one-liner for splitting the original string:

    std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

So, modern C++ has built-in functionality that is designed exactly for the purpose of tokenizing strings. It is called std::sregex_token_iterator. What is this thing?

As its name says, it is an iterator. It iterates over a string (hence the 's' in its name) and returns the split-up tokens. Either the tokens are matched against a regular expression, or, alternatively, the delimiter is matched and the rest is seen as tokens and returned. This is controlled via the last argument of its constructor.

Let's have a look at this constructor:

    token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

The first parameter is the position in the source string where the iterator should start; the second is the end position, up to which the iterator should work. The third is the regex itself. The last parameter selects what is returned:

  • 0, if you want the parts that match the regex (a positive value n selects the n-th capture group instead)
  • -1, if you want everything that does not match the regex, i.e. the text between the delimiters

Please read up on regexes online. There are plenty of pages available.

Please see a demo for all 3 solutions here:

#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <sstream>
#include <iterator>
#include <algorithm>

/// Split string into tokens
int main() {

    // White space separated tokens in a string
    const std::string line{ "Token1 Token2 Token3 Token4" };

    // Solution 1: Use extractor operator ----------------------------------

    // Here, we will store the result
    std::string subString1{}, subString2{}, subString3{}, subString4{};

    // Put the line into an istringstream for easier extraction
    std::istringstream iss1(line);
    iss1 >> subString1 >> subString2 >> subString3 >> subString4;

    // Show result
    std::cout << "\nSolution 1:  Use extractor operator\n- Data: -\n" << subString1 << "\n"
        << subString2 << "\n" << subString3 << "\n" << subString4 << "\n";


    // Solution 2: Use istream_iterator ----------------------------------
    std::istringstream iss2(line);
    std::vector token(std::istream_iterator<std::string>(iss2), {});

    // Show result
    std::cout << "\nSolution 2:  Use istream_iterator\n- Data: -\n";
    std::copy(token.begin(), token.end(), std::ostream_iterator<std::string>(std::cout, "\n"));


    // Solution 3: Use std::sregex_token_iterator ----------------------------------
    const std::regex re(" ");

    std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

    // Show result
    std::cout << "\nSolution 3:  Use sregex_token_iterator\n- Data: -\n";
    std::copy(token2.begin(), token2.end(), std::ostream_iterator<std::string>(std::cout, "\n"));


    return 0;
}


So, now the answer on how you could read your text file.

It is essential to create the correct data structures. Then, overload the inserter and extractor operators for them and put the above functionality into those operators.

Please see the demo example below. Of course, there are many other possible solutions:

#include <string>
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

struct ItemAndPrice {
    // Data
    std::string item{};
    unsigned int price{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, ItemAndPrice& iap) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            // Read the item and price from that line and check, if that worked
            if (std::istringstream iss(line); !(iss >> iap.item >> iap.price))

                // There was an error, while reading item and price. Set failbit of input stream
                is.setstate(std::ios::failbit);
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const ItemAndPrice& iap) {
        // Simple output of our internal data
        return os << iap.item << " " << iap.price;
    }
};

struct MarketPrice {
    // Data
    std::vector<ItemAndPrice> marketPriceData{};
    size_t numberOfElements() const { return marketPriceData.size(); }
    unsigned int weight{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, MarketPrice& mp) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            size_t numberOfEntries{};
            // Read the number of following entries and the weight from that line and check, if that worked
            if (std::istringstream iss(line); (iss >> numberOfEntries >> mp.weight)) {

                mp.marketPriceData.clear();
                // Now copy the numberOfEntries next lines into our vector
                std::copy_n(std::istream_iterator<ItemAndPrice>(is), numberOfEntries, std::back_inserter(mp.marketPriceData));
            }
            else {
                // There was an error, while reading number of following entries and the weight. Set failbit of input stream
                is.setstate(std::ios::failbit);
            }
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const MarketPrice& mp) {

        // Simple output of our internal data
        os << "\nNumber of Elements: " << mp.numberOfElements() << "   Weight: " << mp.weight << "\n";

        // Now copy all market price data to output stream
        if (os) std::copy(mp.marketPriceData.begin(), mp.marketPriceData.end(), std::ostream_iterator<ItemAndPrice>(os, "\n"));

        return os;
    }
};

// This example does not use argc, argv or file streams,
// because no files are available on Stack Overflow.
// So, the file data is put into an istringstream. For the example below,
// there is no difference between a file stream and a string stream.

std::istringstream sourceFile{R"(2 300
abc12 130
bcd22 456
3 400
abfg12 230
bcpd22 46
abfrg2 13)"};


int main() {

    // Here we will store all the resulting data
    // So, read the complete source file, parse the data and store result in vector
    std::vector mp(std::istream_iterator<MarketPrice>(sourceFile), {});

    // Now, all data are in mp. You may work with that now

    // Show result on display
    std::copy(mp.begin(), mp.end(), std::ostream_iterator<MarketPrice>(std::cout, "\n"));

    return 0;
}
A M
  • this was so thorough, thank you so much. I implemented the solution for the problem using a traditional approach using isstream and then loop but I am going to try this out. This seems to be a very optimized solution to the problem ... – Learner Feb 17 '20 at 19:46