You have 2 questions in your post:
- How do I parse this file in cpp?
- Is there a split function like in Java, so I can split and store everything as tokens?
I will answer both questions and show a demo example.
Let's start with splitting a string into tokens. There are several possibilities. We start with the easy ones.
Since the tokens in your string are delimited by whitespace, we can take advantage of the extractor operator (>>). It reads data from an input stream up to the next whitespace and then converts what it read into the type of the target variable. As you know, this operation can be chained.
Then for the example string
const std::string line{ "Token1 Token2 Token3 Token4" };
you can simply put that into a std::istringstream
and then extract the variables from the stream:
std::istringstream iss1(line);
iss1 >> subString1 >> subString2 >> subString3 >> subString4;
The disadvantage is that you need to write a lot of code and you have to know the number of elements in the string in advance.
We can overcome this problem by using a vector as the target data store and filling it with its range constructor. The vector's range constructor takes a begin and an end iterator and copies the data in that range into the vector.
As the iterator we use std::istream_iterator<std::string>. In simple terms, it calls the extractor operator (>>) until all data is consumed, however many elements there are.
This will then look like the below:
std::istringstream iss2(line);
std::vector token(std::istream_iterator<std::string>(iss2), {});
This may look complicated, but it is not. We define a variable "token" of type std::vector and use its range constructor.
We can also define the std::vector without a template argument. The compiler can deduce the argument from the given constructor arguments. This feature is called CTAD ("class template argument deduction", C++17 required).
Additionally, you can see that I do not spell out the end iterator explicitly.
It is constructed from the empty braced initializer {}. It gets the correct type because the range constructor of std::vector requires both arguments to have the same type, so the type is taken from the first argument.
There is an additional solution. It is the most powerful one and hence maybe a little bit too complicated in the beginning.
With it we can avoid std::istringstream and convert the string directly into tokens using std::sregex_token_iterator. It is very simple to use, and the result is a one-liner for splitting the original string (re is a std::regex that describes the delimiter):
std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});
So, modern C++ has built-in functionality that is designed exactly for the purpose of tokenizing strings. It is called std::sregex_token_iterator. What is this thing?
As its name says, it is an iterator. It iterates over a string (hence the 's' in its name) and returns the split-up tokens. Either the tokens themselves are matched against a regular expression, or, negatively, the delimiter is matched and everything else is treated as a token and returned. This is controlled via the last argument of its constructor.
Let's have a look at this constructor:
token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});
The first parameter is the position where it should start in the source string, the second parameter is the end position up to which the iterator should work. The third parameter is the regex itself. The last parameter selects what is returned:
- 0 (the default) returns every full match of the regex
- 1, 2, ... returns the corresponding capture group of every match
- -1 returns everything that does not match the regex
Please read up on regular expressions online. There are tons of pages available.
Please see a demo for all 3 solutions here:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <sstream>
#include <iterator>
#include <algorithm>

/// Split string into tokens
int main() {

    // White space separated tokens in a string
    const std::string line{ "Token1 Token2 Token3 Token4" };

    // Solution 1: Use extractor operator ----------------------------------

    // Here, we will store the result
    std::string subString1{}, subString2{}, subString3{}, subString4{};

    // Put the line into an istringstream for easier extraction
    std::istringstream iss1(line);
    iss1 >> subString1 >> subString2 >> subString3 >> subString4;

    // Show result
    std::cout << "\nSolution 1: Use extractor operator\n- Data: -\n" << subString1 << "\n"
        << subString2 << "\n" << subString3 << "\n" << subString4 << "\n";

    // Solution 2: Use istream_iterator ----------------------------------
    std::istringstream iss2(line);
    std::vector token(std::istream_iterator<std::string>(iss2), {});

    // Show result
    std::cout << "\nSolution 2: Use istream_iterator\n- Data: -\n";
    std::copy(token.begin(), token.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

    // Solution 3: Use std::sregex_token_iterator ----------------------------------
    const std::regex re(" ");
    std::vector<std::string> token2(std::sregex_token_iterator(line.begin(), line.end(), re, -1), {});

    // Show result
    std::cout << "\nSolution 3: Use sregex_token_iterator\n- Data: -\n";
    std::copy(token2.begin(), token2.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}
So, now the answer on how you could read your text file.
It is essential to create the correct data structures. Then, overload the inserter and extractor operators for them and put the above functionality in there.
Please see the below demo example. Of course there are many other possible solutions:
#include <string>
#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

struct ItemAndPrice {
    // Data
    std::string item{};
    unsigned int price{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, ItemAndPrice& iap) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            // Read the item and price from that line and check, if that worked
            if (std::istringstream iss(line); !(iss >> iap.item >> iap.price))

                // There was an error, while reading item and price. Set failbit of input stream
                is.setstate(std::ios::failbit);
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const ItemAndPrice& iap) {
        // Simple output of our internal data
        return os << iap.item << " " << iap.price;
    }
};

struct MarketPrice {
    // Data
    std::vector<ItemAndPrice> marketPriceData{};
    size_t numberOfElements() const { return marketPriceData.size(); }
    unsigned int weight{};

    // Extractor
    friend std::istream& operator >> (std::istream& is, MarketPrice& mp) {

        // Read a complete line from the stream and check, if that worked
        if (std::string line{}; std::getline(is, line)) {

            size_t numberOfEntries{};
            // Read the number of following entries and the weight from that line and check, if that worked
            if (std::istringstream iss(line); (iss >> numberOfEntries >> mp.weight)) {

                mp.marketPriceData.clear();
                // Now copy the numberOfEntries next lines into our vector
                std::copy_n(std::istream_iterator<ItemAndPrice>(is), numberOfEntries, std::back_inserter(mp.marketPriceData));
            }
            else {
                // There was an error, while reading the number of following entries and the weight. Set failbit of input stream
                is.setstate(std::ios::failbit);
            }
        }
        return is;
    }

    // Inserter
    friend std::ostream& operator << (std::ostream& os, const MarketPrice& mp) {

        // Simple output of our internal data
        os << "\nNumber of Elements: " << mp.numberOfElements() << " Weight: " << mp.weight << "\n";

        // Now copy all market price data to output stream
        if (os) std::copy(mp.marketPriceData.begin(), mp.marketPriceData.end(), std::ostream_iterator<ItemAndPrice>(os, "\n"));

        return os;
    }
};

// For this example I do not use argv, argc and file streams,
// because I cannot provide files on Stack Overflow.
// So, I put the file data in an istringstream. For the example below,
// it makes no difference whether a file stream or a string stream is used.
std::istringstream sourceFile{R"(2 300
abc12 130
bcd22 456
3 400
abfg12 230
bcpd22 46
abfrg2 13)"};

int main() {

    // Here we will store all the resulting data
    // So, read the complete source file, parse the data and store result in vector
    std::vector mp(std::istream_iterator<MarketPrice>(sourceFile), {});

    // Now, all data are in mp. You may work with that now

    // Show result on display
    std::copy(mp.begin(), mp.end(), std::ostream_iterator<MarketPrice>(std::cout, "\n"));

    return 0;
}