So, let us see how to tackle this problem.
First, we will analyze the requirements, the "What":
- We have comma separated data
- The source data is stored in a file
- All data shall be read from the file
- The data from one line shall be split into parts
- All parts belonging to a Person shall be kept together somehow
- We want to read as many Person records from the file as are available
Next, the "How": how could we do that? Please note that there are many possible solutions.
First, we should note that one line of the source file contains structured data with 7 attributes.
In C++, structured data is stored in `struct`s or `class`es. So, we will create a `struct` and name it "Person". It will contain all attributes for a "Person". For example:
    struct Person {
        std::string name{};
        std::string firstName{};
        std::string dateOfBirth{};
        std::string dateOfDeath{};
        std::string company{};
        std::string mainProduct{};
        std::string assets{};
    };
Next, we see that we want to store many Persons somehow. We do not know in advance how many. In C++, collections of similar data are stored in so-called containers, and the standard library defines many of them. Because the size of this container is not fixed from the beginning, it needs to grow dynamically. Otherwise, it should behave like an old C-style array. The best-fitting choice is a `std::vector`. Please look [here][1] for a description.
So, we will define a `std::vector<Person>` to hold all persons listed in the source file.
Now, as C++ is an object-oriented language, we will use objects. Objects hold data, and the methods operating on that data, in one entity. In C++, such objects are described by `struct`s or `class`es. The data we have defined already. Now the methods. The only thing that we want to do is read the data from a stream. And for this, we can use the `iostream` facilities. You know the inserter `<<` and extractor `>>` operators. If we want to use the same mechanism for our struct, we will add a method, here specifically an extractor operator `>>`, to our `struct`.
And because we may need some output later, we will also add an inserter. This may then look like this:
    struct Person {
        std::string name{};
        std::string firstName{};
        std::string dateOfBirth{};
        std::string dateOfDeath{};
        std::string company{};
        std::string mainProduct{};
        std::string assets{};
        // Overload extractor operator
        friend std::istream& operator >> (std::istream& is, Person& p) {
            // Read all data from stream
            std::getline(is, p.name, ',');
            std::getline(is, p.firstName, ',');
            std::getline(is, p.dateOfBirth, ',');
            std::getline(is, p.dateOfDeath, ',');
            std::getline(is, p.company, ',');
            std::getline(is, p.mainProduct, ',');
            std::getline(is, p.assets);
            return is;
        }
        // Overload inserter operator
        friend std::ostream& operator << (std::ostream& os, const Person& p) {
            return os << p.name << ' ' << p.firstName << ' ' << p.dateOfBirth << ' ' << p.dateOfDeath << ' '
                << p.company << ' ' << p.mainProduct << ' ' << p.assets;
        }
    };
The struct knows how many members it has. And it will extract each attribute with [std::getline][2]. Please note: there are many, many possible solutions to split a string into parts. Often you will see a combination of `std::getline`, `std::istringstream`, and then `std::getline` again, working on the `std::istringstream`.
This would then look like this:
    friend std::istream& operator >> (std::istream& is, Person& p) {
        if (std::string line{}; std::getline(is, line)) {
            std::istringstream iss{ line };
            // Read all data from stream
            std::getline(iss, p.name, ',');
            std::getline(iss, p.firstName, ',');
            std::getline(iss, p.dateOfBirth, ',');
            std::getline(iss, p.dateOfDeath, ',');
            std::getline(iss, p.company, ',');
            std::getline(iss, p.mainProduct, ',');
            std::getline(iss, p.assets);
        }
        return is;
    }
This is usually the better way. We always read a complete line and then split it. This limits the damage if one line contains an error: the fault stays within that line, and reading continues normally with the next one. And we can exchange the internal splitting mechanism without affecting the outside world.
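To make the difference concrete, here is a minimal sketch that runs both approaches over input in which one line is missing a field. The three-field `Record` and the helper names `readDirect`, `readLineBased` and `readAll` are made up for this illustration; they only mirror the two extractor variants shown above in shortened form.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical three-field record, a shortened stand-in for Person
struct Record {
    std::string a{};
    std::string b{};
    std::string c{};
};

// Variant 1: read the fields directly from the stream
std::istream& readDirect(std::istream& is, Record& r) {
    std::getline(is, r.a, ',');
    std::getline(is, r.b, ',');
    std::getline(is, r.c);
    return is;
}

// Variant 2: read a complete line first, then split that line
std::istream& readLineBased(std::istream& is, Record& r) {
    if (std::string line{}; std::getline(is, line)) {
        std::istringstream iss{ line };
        std::getline(iss, r.a, ',');
        std::getline(iss, r.b, ',');
        std::getline(iss, r.c);
    }
    return is;
}

// Apply one of the readers to the given text until reading fails
template <typename Reader>
std::vector<Record> readAll(const std::string& text, Reader reader) {
    std::istringstream input{ text };
    std::vector<Record> result{};
    while (true) {
        // Fresh record per iteration, so no field can leak over from a previous line
        Record r{};
        if (!reader(input, r)) break;
        result.push_back(r);
    }
    return result;
}
```

For the text `"a1,b1,c1\na2,b2\na3,b3,c3\n"`, where the second line lacks its third field, `readAll(text, readDirect)` searches for the next comma across the line break and merges the second and third lines into one record. `readAll(text, readLineBased)` returns three records, and only the faulty one has an empty `c`.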
OK, good, now to `main`.
We need to open the file. Here we can use the functionality of `std::ifstream`. The constructor will open the file for us (and the destructor will close it automatically).
We embed this in an [if][3] statement with initializer, so that we can check the status of the file in one step.
To read the file, we use the range constructor of the `std::vector`. Please see here, [constructor number 5][4].
As the iterator, we use the `std::istream_iterator`, described [here][5]. It will automatically call the extractor of the Person struct until extraction fails, which normally happens at the end of the file, and the data will be copied into the `std::vector`.
And that's it.
With a one liner, we can do everything.
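The steps described above can be sketched as follows. This is one possible arrangement, not the only one: the function names `readPersons` and `readPersonsFromFile` are assumptions made for this sketch, and the `Person` struct repeats the line-based extractor from above.

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    std::string name{};
    std::string firstName{};
    std::string dateOfBirth{};
    std::string dateOfDeath{};
    std::string company{};
    std::string mainProduct{};
    std::string assets{};

    // Extractor: read one complete line, then split it at the commas
    friend std::istream& operator >> (std::istream& is, Person& p) {
        if (std::string line{}; std::getline(is, line)) {
            std::istringstream iss{ line };
            std::getline(iss, p.name, ',');
            std::getline(iss, p.firstName, ',');
            std::getline(iss, p.dateOfBirth, ',');
            std::getline(iss, p.dateOfDeath, ',');
            std::getline(iss, p.company, ',');
            std::getline(iss, p.mainProduct, ',');
            std::getline(iss, p.assets);
        }
        return is;
    }
    // Inserter: write all attributes separated by spaces
    friend std::ostream& operator << (std::ostream& os, const Person& p) {
        return os << p.name << ' ' << p.firstName << ' ' << p.dateOfBirth << ' ' << p.dateOfDeath << ' '
            << p.company << ' ' << p.mainProduct << ' ' << p.assets;
    }
};

// Hypothetical helper: the "one-liner". The range constructor of std::vector
// calls the extractor through std::istream_iterator until extraction fails.
std::vector<Person> readPersons(std::istream& is) {
    return std::vector<Person>(std::istream_iterator<Person>(is), {});
}

// Hypothetical helper: open the file and check its state in one if statement
// with initializer. The file is closed when ifs goes out of scope.
std::vector<Person> readPersonsFromFile(const std::string& fileName) {
    if (std::ifstream ifs{ fileName }; ifs) {
        return readPersons(ifs);
    }
    std::cerr << "Error: could not open file '" << fileName << "'\n";
    return {};
}
```

In `main`, a call such as `readPersonsFromFile("persons.csv")` (the file name is a placeholder) would return the filled `std::vector<Person>`, and each element can then be printed with the inserter `<<`. Taking a `std::istream&` in `readPersons` also allows testing with a `std::istringstream` instead of a real file.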
As mentioned above, there are many ways to split a string.
Splitting a string into tokens is a very old task. There are many, many solutions available. All have different properties. Some are difficult to understand, some are hard to develop, and some are more complex, slower or faster, or more or less flexible.
Alternatives
- Handcrafted, many variants, using pointers or iterators, maybe hard to develop and error-prone.
- Using the old-style `std::strtok` function. Maybe unsafe. Maybe should not be used any longer.
- `std::getline`. Most used implementation. But actually a "misuse" and not so flexible.
- Using a dedicated modern function, specifically developed for this purpose, most flexible and fitting well into the STL environment and algorithm landscape. But slower.
Please see 4 examples in one piece of code.
    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <regex>
    #include <algorithm>
    #include <iterator>
    #include <type_traits>
    #include <cstring>
    #include <forward_list>
    #include <deque>
    using Container = std::vector<std::string>;
    std::regex delimiter{ "," };
    int main() {
        // Some function to print the contents of an STL container
        auto print = [](const auto& container) -> void { std::copy(container.begin(), container.end(),
            std::ostream_iterator<std::decay_t<decltype(*container.begin())>>(std::cout, " ")); std::cout << '\n'; };
        // Example 1: Handcrafted -------------------------------------------------------------------------
        {
            // Our string that we want to split
            std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
            Container c{};
            // Search for comma, then take the part and add to the result
            for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {
                // So, if there is the end of the string or a comma
                if ((i == stringToSplit.size()) || (stringToSplit[i] == ',')) {
                    // Copy substring
                    c.push_back(stringToSplit.substr(startpos, i - startpos));
                    startpos = i + 1;
                }
            }
            print(c);
        }
        // Example 2: Using very old strtok function ----------------------------------------------------------
        {
            // Our string that we want to split
            std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
            Container c{};
            // Split string into parts in a simple for loop
    #pragma warning(suppress : 4996)
            for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
                c.push_back(token);
            }
            print(c);
        }
        // Example 3: Very often used std::getline with additional istringstream ------------------------------------------------
        {
            // Our string that we want to split
            std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
            Container c{};
            // Put string in an std::istringstream
            std::istringstream iss{ stringToSplit };
            // Extract string parts in simple for loop
            for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
                ;
            print(c);
        }
        // Example 4: Most flexible iterator solution ------------------------------------------------
        {
            // Our string that we want to split
            std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
            Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
            //
            // Everything done already with range constructor. No additional code needed.
            //
            print(c);
            // Works also with other containers in the same way
            std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
            print(c2);
            // And works with algorithms
            std::deque<std::string> c3{};
            std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));
            print(c3);
        }
        return 0;
    }
[1]: https://en.cppreference.com/w/cpp/container/vector
[2]: https://en.cppreference.com/w/cpp/string/basic_string/getline
[3]: https://en.cppreference.com/w/cpp/language/if
[4]: https://en.cppreference.com/w/cpp/container/vector/vector
[5]: https://en.cppreference.com/w/cpp/iterator/istream_iterator