See a full description below.
But first the example code (one of many possible solutions):
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <numeric>
#include <iterator>
#include <regex>
#include <map>
#include <tuple>
#include <algorithm>
#include <iomanip>

std::regex delimiter(",");
using Data = std::tuple<unsigned long, std::string, double>;

int main() {
    // Open the file and check if it could be opened
    if (std::ifstream csvFileStream{ "r:\\archivo.csv" }; csvFileStream) {
        // Here we will store all data
        std::vector<Data> data;
        // Now read every line of the file until eof
        for (std::string line{}; std::getline(csvFileStream, line); ) {
            // Split the line into tokens
            std::vector token(std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1), {});
            // Add to our data vector
            data.emplace_back(Data{ std::stoul(token[0]), std::string(token[1]), std::stod(token[2]) });
        }
        // Now we want to aggregate the data. Get the sum over all
        const double sum = std::accumulate(data.begin(), data.end(), 0.0,
            [](double v, const Data& d) { return v + std::get<2>(d); });
        // Get the average over all
        const double average = sum / data.size();
        // Get the min and max value over all.
        const auto [min, max] = std::minmax_element(data.begin(), data.end(),
            [](const Data& d1, const Data& d2) { return std::get<2>(d1) < std::get<2>(d2); });
        // Next, we want to group based on NIT
        std::map<unsigned long, double> groups{};
        for (const Data& d : data) groups[std::get<0>(d)] += std::get<2>(d);
        // Generate output
        std::cout << "No. NIT Total Vendido\n";
        unsigned int no{ 1U };
        for (const auto& [NIT, gsum] : groups)
            std::cout << std::right << std::setw(3) << no++ << ' ' << std::left << std::setw(9) << NIT
                << std::right << std::fixed << std::setprecision(2) << std::setw(19) << gsum << "\n";
        std::cout << " ---------------\nSumatoria NITS:" << std::setw(17) << sum
            << "\nMedia NITs :" << std::setw(17) << average << "\nMin NITS :" << std::setw(17) << std::get<2>(*min)
            << "\nMax NITS :" << std::setw(17) << std::get<2>(*max) << "\nCount NITs :" << std::setw(14) << groups.size() << "\n";
    }
    else {
        std::cerr << "\n*** Error: Could not open csv file\n";
    }
    return 0;
}
One of the major topics here is how to parse a string or, as it is usually called, how to split a string into tokens.
Splitting strings into tokens is a very old task. Early C already had the function strtok, which still exists, even in C++, as std::strtok.
But because of the additional functionality of std::getline (its optional delimiter parameter), it has been heavily misused for tokenizing strings. If you look at the top question/answer on how to parse a CSV file (please see here), you will see what I mean.
People use std::getline to read a text line, a string, from the original stream, stuff it into a std::istringstream, and then use std::getline with a delimiter again to parse the string into tokens. Weird.
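For reference, the pattern criticized above typically looks like the following minimal sketch. The helper name splitWithGetline is hypothetical and only serves to illustrate the idea; it is not taken from the linked answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper showing the std::getline + std::istringstream pattern:
// the line is copied into a string stream and std::getline is called again,
// this time with the delimiter character instead of '\n'.
std::vector<std::string> splitWithGetline(const std::string& line, char delimiter) {
    std::istringstream iss(line);
    std::vector<std::string> parts;
    for (std::string part; std::getline(iss, part, delimiter); )
        parts.push_back(part);
    return parts;
}
```

It works for simple input, but it needs an extra stream object per line just to cut a string into pieces.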
But for many, many years (since C++11) we have had a dedicated facility for tokenizing strings, explicitly designed for that purpose. It is the
std::sregex_token_iterator
And since we have such a dedicated facility, we should simply use it.
This thing is an iterator for iterating over a std::string, hence its name starts with an s. The first two arguments (begin and end) define the range of input to operate on. Then comes a std::regex describing what should be matched, or what should not be matched, in the input string. The matching strategy is selected with the last parameter:
- 0 --> give me the stuff that matches the regex (this is the default, so the parameter is optional)
- -1 --> give me what is NOT matched by the regex, that is, the text between the matches.
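The difference between the two values can be seen in a small sketch. The helper name tokenize and its parameters are assumptions made for this illustration only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects whatever the token iterator yields for the
// given submatch selector (0 = the matches themselves, -1 = text between matches).
std::vector<std::string> tokenize(const std::string& text, const std::regex& re, int submatch) {
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), re, submatch),
        std::sregex_token_iterator());  // default-constructed = end of sequence
}
```

With the input "1,abc,2.5" and the regex ",", the selector -1 yields the three fields, while 0 yields the two commas.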
We can use this iterator to store the tokens in a std::vector. The std::vector has a range constructor, which takes two iterators as parameters and copies the data between the first and the second iterator into the std::vector. The statement
std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});
defines a variable “tokens” as a std::vector and uses the so-called range constructor of std::vector. Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given constructor arguments. This feature is called CTAD ("class template argument deduction").
Additionally, you can see that I do not use the "end()" iterator explicitly.
This iterator will be constructed from the empty brace-enclosed initializer {} with the correct type, because the std::vector range constructor requires both arguments to have the same type, so the type is taken from the first argument. A default-constructed std::sregex_token_iterator is the end-of-sequence iterator.
You can read any number of tokens in a line and put them into the std::vector.
But you can do even more. You can validate your input. If you use 0 as the last parameter, you define a std::regex that describes what a valid token looks like, and you get only valid tokens.
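One detail worth knowing about the CTAD form: the deduced element type is the iterator's value type, a std::ssub_match, not a std::string. The following sketch makes that visible; the helper name countFields is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper: counts the comma-separated fields of one line.
std::size_t countFields(const std::string& line) {
    static const std::regex comma(",");
    // CTAD: no template argument given, the element type is deduced from the iterator
    std::vector tokens(std::sregex_token_iterator(line.begin(), line.end(), comma, -1), {});
    // The elements are sub-matches that refer to character ranges inside 'line'
    static_assert(std::is_same_v<decltype(tokens), std::vector<std::ssub_match>>);
    return tokens.size();
}
```

A std::ssub_match converts to std::string, which is why the example program can write std::string(token[1]). Because the sub-matches point into the original line, they should not be used after that string has been changed or destroyed.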
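As a minimal sketch of this validating use, assume that a valid token is an unsigned decimal number. Both the pattern and the helper name numericTokens are assumptions made for the illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns only those parts of the line that look like
// unsigned decimal numbers, using submatch selector 0 (the matches themselves).
std::vector<std::string> numericTokens(const std::string& line) {
    static const std::regex number(R"([0-9]+(\.[0-9]+)?)");
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), number, 0),
        std::sregex_token_iterator());
}
```

For the line "12,abc,3.5" this yields "12" and "3.5" and skips "abc". Note that such a simple pattern would also pick up digits embedded in other text, so a real validation regex would likely need to be stricter.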
Additionally, it helps you to avoid the error that you made with the last getline statement.
Overall, using a dedicated facility is superior to misusing std::getline, and people should simply use it.
Some people may complain about the overhead, but how many of them are processing big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
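For completeness, a hand-written split based on find and substr could look like the following sketch. The helper name splitByChar is hypothetical, and the sketch handles only a single-character delimiter.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits 'text' at every occurrence of 'delimiter'
// using std::string::find and std::string::substr, without any regex.
std::vector<std::string> splitByChar(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type pos = text.find(delimiter, start);
    while (pos != std::string::npos) {
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
        pos = text.find(delimiter, start);
    }
    parts.push_back(text.substr(start));  // remainder after the last delimiter (may be empty)
    return parts;
}
```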
Now we should have a basic understanding of how to split a string into tokens.
Next, we will explore the rest of the software.
At the beginning we open the file and check whether it could be opened. We use the C++17 if statement with initializer, where you can put an initializer and the condition inside the (). So, we define a variable of type std::ifstream and use its constructor to open the file. That was the initializer. Then we put the stream itself as the condition, the second part of the if statement. This checks whether the file could be opened. That works because std::ifstream has an overloaded bool conversion operator (and an overloaded ! operator) that reports the state of the stream.
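The open-and-check idiom can be isolated in a small sketch. The helper name countLines and the choice of -1 as the error value are assumptions for this illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the number of lines in the file,
// or -1 if the file could not be opened.
long countLines(const std::string& path) {
    // Initializer: construct and open the stream. Condition: the stream's state.
    if (std::ifstream in{ path }; in) {
        long n = 0;
        for (std::string line; std::getline(in, line); ) ++n;
        return n;
    }
    // 'in' is no longer visible here; its scope is limited to the if statement
    return -1;
}
```

A nice side effect is that the stream variable exists only inside the if/else, so the file is closed automatically as soon as that statement is left.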
OK, now the file is open. With a normal for statement, we read all lines of the file, using std::getline.
Then we tokenize the line (the string). Our data per line (csv) consists of 3 values: an unsigned long, a std::string and a double. We define a type "Data" to be a tuple of those types.
The tokens of each line are converted and put into a std::tuple, and the tuple is then added to our target vector with emplace_back.
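The example program indexes token[0] to token[2] without checking and lets std::stoul and std::stod throw on bad input. A more defensive variant might look like the following sketch. The helper name parseLine and the use of std::optional are assumptions, not part of the original program.

```cpp
#include <exception>
#include <optional>
#include <regex>
#include <string>
#include <tuple>
#include <vector>

using Data = std::tuple<unsigned long, std::string, double>;

// Hypothetical helper: converts one csv line into a Data tuple.
// Returns std::nullopt if the line does not have exactly 3 fields
// or if a numeric field cannot be converted.
std::optional<Data> parseLine(const std::string& line) {
    static const std::regex delimiter(",");
    const std::vector<std::string> token(
        std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
        std::sregex_token_iterator());
    if (token.size() != 3) return std::nullopt;
    try {
        return Data{ std::stoul(token[0]), token[1], std::stod(token[2]) };
    }
    catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

In the reading loop one could then write something like: if (auto d = parseLine(line)) data.push_back(*d);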
So, basically we need just 3 lines of code to read and parse the complete source csv file.
Good. Now we have all data in a std::vector named "data".
We can use existing functions from the standard library (std::accumulate from <numeric> and std::minmax_element from <algorithm>) to get the sum, average, min and max value.
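The aggregation step can be sketched in isolation as follows. The struct Stats and the function summarize are hypothetical names. The guard for an empty vector is an addition, since dereferencing the iterators returned by std::minmax_element is not valid for an empty range.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <tuple>
#include <vector>

using Data = std::tuple<unsigned long, std::string, double>;

// Hypothetical result type for the four aggregated values
struct Stats { double sum; double average; double min; double max; };

// Hypothetical helper: aggregates the third tuple element over all entries.
Stats summarize(const std::vector<Data>& data) {
    if (data.empty()) return {};  // all zeros; avoids dereferencing end iterators
    const double sum = std::accumulate(data.begin(), data.end(), 0.0,
        [](double v, const Data& d) { return v + std::get<2>(d); });
    const auto [lo, hi] = std::minmax_element(data.begin(), data.end(),
        [](const Data& a, const Data& b) { return std::get<2>(a) < std::get<2>(b); });
    return { sum, sum / static_cast<double>(data.size()), std::get<2>(*lo), std::get<2>(*hi) };
}
```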
Since we want to group the data based on the NIT, we then create an associative container: std::map. The key is the NIT and the value is the sum of the doubles. With the std::map index operator [] we can access an existing key or create a new one. Meaning, when a NIT does not yet exist in the map, it will be added with a value-initialized value (0.0 for a double). In any case, the index operator [] returns a reference to the value, and we simply add the double to it. We do this for all tuples in the data vector.
After this, all group sums exist, and the number of keys in the map, the size() of the std::map, is the number of groups.
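The grouping step on its own, as a small sketch; the function name groupByNit is hypothetical.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

using Data = std::tuple<unsigned long, std::string, double>;

// Hypothetical helper: sums the third tuple element per NIT (first element).
std::map<unsigned long, double> groupByNit(const std::vector<Data>& data) {
    std::map<unsigned long, double> groups;
    for (const Data& d : data)
        groups[std::get<0>(d)] += std::get<2>(d);  // operator[] creates a missing key with value 0.0
    return groups;
}
```

Because std::map keeps its keys ordered, iterating over the result prints the groups sorted by NIT without any extra sorting step.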
The rest is just simple formatting and output.