CSV file handling in C++

Question

My name is Jose. I need help with a project. I need to handle .csv files in C++. Each record in the file contains a NIT, a date, and an amount spent. The program must accumulate the purchase totals by NIT and print the following on screen:

Sum NITs:
Average NITs
Min NITs
Max NITs
Count NITs

The following is the CSV file with NIT, date, and total spent:

[image: CSV file contents]

I am trying to create output similar to:

[image: desired output]

My current code is:

#include<iostream>
#include<fstream>
#include<string>
#include<cstdlib>
#include<vector>
#include<sstream>

using namespace std;

void mostrar_csv();

int main()
{

    mostrar_csv();

    system("pause");
    return 0;
}

void mostrar_csv()
{
    ifstream archivo("archivo.csv");
    string linea = "";
    string escritura = "";
    vector<string> vect;

    while (getline(archivo, linea))
    {
        stringstream dato(linea);

        while (getline(dato, escritura, ';'))
        {
            vect.push_back(escritura);
        }

    }

    for (size_t i = 0; i < vect.size(); i++)
    { // .size() is a method; it returns the number of elements in the vector

        cout << i + 1 << ".-- " << vect.at(i) << "\n";

    }
    cout << "\n\n";
    cout << "the size is " << " " << vect.size() << " \n\n ";
}

I recommend editing the question to expand on what the problem you are having is. At the moment this looks like a blanket "Help me!" and those wind up broad and the resulting answer is covering too many bases to be useful to other programmers. — user4581301, Mar 25 '20 at 18:59
Search the internet for "C++ read file CSV". There are already a plethora of examples you can modify for your special input file. — Thomas Matthews, Mar 25 '20 at 18:59

A M · Answer 1 · 2020-03-27T07:04:04.090

See a full description below.

But first the example code (one of many possible solutions):

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <numeric>
#include <iterator>
#include <regex>
#include <map>
#include <tuple>
#include <algorithm>
#include <iomanip>

std::regex delimiter(",");
using Data = std::tuple<unsigned long, std::string, double>;

int main() {
    // Open the file and check if it could be opened
    if (std::ifstream csvFileStream{ "r:\\archivo.csv" }; csvFileStream) {

        // Here we will store all data
        std::vector<Data> data;

        // Now read every line of the file until eof
        for (std::string line{}; std::getline(csvFileStream, line); ) {

            // Split the line into tokens
            std::vector token(std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1), {});

            // Add to our data vector
            data.emplace_back(Data{ std::stoul(token[0]), std::string(token[1]), std::stod(token[2]) });
        }

        // Now we want to aggregate the data. Get the sum over all
        const double sum = std::accumulate(data.begin(), data.end(), 0.0, [](double v, const Data& d) { return v + std::get<2>(d); });
        // Get the average over all
        const double average = sum / data.size();
        // Get the min and max value over all.
        const auto [min, max] = std::minmax_element(data.begin(), data.end(), [](const Data& d1, const Data& d2) { return std::get<2>(d1) < std::get<2>(d2); });

        // Next, we want to group based on NIT
        std::map<unsigned long, double> groups{};
        for (const Data& d : data) groups[std::get<0>(d)] += std::get<2>(d);

        // Generate output
        std::cout << "No. NIT              Total Vendido\n";
        unsigned int no{ 1U };

        for (const auto& [NIT, gsum] : groups)
            std::cout << std::right << std::setw(3) << no++ << ' ' << std::left << std::setw(9) << NIT 
            << std::right << std::fixed << std::setprecision(2) << std::setw(19) << gsum << "\n";

        std::cout << "                 ---------------\nSumatoria NITS:" << std::setw(17) << sum
            << "\nMedia NITs    :" << std::setw(17) << average << "\nMin NITS      :" << std::setw(17) << std::get<2>(*min)
            << "\nMax NITS      :" << std::setw(17) << std::get<2>(*max) << "\nCount NITs    :" << std::setw(14) << groups.size() << "\n";
    }
    else {
        std::cerr << "\n*** Error: Could not open csv file\n";
    }
    return 0;
}

One of the major topics here is how to parse a string, or, as it is usually called, how to split a string into tokens.

Splitting strings into tokens is a very old task. Very early C already had the function strtok, which still exists, even in C++, as std::strtok.

But because of its additional functionality, std::getline has been heavily misused for tokenizing strings. If you look at the top question/answer on how to parse a CSV file (please see here), you will see what I mean.

People use std::getline to read a text line, a string, from the original stream, then stuff it into a std::istringstream and use std::getline with a delimiter again to parse the string into tokens. Weird.

But for many years we have had a dedicated facility for tokenizing strings, explicitly designed for that purpose. It is the

std::sregex_token_iterator

And since we have such a dedicated function, we should simply use it.

This thing is an iterator for iterating over a std::string, hence the name starts with an s. The first two arguments (begin and end) define the range of input to operate on. Then comes a std::regex describing what should be matched, or what should not be matched, in the input string. The last parameter selects the matching strategy:

0 --> give me what matches the regex (this is the default, so the parameter is optional)
-1 --> give me what is NOT matched by the regex, that is, the text between the matches.
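To make the two modes concrete, here is a minimal sketch. The helper names split_fields and match_fields are made up for this illustration and are not part of the code above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: submatch -1 yields the text BETWEEN the delimiter matches.
std::vector<std::string> split_fields(const std::string& line, const std::regex& delim) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), delim, -1),
        std::sregex_token_iterator{});
}

// Hypothetical helper: submatch 0 yields the matches themselves.
std::vector<std::string> match_fields(const std::string& line, const std::regex& pattern) {
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), pattern, 0),
        std::sregex_token_iterator{});
}
```

With a "," delimiter, split_fields turns "1,a,2.5" into the three tokens 1, a and 2.5. With the pattern "[0-9]+", match_fields turns "12,ab,34" into the two tokens 12 and 34.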

We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameter, and copies the data between the first iterator and 2nd iterator to the std::vector. The statement

std::vector tokens(std::sregex_token_iterator(s.begin(), s.end(), re, -1), {});

defines a variable “tokens” as a std::vector and uses the so called range-constructor of the std::vector. Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").

Additionally, you can see that I do not write the end iterator explicitly.

It is constructed from the empty braced initializer {}. Its type is deduced to be the same as that of the first argument, because the std::vector range constructor requires both iterators to have the same type. A default-constructed std::sregex_token_iterator is the end-of-sequence iterator.

You can read any number of tokens from a line and put them into the std::vector.

But you can do even more. You can validate your input. If you use 0 as the last parameter, you define a std::regex that describes what a valid token looks like, and you get back only the tokens that match it.
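As a rough sketch of that idea, assume a line whose fields should all be whole numbers. The function name integer_tokens and the pattern are assumptions chosen for this illustration. The pattern would have to be adapted for dates or decimal amounts.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: keep only tokens that are complete runs of digits.
// \b is a word boundary, so a field such as "x3" yields no token at all
// instead of the partial match "3".
std::vector<std::string> integer_tokens(const std::string& line) {
    static const std::regex integer(R"(\b[0-9]+\b)");
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), integer, 0),
        std::sregex_token_iterator{});
}
```

For the input "1,22,x3,4" this returns 1, 22 and 4. Invalid fields are skipped silently, so comparing the number of returned tokens with the number of expected fields is one way to detect a bad line.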

Additionally, it helps you avoid the error that you made with the last getline statement.

Overall, using a dedicated facility is superior to the misused std::getline, and people should simply use it.

Some people may complain about the overhead, but how many of them are processing big data? And even then, the approach would probably be to use std::string::find and std::string::substr, or std::string_view, or whatever.
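For comparison, a regex-free split along those lines might look like the following sketch. The function name split_view is hypothetical, and a single-character delimiter is assumed.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split on one delimiter character using find and substr.
// The returned views point into the caller's buffer, which must outlive them.
std::vector<std::string_view> split_view(std::string_view line, char delim) {
    std::vector<std::string_view> fields;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = line.find(delim, start);
        if (pos == std::string_view::npos) {
            fields.push_back(line.substr(start));   // last field, may be empty
            break;
        }
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

No strings are copied here, which is where the speed comes from. The price is that the caller has to keep the original line alive for as long as the views are used.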

Now we should have gotten a basic understanding, how to split a string into tokens.

Next, we will explore the rest of the software.

At the beginning we open the file and check whether it could be opened. We use the C++17 if statement with initializer, where you can put an initializer and the condition in the (). So, we define a std::ifstream variable and use its constructor to open the file. That is the initializer. Then we put the stream itself as the condition, the second part of the if statement. This checks whether the file could be opened. That works because std::ifstream provides an explicit operator bool (alongside an overloaded operator!), which reports whether the stream is in a non-failed state.
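In isolation, the open-and-check pattern can be sketched like this. The function count_lines is a hypothetical helper for illustration only. It requires C++17.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the number of lines in a file,
// or -1 if the file could not be opened.
long count_lines(const std::string& path) {
    // Initializer: construct the stream. Condition: its explicit operator bool.
    if (std::ifstream in{ path }; in) {
        long n = 0;
        for (std::string line; std::getline(in, line); ) ++n;
        return n;
    } else {
        // 'in' is still in scope here, but it is in a failed state.
        return -1;
    }
}
```

The stream variable lives only for the duration of the if/else, so the file is closed automatically when the statement ends.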

OK, now the file is open. With a normal for-statement, we read all lines of the file, using std::getline.

Then we tokenize the line (the string). Our data per line (csv) consists of 3 values: an unsigned long, a std::string and a double. We define a type "Data" as a tuple of those types.

The tokens for each line are converted and packed into a std::tuple, and the tuple is then added to our target vector.

So, basically we need just 3 lines of code to read and parse the complete source csv-file.
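The conversion step assumes every line has exactly three well-formed fields. If that cannot be guaranteed (for example, because of a header line), a more defensive variant could look like the sketch below. The function parse_line and the use of std::optional are additions for illustration, not part of the code above.

```cpp
#include <exception>
#include <optional>
#include <regex>
#include <string>
#include <tuple>
#include <vector>

using Data = std::tuple<unsigned long, std::string, double>;

// Hypothetical helper: convert one CSV line to Data, or return std::nullopt
// if the line does not have three fields or a number cannot be converted.
std::optional<Data> parse_line(const std::string& line) {
    static const std::regex delimiter(",");
    const std::vector<std::string> token(
        std::sregex_token_iterator(line.begin(), line.end(), delimiter, -1),
        std::sregex_token_iterator{});
    if (token.size() != 3) return std::nullopt;
    try {
        // Note: std::stoul and std::stod accept trailing non-numeric characters.
        return Data{ std::stoul(token[0]), token[1], std::stod(token[2]) };
    } catch (const std::exception&) {   // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
}
```

In the reading loop, a line would then be added to the vector only if parse_line returned a value.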

Good. Now we have all data in a std::vector "data".

We can use existing functions from the standard library to get the sum, average, min and max values: std::accumulate from <numeric> and std::minmax_element from <algorithm>.
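Reduced to a plain vector of amounts, the aggregation can be sketched as follows. The struct Stats and the function summarize are hypothetical names. The empty-input guard is an addition that the code above does not have.

```cpp
#include <algorithm>
#include <numeric>
#include <optional>
#include <vector>

struct Stats { double sum; double average; double min; double max; };

// Hypothetical helper: aggregate a list of amounts.
std::optional<Stats> summarize(const std::vector<double>& values) {
    // Guard: with no data, the average would divide by zero and
    // minmax_element would return end(), which must not be dereferenced.
    if (values.empty()) return std::nullopt;
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    const auto [lo, hi] = std::minmax_element(values.begin(), values.end());
    return Stats{ sum, sum / static_cast<double>(values.size()), *lo, *hi };
}
```

Note the 0.0 initial value for std::accumulate. With a plain 0 the accumulation would be done in int and the fractional parts would be lost.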

Since we want to group the data based on the NIT, we then create an associative container: std::map. The key is the NIT and the value is the sum of the doubles. With the std::map index operator [] we can access an existing key or create a new one. That means: when a NIT does not yet exist in the map, it is added with a value-initialized (zero) sum. In any case, the index operator [] returns a reference to the value, and we simply add the double to it. We do this for all tuples in the data vector.
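The grouping step on its own, with (NIT, amount) pairs standing in for the tuples, might be sketched like this. The function name group_totals is made up for the illustration.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: sum the amounts per NIT.
std::map<unsigned long, double> group_totals(
        const std::vector<std::pair<unsigned long, double>>& rows) {
    std::map<unsigned long, double> groups;
    for (const auto& [nit, amount] : rows)
        groups[nit] += amount;   // a missing key is inserted with the value 0.0 first
    return groups;
}
```

Because std::map keeps its keys sorted, iterating over the result lists the NITs in ascending order, and size() gives the number of distinct NITs.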

After this, all group sums exist, and the number of keys in the map, the size() of the std::map is the number of groups.

The rest is just simple formatting and output.
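For readers unfamiliar with the <iomanip> manipulators, here is a small sketch of how one table row is laid out. The function format_row is hypothetical and writes into a string instead of std::cout so that the result is easy to inspect.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format one output row with the same manipulators
// and column widths as the table loop above.
std::string format_row(unsigned int no, unsigned long nit, double total) {
    std::ostringstream out;
    out << std::right << std::setw(3) << no << ' '      // running number, right-aligned
        << std::left << std::setw(9) << nit             // NIT, left-aligned
        << std::right << std::fixed << std::setprecision(2)
        << std::setw(19) << total;                      // amount with two decimals
    return out.str();
}
```

std::setw applies only to the next item written, whereas std::left, std::right, std::fixed and std::setprecision stay in effect until changed. That is why the widths are repeated for every column.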

thank you very much Armin Montigny I had no idea how to do the program You have helped me a lot the way you explain is perfect i understood very well Your programming is very advanced, I still have a long way to go to that level of programming, I'm just starting — Jose Morales, Mar 27 '20 at 23:10
