
I hope I am asking this correctly, but I'm going to try. I have a text file that I want to open and put each row separately into a vector, and then call getline again with ',' as the delimiter to get a view of the "columns", because the file is a .csv file exported from Excel. The columns are fixed, but the number of rows varies from file to file. When I do the first getline with the '\n' delimiter and cout the result, I get the rows separated. I figured that if I did it again with ',', it would separate the "columns", so that each value is stored separately in the vector instead of the whole row being one large string.

I tried calling getline again, and in the cout I print "pizza" just to see where it is parsing, but it only does so for the first word of the second line, the first word of the third line, and so on until the end of the file. I tried a vector of vectors of type string and using push_back, but then I got confused. I think my issue is the order of my getline calls or how I use cout.

Sample input:

0, 6/19/2019, 16:41:33,33.972622,-117.323482,24.25,23.5,23.25,24.75,25.5,24.25,25.25,25.5,24.5,24,24,24.25,25.5,25.75,25.25,25,24.5,24.75,24.75,24.75,25.25,24.5,24.5,25.5,23.75,24.25,24.75,24,24.25,24,24.5,25,24.25,24,24.25,24.25,24,24.25,24.5,25.5,24,25,24.5,24.75,24.5,24.75,24.75,25.5,24.5,24.25,24.25,25.25,25.25,23.5,25,24.75,24.5,24.75,25.5,24.25,23.5,24,25.25,25,605,597,515,514,509,511,508
0, 6/19/2019, 16:41:42,33.972648,-117.323492,24,23.5,23.75,24.25,25.5,25.5,25.25,25.25,25,24.5,24.25,24.5,25,25.5,25.5,25.75,24.25,23.5,24.75,24.5,24.25,24.25,24.5,25.5,24,23.75,24.5,24,24.25,24,24.75,25.25,25,23.75,24.75,25.5,25.5,26,24.75,25.25,24.5,25,25.25,25.25,26,24.75,24.5,25.5,24.5,24.5,25,24.75,24.25,24.25,25,25,24,24,24.75,25,23.25,24.25,25.5,25.5,609,595,1229,1227,1200,1196,1171
0, 6/19/2019, 16:41:49,33.972643,-117.323479,24.5,23,22.75,24,25.25,25.5,25,26,24.75,24,24,24.75,24.75,25.25,25.5,26,24.75,24,24.75,25,24.25,24.25,24.75,26,24.5,23.5,24.5,24,24,24,25,25.75,24.75,23.25,24.5,24.5,24.5,25,25.25,25.25,24,25,24.5,25.25,25.25,25.25,25.25,25.5,24.5,24,25.25,25,25,24.25,25,25.25,24.25,24,24.75,25.25,23.75,24.25,25,25.5,621,601,706,725,703,707,704
1, 6/19/2019, 16:41:55,33.972631,-117.323483,24.25,23.75,23.25,24,25.25,25.25,25.5,26,24.5,24.25,23.75,24.5,24.75,25.5,26,25.5,25,23.75,24.75,24.75,25.25,25.25,25,26.25,24.5,23.5,24.25,25,24.25,24.25,24.75,25.75,24.75,23.75,24.25,24.25,24.25,24.5,25.25,25.25,24.5,24.5,24.75,25,25.25,26,25.5,25.25,24.5,24,24.75,25,25,25.25,25.5,25.5,24.25,25,25,25.75,24.25,24.5,25.25,25.5,613,602,721,720,699,704,696

code:

#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string word;
    ifstream excel;
    excel.open("test.csv");

    while(!excel.eof()) {
        getline(excel,word,'\n');  // double endl to see parsing
        cout << word << endl << endl; //makes "rows"
        getline(excel,word,',');  //set delimiter
        cout << "pizza" << word << endl; //make columns??
    }
}

I expect the output to be separated by line, which it is, but I also expect each value within a line to be printed on its own: "pizza", then the value, then an end of line.

Expected output:

pizza25

pizza24.25

pizza25.25 etc....

Current output is:

25,24.25,25.25,25.25,23.5,25,24.75,24.5,24.75,25.5,24.25,23.5,24,25.25,25,605,597,515,514,509,511,508

pizza0
 6/19/2019, 16:41:42,33.972648,-117.323492,24,23.5,23.75,24.25,25.5,25.5,25.25,25.25,25,24.5,24.25,24.5,25,25.5,25.5,25.75,24.25,23.5,24.75,24.5,24.
marc_s
Teddy
  • Welcome to Stack Overflow. Please try to simplify your question. If you're trying to parse a line of text using `getline` with a chosen delimiter, concentrate on that. Use the simplest line you can come up with that presents the problem you're trying to solve. Show us your code, *the input line,* the desired output and the actual output. – Beta Aug 01 '19 at 19:09
  • Welcome to Stack Overflow, Teddy. Can you please provide a sample input (very small)? – MTMD Aug 01 '19 at 19:17
  • Possible duplicate of [How can I read and parse CSV files in C++?](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) – Michael Aug 01 '19 at 19:29
  • What does the input look like? – Martin York Aug 01 '19 at 19:34
  • Unrelated: Recommended reading: [Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) – user4581301 Aug 01 '19 at 19:34
  • I have added a small portion of my input – Teddy Aug 01 '19 at 19:46
  • @user4581301 thank you! I will fix in my program and on here – Teddy Aug 01 '19 at 19:51
  • Think on what you've asked for. Every iteration of the loop you read (`getline(excel,word,'\n');`) and print (`cout << word << endl << endl;`) the remainder of the line before reading one comma separated token (`getline(excel,word,',');`) and printing it (`cout << "pizza" << word << endl;`). Based on your desired output you need to start by removing the first `getline` and `cout`. Then you need to figure out how to ignore the timestamp (0, 6/19/2019, 16:41:33,33.972622,) and any other undesirable input. – user4581301 Aug 01 '19 at 19:58

2 Answers


I highly recommend modelling each row as a class or struct, and overloading operator>> to read in the record.

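#include <istream>

struct Data_Row
{
    // Hypothetical members; their types must match the actual columns
    int    column1_value;
    double column2_value;
    // ...
    friend std::istream& operator>>(std::istream& input, Data_Row& dr);
};

std::istream& operator>>(std::istream& input, Data_Row& dr)
{
    char comma;                  // receives (and discards) each ',' separator
    input >> dr.column1_value;
    input >> comma;
    input >> dr.column2_value;
    input >> comma;
    //...
    return input;
}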

Your input code could look like:

std::vector<Data_Row> database;
Data_Row dr;
while (excel_file >> dr)
{
    database.push_back(dr);
}

Your input data is not all of the same type: the first (leftmost) columns have different types than the rest, so a matrix is not an ideal container.

Thomas Matthews

I would use a more C++-style approach.

Still, everybody links to How can I read and parse CSV files in C++?, but that question is from 2009 and now over 10 years old. Most of its answers are also old and very complicated. So maybe it's time for a change.

In modern C++ you have algorithms that iterate over ranges. You will often see something like someAlgorithm(container.begin(), container.end(), someLambda). The idea is that we iterate over a range of similar elements.

In your case, we iterate over the tokens in your input string and create substrings. This is called tokenizing.

For exactly that purpose we have std::sregex_token_iterator, and since it was designed for this job, we should use it.

It is an iterator over a string, hence the "sregex" in its name. Its constructor takes the range of input to operate on (begin and end), a regex describing what should (or should not) be matched in the input string, and, as the last parameter, which part of each match to return: 0 returns the whole match, a positive number n returns the n-th capture group of the regex, and -1 returns the text that is NOT matched, i.e. everything between the matches.

Now that we understand the iterator, we can std::copy the tokens from it to our target, a std::vector of std::string. Since we do not know how many columns we have, we use std::back_inserter as the target. It appends every token we get from the std::sregex_token_iterator to our std::vector<std::string>, no matter how many there are.

Such a statement could look like this:

std::copy(                          // We want to copy something
    std::sregex_token_iterator      // The iterator begin, the sregex_token_iterator. Give back first token
    (
        line.begin(),               // Evaluate the input string from the beginning
        line.end(),                 // to the end
        re,                         // The regex, which matches a comma
        -1                          // Return not the commas, but everything between them
    ),
    std::sregex_token_iterator(),   // End iterator of sregex_token_iterator (one past the last token)
    std::back_inserter(cp.columns)  // Append everything to the target container
);

Now we understand how this copy operation works.

Next step: we want to read from a file. The file also consists of similar data items, namely rows.

Just as above, we can iterate over similar data, whether it comes from a file or from somewhere else. For this purpose C++ has std::istream_iterator. It is a template: the template parameter is the type of data it should read, and the constructor parameter is a reference to an input stream. It does not matter whether the input stream is std::cin, a std::ifstream, or a std::istringstream; the behaviour is identical for all kinds of streams.

Since we cannot attach files on SO, I use a std::istringstream to hold the input CSV file. But of course you can open a real file by defining std::ifstream testCsv(filename). No problem.

With std::istream_iterator we iterate over the input and read similar data. In our case, one problem is that we want to iterate over our own data type, not over some built-in data type.

To solve this, we define a proxy class that does the internal work for us (we do not want to know how; that should be encapsulated in the proxy). In the proxy we overload the extraction operator >>, which std::istream_iterator calls to read one element, and a conversion operator that converts the proxy to the type we actually want to store.

And the last important step: std::vector has a range constructor. It also has many other constructors that we could use when defining a variable of type std::vector, but for our purposes the range constructor fits best.

So we define a variable csv and call its range constructor with the begin and the end of a range. In our specific example, these are the begin and end iterators of std::istream_iterator.

If we combine all of the above, reading the complete CSV file is a one-liner: the definition of a variable with its constructor.

Please see the resulting code:


#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>

std::istringstream testCsv{ R"(0, 6/19/2019, 16:41:33, 33.972622, -117.323482, 24.25, 23.5, 23.25, 24.75, 25.5, 24.25, 25.25, 25.5, 24.5, 24, 24, 24.25, 25.5, 25.75, 25.25, 25, 24.5, 24.75, 24.75, 24.75, 25.25, 24.5, 24.5, 25.5, 23.75, 24.25, 24.75, 24, 24.25, 24, 24.5, 25, 24.25, 24, 24.25, 24.25, 24, 24.25, 24.5, 25.5, 24, 25, 24.5, 24.75, 24.5, 24.75, 24.75, 25.5, 24.5, 24.25, 24.25, 25.25, 25.25, 23.5, 25, 24.75, 24.5, 24.75, 25.5, 24.25, 23.5, 24, 25.25, 25, 605, 597, 515, 514, 509, 511, 508
0, 6/19/2019, 16:41:42, 33.972648, -117.323492, 24, 23.5, 23.75, 24.25, 25.5, 25.5, 25.25, 25.25, 25, 24.5, 24.25, 24.5, 25, 25.5, 25.5, 25.75, 24.25, 23.5, 24.75, 24.5, 24.25, 24.25, 24.5, 25.5, 24, 23.75, 24.5, 24, 24.25, 24, 24.75, 25.25, 25, 23.75, 24.75, 25.5, 25.5, 26, 24.75, 25.25, 24.5, 25, 25.25, 25.25, 26, 24.75, 24.5, 25.5, 24.5, 24.5, 25, 24.75, 24.25, 24.25, 25, 25, 24, 24, 24.75, 25, 23.25, 24.25, 25.5, 25.5, 609, 595, 1229, 1227, 1200, 1196, 1171
0, 6/19/2019, 16:41:49, 33.972643, -117.323479, 24.5, 23, 22.75, 24, 25.25, 25.5, 25, 26, 24.75, 24, 24, 24.75, 24.75, 25.25, 25.5, 26, 24.75, 24, 24.75, 25, 24.25, 24.25, 24.75, 26, 24.5, 23.5, 24.5, 24, 24, 24, 25, 25.75, 24.75, 23.25, 24.5, 24.5, 24.5, 25, 25.25, 25.25, 24, 25, 24.5, 25.25, 25.25, 25.25, 25.25, 25.5, 24.5, 24, 25.25, 25, 25, 24.25, 25, 25.25, 24.25, 24, 24.75, 25.25, 23.75, 24.25, 25, 25.5, 621, 601, 706, 725, 703, 707, 704
1, 6/19/2019, 16:41:55, 33.972631, -117.323483, 24.25, 23.75, 23.25, 24, 25.25, 25.25, 25.5, 26, 24.5, 24.25, 23.75, 24.5, 24.75, 25.5, 26, 25.5, 25, 23.75, 24.75, 24.75, 25.25, 25.25, 25, 26.25, 24.5, 23.5, 24.25, 25, 24.25, 24.25, 24.75, 25.75, 24.75, 23.75, 24.25, 24.25, 24.25, 24.5, 25.25, 25.25, 24.5, 24.5, 24.75, 25, 25.25, 26, 25.5, 25.25, 24.5, 24, 24.75, 25, 25, 25.25, 25.5, 25.5, 24.25, 25, 25, 25.75, 24.25, 24.5, 25.25, 25.5, 613, 602, 721, 720, 699, 704, 696
)" };


// Define Alias for Easier Reading
using Columns = std::vector<std::string>;
using CSV = std::vector<Columns>;


// Proxy for the input Iterator
struct ColumnProxy {    
    // Overload extractor. Read a complete line
    friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {

        // Read a line
        std::string line; cp.columns.clear();
        std::getline(is, line);

        // The delimiter
        const std::regex re(",");

        // Split values and copy into resulting vector
        std::copy(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
            std::sregex_token_iterator(),
            std::back_inserter(cp.columns));
        return is;
    }

    // Conversion operator: converts a ColumnProxy to Columns (std::vector<std::string>)
    operator std::vector<std::string>() const { return columns; }
protected:
    // Temporary to hold the read vector
    Columns columns{};
};


int main()
{
    // Define variable CSV with its range constructor. Read complete CSV in this statement
    CSV csv{ std::istream_iterator<ColumnProxy>(testCsv), std::istream_iterator<ColumnProxy>() };

    // Print result. Go through all lines and then copy line elements to std::cout
    std::for_each(csv.begin(), csv.end(), [](Columns& c) {
        std::copy(c.begin(), c.end(), std::ostream_iterator<std::string>(std::cout, " ")); std::cout << "\n";   });
}

I hope the explanation is detailed enough to give you an idea of what you can do with modern C++.

A M