5

I have a function that reads a CSV file line by line and splits each line into a vector. The code that does this is:

    std::stringstream ss(sText);
    std::string item;

    while(std::getline(ss, item, ','))
    {
        m_vecFields.push_back(item);
    }

This works fine, except when the last value on the line is blank. For example,

text1,tex2,

I want this to produce a vector of size 3 whose third element is empty. Instead, I get a vector of size 2. How can I correct this?

Jonnster
  • 2
    possible duplicate of [CSV parser in C++](http://stackoverflow.com/questions/1120140/csv-parser-in-c) – Tony Delroy Jul 03 '12 at 12:37
  • 2
    no it's not. That code does exactly the same thing if the line ends with a comma – Jonnster Jul 03 '12 at 12:53
  • 1
    Isn't the problem the delimiter? `std::getline` extracts until the delimiter is found. But for the last item, there is no next delimiter `,` so nothing is extracted and thus the while loop ends. – AquilaRapax Jul 03 '12 at 13:00
  • @Jonnster: just because the currently accepted answer has flaws doesn't mean the other question doesn't address the same problem space adequately - there are other answers that should work and can be upvoted, and you can comment about problems with specific answers. – Tony Delroy Jul 03 '12 at 13:26

5 Answers

4

You could just use boost::split to do all this for you.
http://www.boost.org/doc/libs/1_50_0/doc/html/string_algo/usage.html#id3207193

It has the behaviour that you require in one line.

Example boost::split Code

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

using namespace std;

int main()
{
    vector<string> strs;

    // Empty tokens are kept by default (token_compress_off).
    boost::split(strs, "please split,this,csv,,line,", boost::is_any_of(","));

    for ( vector<string>::iterator it = strs.begin(); it != strs.end(); ++it )
        cout << "\"" << *it << "\"" << endl;

    return 0;
}

Results

"please split"
"this"
"csv"
""
"line"
""
GrahamS
  • 1
    Why? `using namespace std` is a pretty normal thing to do, isn't it? I don't think removing it would achieve anything @Seanny123, but feel free to do so in your own code. – GrahamS Apr 03 '14 at 14:08
2
bool addEmptyLine = !sText.empty() && sText.back() == ',';   // back() on an empty string is undefined

/* your code here */

if (addEmptyLine) m_vecFields.push_back("");

or

sText += ',';     // "text1,tex2," becomes "text1,tex2,,"

/* your code */

assert(m_vecFields.size() == 3);
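
Put together, the first variant might look like the sketch below. The function name splitLine and the free-function form are assumptions made for illustration; the code in the question fills the member m_vecFields instead.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits one CSV line on ','
// and keeps a trailing empty field when the line ends with a comma.
std::vector<std::string> splitLine(const std::string& sText)
{
    std::vector<std::string> fields;

    // back() on an empty string is undefined, so test empty() first.
    const bool addEmptyField = !sText.empty() && sText.back() == ',';

    std::stringstream ss(sText);
    std::string item;
    while (std::getline(ss, item, ','))
    {
        fields.push_back(item);
    }

    // getline fails once the input is exhausted after the last comma,
    // so the empty last field has to be added by hand.
    if (addEmptyField)
        fields.push_back("");

    return fields;
}
```

Under these assumptions, splitLine("text1,tex2,") returns three fields with the last one empty, while splitLine("text1,tex2") returns two.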
jrok
  • Yes I just added code like the first example and came back to find someone had suggested it. It's not the cleanest code but it'll do. – Jonnster Jul 03 '12 at 14:03
2

You can use a function similar to this:

template <class InIt, class OutIt>
void Split(InIt begin, InIt end, OutIt splits)
{
    InIt current = begin;
    while (begin != end)
    {
        if (*begin == ',')
        {
            *splits++ = std::string(current,begin);
            current = ++begin;
        }
        else
            ++begin;
    }
    *splits++ = std::string(current,begin);
}

It iterates through the string and, whenever it encounters the delimiter, copies the text since the previous delimiter and writes it to the splits output iterator.
The interesting parts are:

  • when current == begin it will insert an empty string (test case: "text1,,tex2")
  • the last insertion guarantees there will always be the correct number of elements.
    If there is a trailing comma, it will trigger the previous bullet point and add an empty string, otherwise it will add the last element to the vector.
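
To make the two bullet points concrete, here is a small sketch that runs the same Split template over the cases mentioned above. The wrapper splitToVector is a hypothetical convenience added for this illustration only.

```cpp
#include <iterator>
#include <string>
#include <vector>

// Same algorithm as Split above, repeated so the sketch compiles on its own.
template <class InIt, class OutIt>
void Split(InIt begin, InIt end, OutIt splits)
{
    InIt current = begin;
    while (begin != end)
    {
        if (*begin == ',')
        {
            // Empty string when current == begin (two delimiters in a row).
            *splits++ = std::string(current, begin);
            current = ++begin;
        }
        else
            ++begin;
    }
    // Last field; empty when the input ends with a delimiter.
    *splits++ = std::string(current, begin);
}

// Hypothetical wrapper (an assumption, not part of the answer).
std::vector<std::string> splitToVector(const std::string& line)
{
    std::vector<std::string> fields;
    Split(line.begin(), line.end(), std::back_inserter(fields));
    return fields;
}
```

With this sketch, "text1,,tex2" yields three fields with an empty middle one, "text1,tex2," yields three fields with an empty last one, and an empty line yields a single empty field.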

You can use it like this:

std::stringstream ss(sText);
std::string item;
std::vector<std::string> m_vecFields;
while(std::getline(ss, item))
{
    Split(item.begin(), item.end(), std::back_inserter(m_vecFields));
}

std::for_each(m_vecFields.begin(), m_vecFields.end(), [](std::string& value)
{
    std::cout << value << std::endl;
});
Julien Lebot
2

C++11 makes it exceedingly easy to handle even escaped commas using regex_token_iterator:

std::stringstream ss(sText);
std::string item;
const regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};

std::getline(ss, item);

m_vecFields.insert(m_vecFields.end(), sregex_token_iterator(item.begin(), item.end(), re, 1), sregex_token_iterator());

Incidentally if you simply wanted to construct a vector<string> from a CSV string such as item you could just do:

const regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
vector<string> m_vecFields{sregex_token_iterator(item.begin(), item.end(), re, 1), sregex_token_iterator()};


Some quick explanation of the regex is probably in order. (?:[^\\\\,]|\\\\.) matches escaped characters or non-',' characters. (See here for more info: https://stackoverflow.com/a/7902016/2642059) The *? means that it is not a greedy match, so it will stop at the first ',' reached. All that's nested in a capture, which is selected by the last parameter, the 1, to regex_token_iterator. Finally, (?:,|$) will match either the ','-delimiter or the end of the string.

To make this standard CSV reader ignore empty elements, the regex can be altered to only match strings with at least one character.

const regex re{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};

Notice that '+' has replaced '*', so one or more matching characters are required. This prevents it from producing an empty field when your item string ends with a ','. You can see an example of this here: http://ideone.com/W4n44W
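
As a rough sketch of how the two patterns differ on the line from the question, the helper below collects capture group 1 of every match. The names regexSplit, keepEmpty and skipEmpty are hypothetical, and the sketch assumes a standard library with full ECMAScript std::regex support.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture group 1 of every match of re in item.
std::vector<std::string> regexSplit(const std::string& item, const std::regex& re)
{
    return std::vector<std::string>(
        std::sregex_token_iterator(item.begin(), item.end(), re, 1),
        std::sregex_token_iterator());
}

// '*?' allows empty fields; '+?' requires at least one character per field.
const std::regex keepEmpty{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
const std::regex skipEmpty{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
```

For the input "text1,tex2," the first pattern should give three fields (the last one empty) and the second should give two. It may be worth testing lines without a trailing comma as well, since (?:,|$) can in principle also match the empty string at the very end of the input.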

Jonathan Mee
2

A flexible solution for parsing CSV files, where:

source - content of the CSV file

delimeter - the CSV delimiter, e.g. ',' or ';'

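std::vector<std::string> csv_split(const std::string& source, char delimeter) {
    std::vector<std::string> ret;
    std::string word;

    bool inQuote = false;
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (!inQuote && source[i] == '"') {
            // Opening quote: start a quoted section and drop the quote itself.
            inQuote = true;
            continue;
        }
        if (inQuote && source[i] == '"') {
            if (i + 1 < source.size() && source[i + 1] == '"') {
                // Doubled quote inside a quoted section: keep one literal quote.
                ++i;
            } else {
                // Closing quote: end the quoted section.
                inQuote = false;
                continue;
            }
        }

        if (!inQuote && source[i] == delimeter) {
            // An unquoted delimiter ends the current field (which may be empty).
            ret.push_back(word);
            word.clear();
        } else {
            word += source[i];
        }
    }
    // Last field; empty when the input ends with a delimiter.
    ret.push_back(word);

    return ret;
}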
Markoj