C++11 makes it exceedingly easy to handle even escaped commas using regex_token_iterator:
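// Requires <regex>, <sstream>, and <string>.
std::stringstream ss(sText);
std::string item;
// Capture group 1 is one field: a lazy run of escaped characters or non-comma characters.
const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
std::getline(ss, item);
m_vecFields.insert(m_vecFields.end(), std::sregex_token_iterator(item.begin(), item.end(), re, 1), std::sregex_token_iterator());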
Incidentally, if you simply wanted to construct a vector<string> from a CSV string such as item, you could just do:
const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
std::vector<std::string> m_vecFields{std::sregex_token_iterator(item.begin(), item.end(), re, 1), std::sregex_token_iterator()};
[Live Example]
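For reference, the one-liner above can be wrapped into a small self-contained function. This is only a minimal sketch: the name splitCsvLine is hypothetical and not part of the original code, and it assumes the line has already been read into a std::string. Because the field group is allowed to match nothing, the iterator may also report an empty field at the very end of the input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (name is an assumption, not from the original code):
// splits one CSV line into fields, keeping backslash-escaped commas inside a field.
std::vector<std::string> splitCsvLine(const std::string& item) {
    // Same pattern as above; static so the regex is compiled only once.
    static const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
    // The trailing 1 selects capture group 1, i.e. the field without its delimiter.
    return std::vector<std::string>(
        std::sregex_token_iterator(item.begin(), item.end(), re, 1),
        std::sregex_token_iterator());
}
```

Called on the text a,b\,c,d (written "a,b\\,c,d" as a C++ literal), the first three fields are a, b\,c and d. Note that the backslash stays in the field: the regex only splits, it does not unescape.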
Some quick explanation of the regex is probably in order. (?:[^\\\\,]|\\\\.) matches escaped characters or non-',' characters; the backslashes are doubled because the pattern is written as a C++ string literal. (See here for more info: https://stackoverflow.com/a/7902016/2642059) The *? means that the match is not greedy, so it will stop at the first unescaped ',' reached. All of that is nested in a capture group, which is selected by the last argument, the 1, passed to regex_token_iterator. Finally, (?:,|$) will match either the ','-delimiter or the end of the string.
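To see what the last argument does, it may help to compare submatch 0 (the whole match) with submatch 1 (the capture group). The sketch below uses a hypothetical helper, tokensFor, which is not part of the original code and exists only for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the tokens produced for a given submatch index.
// Index 0 is the whole match (field plus delimiter); index 1 is the capture group.
std::vector<std::string> tokensFor(const std::string& item, int submatch) {
    static const std::regex re{"((?:[^\\\\,]|\\\\.)*?)(?:,|$)"};
    return std::vector<std::string>(
        std::sregex_token_iterator(item.begin(), item.end(), re, submatch),
        std::sregex_token_iterator());
}
```

For the input a,b, the first token is a, (delimiter included) with index 0, but just a with index 1, which is why the 1 is passed when only the field text is wanted.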
To make this CSV reader ignore empty elements, the regex can be altered to only match fields with one or more characters.
const std::regex re{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
Notice that the '+' has now replaced the '*', indicating that 1 or more matching characters are required. This prevents an empty element from being produced for your item string, which ends with a ','. You can see an example of this here: http://ideone.com/W4n44W
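As a rough, self-contained illustration of the '+' variant, the following sketch wraps it in a function. The name splitCsvLineNonEmpty is hypothetical and not taken from the original code.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: like the '*' version, but the field group must match
// at least one character, so empty fields produce no token.
std::vector<std::string> splitCsvLineNonEmpty(const std::string& item) {
    static const std::regex re{"((?:[^\\\\,]|\\\\.)+?)(?:,|$)"};
    return std::vector<std::string>(
        std::sregex_token_iterator(item.begin(), item.end(), re, 1),
        std::sregex_token_iterator());
}
```

With the input a,,b, (an empty field in the middle and a trailing ','), this returns only a and b.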