
Using C++, I would like to split the rows of a string (a CSV file in this case) where some of the fields may contain delimiters that are escaped by enclosing the field in double quotes ("...") and should be treated as literals. I have looked at the various questions already posted but have not found a direct answer to my problem.

Example of CSV file data:

Header1,Header2,Header3,Header4,Header5
Hello,",,,","world","!,,!,",","

Desired string vector after splitting:

["Hello"],[",,,"],["world"],["!,,!,"],[","]

Note: The CSV is only valid if the number of data columns equals the number of header columns.

I would prefer a solution that does not use Boost or other third-party libraries. Efficiency is not a priority.

EDIT: The code below, which implements the regex from @ClasG, at least handles the scenario above. I am drafting edge-case tests but would love to hear when and where it breaks down...

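```cpp
#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Same row as the example data: Hello,",,,","world","!,,!,",","
    std::string s = "Hello,\",,,\",\"world\",\"!,,!,\",\",\"";
    std::regex e("(\"[^\"]*\"|[^,]*)(?:,|$)");
    std::regex_iterator<std::string::iterator> rit(s.begin(), s.end(), e);
    std::regex_iterator<std::string::iterator> rend;

    while (rit != rend)
    {
        // Group 1 is the field without its trailing comma. An extra empty
        // match may also be reported once the end of the string is reached.
        std::cout << (*rit)[1] << std::endl;
        ++rit;
    }
}
```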
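The loop above only prints the matches. Below is a minimal sketch of collecting them into the desired `std::vector<std::string>`, assuming that quoted fields should lose their enclosing quotes and that a row must have as many fields as the header (per the note above). `split_row` and `has_header_width` are hypothetical names. The separator is captured as a second group so the loop can stop after the match that reached the end of input; otherwise `std::regex_iterator` can go on to report one extra empty match there.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV row using the regex from the answer,
// capturing the separator so we know when the end of input was reached.
std::vector<std::string> split_row(const std::string& line)
{
    static const std::regex field_re("(\"[^\"]*\"|[^,]*)(,|$)");
    std::vector<std::string> fields;

    for (std::sregex_iterator it(line.begin(), line.end(), field_re), rend;
         it != rend; ++it)
    {
        std::string field = (*it)[1].str();

        // Drop the enclosing quotes of a quoted field, e.g. "\",,,\"" -> ",,,".
        if (field.size() >= 2 && field.front() == '"' && field.back() == '"')
            field = field.substr(1, field.size() - 2);

        fields.push_back(field);

        // An empty separator group means "$" matched: this was the last
        // field, and any further match would be a spurious empty one.
        if ((*it)[2].length() == 0)
            break;
    }
    return fields;
}

// Hypothetical check for the rule that a data row must have as many
// columns as the header row.
bool has_header_width(const std::vector<std::string>& header,
                      const std::vector<std::string>& row)
{
    return header.size() == row.size();
}
```

For the example row this should yield `Hello`, `,,,`, `world`, `!,,!,` and `,`. A trailing comma (`a,`) still produces a final empty field, as it should. Quoted fields containing escaped quotes are not handled.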
Willeman
  • Possible duplicate of: http://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c – roalz Jan 19 '17 at 10:37
  • There is no standard csv library. Why are you so keen to avoid a third-party solution? – BoBTFish Jan 19 '17 at 10:37
    @BoBTFish Will happily consider third-party answers. Just stating what would be ideal for this use case. – Willeman Jan 19 '17 at 10:49
  • Use an embarrassingly simple state machine to parse each char. Done. No 3rd-party libraries. – Karoly Horvath Jan 19 '17 at 11:00
  • CSV looks simple but has a lot of corner cases. You should first read what [wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values) says about it. IMHO, you should first specify *exactly* what you need (separators in fields, quote chars in fields, line breaks in fields, etc.), build a range of test cases, and then implement it *by hand* and test your implementation. Alternatively, pick a CSV library, check whether its spec meets your requirements, test it and use it. – Serge Ballesta Jan 19 '17 at 11:05
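Following the state-machine suggestion, here is a minimal sketch of a character-by-character parser for a single line. It assumes RFC 4180-style quoting: a field may be enclosed in double quotes, and inside such a field a doubled quote `""` stands for one literal `"`. It does not handle line breaks inside quoted fields and is lenient about malformed input (text after a closing quote is simply appended). `parse_csv_line` and the state names are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical single-line CSV parser driven by a small state machine.
std::vector<std::string> parse_csv_line(const std::string& line)
{
    enum class State { FieldStart, Unquoted, Quoted, QuoteInQuoted };

    std::vector<std::string> fields;
    std::string field;
    State state = State::FieldStart;

    for (char c : line)
    {
        switch (state)
        {
        case State::FieldStart:
            if (c == '"')      { state = State::Quoted; }
            else if (c == ',') { fields.push_back(field); field.clear(); }
            else               { field += c; state = State::Unquoted; }
            break;
        case State::Unquoted:
            if (c == ',') { fields.push_back(field); field.clear(); state = State::FieldStart; }
            else          { field += c; }
            break;
        case State::Quoted:
            if (c == '"') { state = State::QuoteInQuoted; }  // closing or escaped quote?
            else          { field += c; }                    // commas are literal here
            break;
        case State::QuoteInQuoted:
            if (c == '"')      { field += '"'; state = State::Quoted; }  // "" -> "
            else if (c == ',') { fields.push_back(field); field.clear(); state = State::FieldStart; }
            else               { field += c; state = State::Unquoted; }  // lenient
            break;
        }
    }
    fields.push_back(field);  // the last field has no trailing comma
    return fields;
}
```

The column-count rule from the question then reduces to comparing `parse_csv_line(header).size()` with the size of each parsed data row.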

2 Answers


This is not a complete (c++) solution, but a regex that might nudge you in the right direction.

A regex like

("[^"]*"|[^,]*)(?:,|$)

will match the individual columns. (Note that it doesn't handle escaped quotes.)

See it here at regex101.
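As noted, this regex does not handle escaped quotes. One possible extension, assuming RFC 4180-style escaping where a literal quote inside a quoted field is written as `""`, is to let the quoted alternative repeat either a non-quote character or a `""` pair, then collapse each pair after matching. The sketch below uses the hypothetical name `split_row_escaped`, captures the separator so it can stop at the end of input, and does not validate malformed input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical variant of the answer's regex that also accepts a doubled
// quote ("") inside a quoted field as an escaped literal quote.
std::vector<std::string> split_row_escaped(const std::string& line)
{
    static const std::regex field_re("(\"(?:[^\"]|\"\")*\"|[^,]*)(,|$)");
    std::vector<std::string> fields;

    for (std::sregex_iterator it(line.begin(), line.end(), field_re), rend;
         it != rend; ++it)
    {
        std::string raw = (*it)[1].str();
        std::string field;

        if (raw.size() >= 2 && raw.front() == '"' && raw.back() == '"')
        {
            // Strip the enclosing quotes and collapse each "" to ".
            for (std::string::size_type i = 1; i + 1 < raw.size(); ++i)
            {
                field += raw[i];
                if (raw[i] == '"') ++i;  // skip the second quote of the pair
            }
        }
        else
        {
            field = raw;
        }
        fields.push_back(field);

        if ((*it)[2].length() == 0)  // "$" matched: no more fields
            break;
    }
    return fields;
}
```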

SamWhan
  • Hmm, I won't do it, but did you test your regex on all possible csv corner cases, mainly: separators, new lines or quote chars in fields? – Serge Ballesta Jan 19 '17 at 11:08
  • I'm not sure whether the regex flavors supported there match C++'s at all, but +1 for the cool resource link. – Willeman Jan 19 '17 at 11:31
  • @SergeBallesta As mentioned - "Not a complete solution". It does however handle the example given, and I assume that's representative of the cases possible. – SamWhan Jan 19 '17 at 11:31
  • What flavor of c++ are you using then? 11, 14...? – SamWhan Jan 19 '17 at 11:32
  • @ClasG VS2012 so can use at least c++11. Will try a regex test app. Are you saying that the php regex at regex101 will translate directly? (Not a c++ regex expert at all but eager to learn) – Willeman Jan 19 '17 at 11:55
  • The regex I've given is very basic. I don't know of any regex flavor that doesn't support it. (Well, the old `CAtlRegEx` would need some tweaking ;) – SamWhan Jan 19 '17 at 11:59
  • @ClasG Thanks. Seems like a great start at least :) – Willeman Jan 19 '17 at 12:15

This is not an answer, but it's too long to put as a comment IMHO.

CSV is one of those seemingly-simple-but-actually-quite-fiendish storage formats.

The droid you're looking for is Boost.Spirit.

The Spirit Master's name (on Stack Overflow) is @sehe.

See his answer here: https://stackoverflow.com/a/18366335/2015579

Please credit sehe, not me.

Richard Hodges