I agree that regex is not the "right" answer, but it is what the question asked for and I like a good regex challenge.
The pattern below is a modified version of my standard CSV parsing regex that removes the spaces and assumes the CSV is perfect as you requested. The only part of your question not addressed by it is that it will not remove escaped/doubled quotes. Examples for unescaping the quotes are given after the patterns.
When one or more lines/records of a CSV file/stream are passed to the regular expression below it will return a match for each non-empty line/record. Each match will contain a capture group named Value
that contains the captured values in that line/record.
Here's the commented pattern (test it on Regexstorm.net):
(?<=\r|\n|^)(?!\r|\n|$) // Records start at the beginning of line (line must not be empty)
(?: // Group for each value and a following comma or end of line (EOL) - required for quantifier (+?)
[^\S\r\n]* // Removes leading spaces
(?: // Group for matching one of the value formats before a comma or EOL
"(?<Value>(?:[^"]|"")*)"| // Quoted value -or-
(?<Value>[^,\r\n]+)| // Unquoted/open ended quoted value -or-
(?<Value>) // Empty value before comma (before EOL is excluded by "+?" quantifier later)
)
[^\S\r\n]* // Removes trailing spaces
(?:,|(?=\r|\n|$)) // The value format matched must be followed by a comma or EOL
)+? // Quantifier to match one or more values (non-greedy/as few as possible to prevent infinite empty values)
(?:(?<=,)(?<Value>))? // If the group of values above ended in a comma then add an empty value to the group of matched values
(?:\r\n|\r|\n|$) // Records end at EOL
Here's the raw pattern without all the comments or whitespace.
(?<=\r|\n|^)(?!\r|\n|$)(?:[^\S\r\n]*(?:"(?<Value>(?:[^"]|"")*)"|(?<Value>[^,\r\n]+)|(?<Value>))[^\S\r\n]*(?:,|(?=\r|\n|$)))+?(?:(?<=,)(?<Value>))?(?:\r\n|\r|\n|$)
And, here's the C# escaped version.
String CSVPattern=
@"(?<=\r|\n|^)(?!\r|\n|$)" +
@"(?:" +
@"[^\S\r\n]*" +
@"(?:" +
@"""(?<Value>(?:[^""]|"""")*)""|" +
@"(?<Value>[^,\r\n]+)|" +
@"(?<Value>)" +
@")" +
@"[^\S\r\n]*" +
@"(?:,|(?=\r|\n|$))" +
@")+?" +
@"(?:(?<=,)(?<Value>))?" +
@"(?:\r\n|\r|\n|$)";
Examples on how to use the regex pattern (well, the original pattern which can be replaced with this pattern) can be found on my answer to a similar question here, or on C# pad here, or here.
NOTE: The examples above contain the logic for unescaping/undoubling quotes as seen below:
if (Capture.Length == 0 || Capture.Index == Record.Index || Record.Value[Capture.Index - Record.Index - 1] != '\"')
{
// No need to unescape/undouble quotes if the value is empty, the value starts
// at the beginning of the record, or the character before the value is not a
// quote (not a quoted value)
Console.WriteLine(Capture.Value);
}
else
{
// The character preceding this value is a quote
// so we need to unescape/undouble any embedded quotes
Console.WriteLine(Capture.Value.Replace("\"\"", "\""));
}