.NET does not yet offer a standard library for reading CSV.
Although the CSV specification is relatively simple,
parsing a CSV file with multi-line data is not exactly trivial.
Some people "cheat" with a regular expression,
but then you need to read the whole file into a string,
since a regex cannot pull in more lines on demand,
and you still need to detect and handle row breaks.
And that is before we consider its performance, its conformance, or the new problem the regex itself has become.
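To make the difficulty concrete, here is a minimal sketch of the naive approach (read a line, split on commas) applied to a cell that contains both a comma and a line break. The class name `NaiveSplitDemo` and the sample data are made up for illustration.

```csharp
using System;
using System.IO;

class NaiveSplitDemo {
    static void Main () {
        // A header row plus ONE data row; its second cell holds a comma and a line break.
        var csv = "id,note\n1,\"Hello,\nworld\"\n";
        using ( var r = new StringReader( csv ) ) {
            string line;
            while ( ( line = r.ReadLine() ) != null ) {
                // Naive: treat every physical line as a row and every comma as a separator.
                var parts = line.Split( ',' );
                Console.WriteLine( $"{parts.Length} part(s): {line}" );
            }
        }
    }
}
```

The data row is reported as two separate "rows" (three parts, then one part), and the quotes are left in the output. A real parser has to track quote state across both commas and line breaks.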
The standard recommendation is to use a well-tested parser package.
CsvHelper is pretty comprehensive, and I would suggest NReco.Csv if you just want to read raw data.
That said, sometimes you may prefer not to add a package, or your options may be restricted.
Whatever the reason, I have written a CSV parser as a few static methods that you can copy and paste into your project to get going.
Usage:
using ( var r = new StreamReader( filePath, Encoding.UTF8, true ) ) {
    while ( r.TryReadCsvRow( out var row ) ) {
        foreach ( string cell in row ) {
            // Your code here.
        }
    }
}

using ( var r = new StringReader( csvString ) ) {
    while ( r.TryReadCsvRow( out var row ) ) {
        string[] cells = row.ToArray(); // Needs `using System.Linq;`
        // `cells` is reusable and random-accessible
    }
}
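As a rough illustration of what the parser returns for quoted data, the sketch below feeds it a row with an embedded comma, escaped quotes and a Windows line break. It assumes the parser methods from this answer are pasted into a static class in the same namespace. The class name `QuotedCellDemo` and the sample data are made up.

```csharp
using System;
using System.IO;
using System.Linq;

static class QuotedCellDemo {
    static void Main () {
        // Two rows: a header, then a row with an embedded comma, escaped quotes and a \r\n inside a cell.
        var csv = "name,quote\n"
                + "\"Doe, Jane\",\"She said \"\"hi\"\"\r\nand left\"\n";
        using ( var r = new StringReader( csv ) ) {
            // TryReadCsvRow is the extension method defined in this answer (assumed to be in scope).
            while ( r.TryReadCsvRow( out var row ) ) {
                var cells = row.ToArray();
                Console.WriteLine( cells.Length + " cells: " + string.Join( " | ", cells ) );
            }
        }
    }
}
```

Both rows should come back with two cells. In the second row the first cell is `Doe, Jane` and the second is `She said "hi"`, then a single `\n`, then `and left`: the doubled quotes collapse to one and the `\r\n` is normalised.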
Parser Code:
// Requires: using System.Collections.Generic; using System.IO; using System.Text;
// Extension methods must be declared inside a non-generic static class of your choice.

/**
 * <summary>Try to read a CSV row from a reader. May consume multiple lines. Line breaks in cells will become \n.</summary>
 * <param name="source">Reader to get line data from.</param>
 * <param name="row">Cell data enumeration (forward-only), or null if no more rows.</param>
 * <param name="quoteBuffer">Thread-local buffer for quote parsing. If null, one will be created on demand.</param>
 * <returns>True on success, false on no more rows.</returns>
 * <see cref="StreamReader.ReadLine"/>
 */
public static bool TryReadCsvRow ( this TextReader source, out IEnumerable<string> row, StringBuilder quoteBuffer = null ) {
    row = ReadCsvRow( source, quoteBuffer );
    return row != null;
}

/**
 * <summary>Read a CSV row from a reader. May consume multiple lines. Line breaks in cells will become \n.</summary>
 * <param name="source">Reader to get line data from.</param>
 * <param name="quoteBuffer">Thread-local buffer for quote parsing. If null, one will be created on demand.</param>
 * <returns>Cell data enumeration (forward-only), or null if no more rows.</returns>
 * <see cref="StreamReader.ReadLine"/>
 */
public static IEnumerable<string> ReadCsvRow ( this TextReader source, StringBuilder quoteBuffer = null ) {
    var line = source.ReadLine();
    if ( line == null ) return null;
    return ReadCsvCells( source, line, quoteBuffer );
}

// Lazily yields the cells of one row. Extra lines are read only while a quoted cell spans them.
private static IEnumerable<string> ReadCsvCells ( TextReader source, string line, StringBuilder buf ) {
    // `line` becomes null if the input ends inside a quoted cell, which also ends the loop.
    for ( var pos = 0 ; line?.Length >= pos ; )
        yield return ReadCsvCell( source, ref line, ref pos, ref buf );
}

private static string ReadCsvCell ( TextReader source, ref string line, ref int pos, ref StringBuilder buf ) {
    var len = line.Length;
    if ( pos >= len ) { // EOL: empty line or trailing comma, emit one last empty cell.
        pos = len + 1;
        return "";
    }

    // Unquoted cell.
    if ( line[ pos ] != '"' ) {
        var end = line.IndexOf( ',', pos );
        var head = pos;
        // Last cell in this row.
        if ( end < 0 ) {
            pos = len + 1;
            return line.Substring( head );
        }
        // Empty cell.
        if ( end == pos ) {
            pos++;
            return "";
        }
        pos = end + 1;
        return line.Substring( head, end - head );
    }

    // Quoted cell.
    if ( buf == null )
        buf = new StringBuilder();
    else
        buf.Clear();
    var start = ++pos; // Drop opening quote.
    while ( true ) {
        var end = pos < len
            ? line.IndexOf( '"', pos )
            : -1;
        var next = end + 1;
        if ( end < 0 ) {
            // End of line. Append and read next line.
            buf.Append( line, start, len - start );
            if ( ( line = source.ReadLine() ) == null )
                return buf.ToString();
            buf.Append( '\n' );
            start = pos = 0; len = line.Length;
        } else if ( next == len || line[ next ] == ',' ) {
            // End of cell.
            pos = end + 2;
            return buf.Append( line, start, end - start ).ToString();
        } else if ( line[ next ] == '"' ) {
            // Two double quotes: keep one and continue after the pair.
            buf.Append( line, start, end - start + 1 );
            pos = start = end + 2;
        } else {
            // One double quote not followed by EOL or comma: keep it as text and search after it.
            pos = next;
        }
    }
}
Pros
- Low overhead, works with big files (e.g. 800 MB of census data).
- Works with all line break styles and parses quoted cells.
- Optional buffer to improve quote parsing speed.
- Thread safe, as long as the buffer is not shared between threads. No locking.
- No dependencies. No NuGet. Works in all modern .NET.
Cons
- All line breaks inside cells will be converted to \n.
- Output is forward-only / use-once. Solve with ToArray or ToList.
- Buffer, if provided, is not cleared after read.
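The forward-only and buffer points can be handled together. Below is a minimal sketch, assuming the parser methods above are in scope, that copies every row into an array and reuses one `StringBuilder` for the whole read. `CsvLoadDemo` and `LoadAll` are hypothetical names, not part of the parser. Note that the cells are produced by an iterator, so the remaining lines of a multi-line cell are read only while the row is being enumerated. Consuming each row (for example with `ToArray`) before asking for the next one keeps the reader in step.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

static class CsvLoadDemo {
    // Hypothetical helper: materialises every row so the data can be revisited later.
    static List<string[]> LoadAll ( TextReader reader ) {
        var rows = new List<string[]>();
        var buffer = new StringBuilder(); // One buffer per reader/thread, reused for every quoted cell.
        while ( reader.TryReadCsvRow( out var row, buffer ) )
            rows.Add( row.ToArray() ); // Fully consume the row before the next one is read.
        buffer.Clear(); // The parser leaves the last quoted cell in the buffer; clear it if that matters.
        return rows;
    }

    static void Main () {
        // First row has a quoted cell spanning two lines; second row is plain.
        using ( var r = new StringReader( "a,\"b\nc\"\nd,e" ) ) {
            var rows = LoadAll( r );
            Console.WriteLine( rows.Count + " rows, first row has " + rows[ 0 ].Length + " cells" );
        }
    }
}
```

This trades the low memory footprint for random access, so for very large files it is probably better to process each row inside the loop instead of collecting them all.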