340

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.

Also, I've seen instances of people using ODBC/OLE DB to read CSV via the Text driver, and a lot of people discourage this due to its "drawbacks." What are these drawbacks?

Ideally, I'm looking for a way through which I can read the CSV by column name, using the first record as the header / field names. Some of the answers given are correct but work to basically deserialize the file into classes.

smci
  • 32,567
  • 20
  • 113
  • 146
David Pfeffer
  • 38,869
  • 30
  • 127
  • 202

19 Answers19

454

A CSV parser is now a part of .NET Framework.

Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name)

using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(",");
    while (!parser.EndOfData)
    {
        //Process row
        string[] fields = parser.ReadFields();
        foreach (string field in fields)
        {
            //TODO: Process field
        }
    }
}

The docs are here - TextFieldParser Class

P.S. If you need a CSV exporter, try CsvExport (discl: I'm one of the contributors)

Jon
  • 9,156
  • 9
  • 56
  • 73
Alex from Jitbit
  • 53,710
  • 19
  • 160
  • 149
  • 4
    From my experience TextFieldParser does not perform well with large (e.g > 250Mb) files. :( – MBoros Mar 20 '14 at 12:09
  • TextFieldParser doesn't have an option to parse single quoted strings, they have to be double quotes. Would be much more useful if you could tell it what the quote identifier is. – Rush Frisby Oct 26 '15 at 18:43
  • 7
    In the constructor you might want to use a different encoding than the one by default, like so: new TextFieldParser("c:\temp\test.csv", System.Text.Encoding.UTF8) – neural5torm Apr 19 '16 at 17:41
  • 1
    Note that if any _field_ in your CSV contains blank lines, they will skipped by `TextFieldParser.ReadLine()`. See [TextFieldParser docs](https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.readfields.aspx) – mcNux Sep 26 '16 at 09:49
  • TextFieldParser cannot process lines longer than 32,767 characters (neither can Excel). – vkelman Dec 20 '16 at 17:16
  • 2
    How do I specify if the csv file contains headers? – Aditya Apr 11 '17 at 08:04
  • @MBoros I'm wondering: what file that contains text could possibly be 250 Mb? – user3478148 Jul 14 '17 at 11:42
  • 1
    @user3478148 for example Csv exports from legacy systems. i have seen csvs of ten gigs, so believe me, it happens. Its usually an abused export method from a legacy system – MBoros Jul 15 '17 at 13:58
  • 5
    Is there a way to get this in .NET Core? –  May 02 '18 at 12:12
  • If you're going down that path, you can use [TextFieldParser](https://github.com/22222/CsvTextFieldParser) instead, so you don't have to reference all of VB just to get that one class. – jpaugh Sep 11 '18 at 16:24
  • TextFieldParser is quite slow for large files. A CSV file with over 300K records on my [very fast] PC takes over 1/2 hour to read and parse. – HerrimanCoder Oct 04 '18 at 22:13
  • 7
    This answer is not very useful to beginners without mentioning that in order to get this code to work you have to add `using Microsoft.VisualBasic.FileIO;` statement at the beginning of your file. – z33k Dec 11 '19 at 14:44
  • 1
    @HugoZink TextFieldParser has been included in .Net core 3.0 – Alex from Jitbit Feb 24 '20 at 14:45
  • 2
    Note that if your using `parser.SetDelimiters(",");` you don't need to use `parser.TextFieldType = FieldType.Delimited;` too. `SetDelimiters()` "Sets the delimiters for the reader to the specified values, **and sets the field type to Delimited**." – Oli Apr 26 '20 at 15:30
  • 1
    Why is this the most upvoted answer? It does not answer the question where it is asked for a way to get fields by their header name. – Stacksatty Jun 22 '21 at 18:33
  • 1
    @Stacksatty the purpose of Stackoverflow is not just answering OPs' questions specifically and write code for them, but to hint at the right direction and be helpful to the community in general. Getting fields by their names is trivial once you parsed the header row. – Alex from Jitbit Jun 22 '21 at 20:53
  • As I was just bit by this with a csv file with large fields containing base64 encoded data for images, TextFieldParser has a hardcoded 10,000,000 character buffer for parsing fields, and if you have a field larger than this, it dies – user1958698 Sep 16 '22 at 17:39
  • Seeing issues with UTF8 data – sobelito Feb 11 '23 at 19:38
254

CsvHelper (a library I maintain) will read a CSV file into custom objects.

using (var reader = new StreamReader("path\\to\\file.csv"))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    var records = csv.GetRecords<Foo>();
}

Sometimes you don't own the objects you're trying to read into. In this case, you can use fluent mapping because you can't put attributes on the class.

public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
    public MyCustomObjectMap()
    {
        Map( m => m.Property1 ).Name( "Column Name" );
        Map( m => m.Property2 ).Index( 4 );
        Map( m => m.Property3 ).Ignore();
        Map( m => m.Property4 ).TypeConverter<MySpecialTypeConverter>();
    }
}
Josh Noe
  • 2,664
  • 2
  • 35
  • 37
Josh Close
  • 22,935
  • 13
  • 92
  • 140
  • 22
    I agree with @kubal5003. What sold me on it was you have it available as a NuGet package. Thanks man, it is fast, and does all the csv reading I need. – Gromer Sep 07 '12 at 15:10
  • 9
    It's damn fast. 1.3 million records read and deserialized in 10 seconds. – marisks Jan 16 '13 at 14:50
  • 3
    Great library very easy to implement. I would just suggest to Josh to update his answer here because the library has changed a bit since this answer was written and you cannot instantiate CsvHelper anymore (it's only a namespace now) but you have to use the CsvReader class. – Marko Aug 05 '13 at 20:04
  • File.OpenRead() should be File.OpenText() as CsvReader() doesn't take filestream as a constructor argument. – Paul George Oct 29 '14 at 10:46
  • 1
    CsvClassMap doesn't seem to exist in the last version of CsvHelper? – knocte Dec 10 '15 at 08:45
  • 3
    knocte, it is called ClassMap now. There are other changes too, like having to do a read before asking for the header record (which by the way gets set to whatever was read by the first call to Read()). Like others have mentioned before, it is superfast and easy to work with. – norgie Oct 20 '17 at 16:19
  • @norgie This is to allows you to read a headers on multiple rows. – Josh Close Oct 20 '17 at 16:21
  • @JoshClose I know. :-) The various changes came as a surprise when upgrading from 2.x to 3.x, that's all. :-) – norgie Oct 20 '17 at 16:25
  • I had a lot of issues with the class: classes names were changed and provided flow doesn't work smoothly. To top it all of: I got empty records. – Mr.B Sep 21 '18 at 21:16
  • @Mr.B The documentation in the link provided is up to date. – Josh Close Sep 21 '18 at 21:35
  • 1
    No. please, check your code in nuget and documentation. you need to research and to be lucky to make it work. – Mr.B Sep 21 '18 at 22:38
  • 1
    If it's not easy to use and intuitive, then that's an issue. If you want to submit an issue, I'd like to get your feedback. – Josh Close Sep 21 '18 at 22:49
  • BEST LIBRARY /PARSER I have ever used. – Jin Thakur Nov 22 '19 at 18:26
  • A very similar library is [SoftCircuits.CsvParser](https://www.nuget.org/packages/SoftCircuits.CsvParser/). It doesn't require you to load all the rows in memory at once. – Jonathan Wood Feb 11 '20 at 17:15
  • 6
    CsvHelper doesn't load all the rows into memory either. It loads a small buffer and yields results. You might also want to mention that SoftCircuits.CsvParser is maintained by you. I believe that is policy on SO now. – Josh Close Feb 11 '20 at 17:21
  • Not sure why people has upvoted this so much, when there is no way to make this library work with tab-delimited header and rows. No matter if you set the tab character with the class attribute or setting it in the context class, it just does not handle the tabulations. – ElektroStudios Jun 26 '23 at 17:36
152

Let a library handle all the nitty-gritty details for you! :-)

Check out FileHelpers and stay DRY - Don't Repeat Yourself - no need to re-invent the wheel a gazillionth time....

You basically just need to define that shape of your data - the fields in your individual line in the CSV - by means of a public class (and so well-thought out attributes like default values, replacements for NULL values and so forth), point the FileHelpers engine at a file, and bingo - you get back all the entries from that file. One simple operation - great performance!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 1
    until you need sth really custom (and most of that can be implemented as extensions anyway) FileHelpers is by far the best way to go, really convenient, tested and well performing solution – mikus Aug 12 '13 at 13:39
  • 4
    As of 1st June 2015, the only way I could download FileHelpers was to search for it on sourceforge.net. Here's the link used: http://sourceforge.net/projects/filehelpers/?source=directory – Sudhanshu Mishra Jun 01 '15 at 05:35
  • 2
    @dotnetguy we are in the way to release 3.1 (currently 3.1-rc2) is out. Also we redesigned the site: www.filehelpers.net you can download latest version from there – Marcos Meli Jul 22 '15 at 23:24
  • 1
    @MarcosMeli many thanks! I already used FileHelpers in one of my projects and it was a breeze to use - kudos to the team. I'm planning a blog on it soon and btw - Love the new site - well done! – Sudhanshu Mishra Jul 22 '15 at 23:52
  • 3
    FileHelpers doesn't handle quoted commas in CSVs properly, or actually map field headers, expecting instead that the columns are in the same order as the fields are declared in your type. I wouldn't use it, personally. – Alastair Maw Sep 20 '19 at 11:11
33

In a business application, i use the Open Source project on codeproject.com, CSVReader.

It works well, and has good performance. There is some benchmarking on the link i provided.

A simple example, copied from the project page:

using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
{
    int fieldCount = csv.FieldCount;
    string[] headers = csv.GetFieldHeaders();

    while (csv.ReadNextRecord())
    {
        for (int i = 0; i < fieldCount; i++)
            Console.Write(string.Format("{0} = {1};", headers[i], csv[i]));

        Console.WriteLine();
    }
}

As you can see, it's very easy to work with.

alexn
  • 57,867
  • 14
  • 111
  • 145
21

I know its a bit late but just found a library Microsoft.VisualBasic.FileIO which has TextFieldParser class to process csv files.

Nasreddine
  • 36,610
  • 17
  • 75
  • 94
user1131926
  • 1,311
  • 2
  • 18
  • 32
13

Here is a helper class I use often, in case any one ever comes back to this thread (I wanted to share it).

I use this for the simplicity of porting it into projects ready to use:

public class CSVHelper : List<string[]>
{
  protected string csv = string.Empty;
  protected string separator = ",";

  public CSVHelper(string csv, string separator = "\",\"")
  {
    this.csv = csv;
    this.separator = separator;

    foreach (string line in Regex.Split(csv, System.Environment.NewLine).ToList().Where(s => !string.IsNullOrEmpty(s)))
    {
      string[] values = Regex.Split(line, separator);

      for (int i = 0; i < values.Length; i++)
      {
        //Trim values
        values[i] = values[i].Trim('\"');
      }

      this.Add(values);
    }
  }
}

And use it like:

public List<Person> GetPeople(string csvContent)
{
  List<Person> people = new List<Person>();
  CSVHelper csv = new CSVHelper(csvContent);
  foreach(string[] line in csv)
  {
    Person person = new Person();
    person.Name = line[0];
    person.TelephoneNo = line[1];
    people.Add(person);
  }
  return people;
}

[Updated csv helper: bug fixed where the last new line character created a new line]

Base33
  • 3,167
  • 2
  • 27
  • 31
  • 19
    if any of the csv entries contains comma (,) this code will not work. – hakan Jun 12 '12 at 14:32
  • To keep things lightweight, I used a pipe character as the seperator. '|' – Base33 Jun 12 '12 at 16:45
  • excellent solution. Just a question about the 2nd snippet. What type of object is Person – Cocoa Dev Jan 23 '13 at 14:34
  • @CocoaDev It is an class that contains two string properties - Name and TelephoneNo. Purely for the example though. If any of the properties was an integer it should be just a straight forward conversion (with check?). – Base33 Jan 23 '13 at 16:59
12

If you need only reading csv files then I recommend this library: A Fast CSV Reader
If you also need to generate csv files then use this one: FileHelpers

Both of them are free and opensource.

Marcos Meli
  • 3,468
  • 24
  • 29
Giorgi
  • 30,270
  • 13
  • 89
  • 125
  • FileHelpers has an appealing summary: http://www.filehelpers.com/ The FileHelpers are a free and easy to use .NET library to import/export data from fixed length or delimited records in files, strings or streams. – AnneTheAgile Oct 06 '12 at 00:19
  • 1
    While this link may answer the question, link only answers are discouraged on Stack Overflow, you can improve this answer by taking vital parts of the link and putting it into your answer, this makes sure your answer is still an answer if the link gets changed or removed :) – WhatsThePoint Oct 24 '18 at 07:23
12

This solution is using the official Microsoft.VisualBasic assembly to parse CSV.

Advantages:

  • delimiter escaping
  • ignores Header
  • trim spaces
  • ignore comments

Code:

    using Microsoft.VisualBasic.FileIO;

    public static List<List<string>> ParseCSV (string csv)
    {
        List<List<string>> result = new List<List<string>>();


        // To use the TextFieldParser a reference to the Microsoft.VisualBasic assembly has to be added to the project. 
        using (TextFieldParser parser = new TextFieldParser(new StringReader(csv))) 
        {
            parser.CommentTokens = new string[] { "#" };
            parser.SetDelimiters(new string[] { ";" });
            parser.HasFieldsEnclosedInQuotes = true;

            // Skip over header line.
            //parser.ReadLine();

            while (!parser.EndOfData)
            {
                var values = new List<string>();

                var readFields = parser.ReadFields();
                if (readFields != null)
                    values.AddRange(readFields);
                result.Add(values);
            }
        }

        return result;
    }
Jonas_Hess
  • 1,874
  • 1
  • 22
  • 32
8

I have written TinyCsvParser for .NET.

It is released under MIT License:

You can use NuGet to install it. Run the following command in the Package Manager Console.

PM> Install-Package TinyCsvParser

Usage

Imagine we have list of Persons in a CSV file persons.csv with their first name, last name and birthdate.

FirstName;LastName;BirthDate
Philipp;Wagner;1986/05/12
Max;Musterman;2014/01/02

The corresponding domain model in our system might look like this.

private class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime BirthDate { get; set; }
}

When using TinyCsvParser you have to define the mapping between the columns in the CSV data and the property in you domain model.

private class CsvPersonMapping : CsvMapping<Person>
{

    public CsvPersonMapping()
        : base()
    {
        MapProperty(0, x => x.FirstName);
        MapProperty(1, x => x.LastName);
        MapProperty(2, x => x.BirthDate);
    }
}

And then we can use the mapping to parse the CSV data with a CsvParser.

namespace TinyCsvParser.Test
{
    [TestFixture]
    public class TinyCsvParserTest
    {
        [Test]
        public void TinyCsvTest()
        {
            CsvParserOptions csvParserOptions = new CsvParserOptions(true, new[] { ';' });
            CsvPersonMapping csvMapper = new CsvPersonMapping();
            CsvParser<Person> csvParser = new CsvParser<Person>(csvParserOptions, csvMapper);

            var result = csvParser
                .ReadFromFile(@"persons.csv", Encoding.ASCII)
                .ToList();

            Assert.AreEqual(2, result.Count);

            Assert.IsTrue(result.All(x => x.IsValid));
            
            Assert.AreEqual("Philipp", result[0].Result.FirstName);
            Assert.AreEqual("Wagner", result[0].Result.LastName);

            Assert.AreEqual(1986, result[0].Result.BirthDate.Year);
            Assert.AreEqual(5, result[0].Result.BirthDate.Month);
            Assert.AreEqual(12, result[0].Result.BirthDate.Day);

            Assert.AreEqual("Max", result[1].Result.FirstName);
            Assert.AreEqual("Mustermann", result[1].Result.LastName);

            Assert.AreEqual(2014, result[1].Result.BirthDate.Year);
            Assert.AreEqual(1, result[1].Result.BirthDate.Month);
            Assert.AreEqual(1, result[1].Result.BirthDate.Day);
        }
    }
}

User Guide

A full User Guide is available at:

bytefish
  • 3,697
  • 1
  • 29
  • 35
6

Here is a short and simple solution.

                using (TextFieldParser parser = new TextFieldParser(outputLocation))
                 {
                        parser.TextFieldType = FieldType.Delimited;
                        parser.SetDelimiters(",");
                        string[] headers = parser.ReadLine().Split(',');
                        foreach (string header in headers)
                        {
                            dataTable.Columns.Add(header);
                        }
                        while (!parser.EndOfData)
                        {
                            string[] fields = parser.ReadFields();
                            dataTable.Rows.Add(fields);
                        }
                    }
The Hawk
  • 121
  • 2
  • 4
1

Here is my KISS implementation...

using System;
using System.Collections.Generic;
using System.Text;

class CsvParser
{
    public static List<string> Parse(string line)
    {
        const char escapeChar = '"';
        const char splitChar = ',';
        bool inEscape = false;
        bool priorEscape = false;

        List<string> result = new List<string>();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            switch (c)
            {
                case escapeChar:
                    if (!inEscape)
                        inEscape = true;
                    else
                    {
                        if (!priorEscape)
                        {
                            if (i + 1 < line.Length && line[i + 1] == escapeChar)
                                priorEscape = true;
                            else
                                inEscape = false;
                        }
                        else
                        {
                            sb.Append(c);
                            priorEscape = false;
                        }
                    }
                    break;
                case splitChar:
                    if (inEscape) //if in escape
                        sb.Append(c);
                    else
                    {
                        result.Add(sb.ToString());
                        sb.Length = 0;
                    }
                    break;
                default:
                    sb.Append(c);
                    break;
            }
        }

        if (sb.Length > 0)
            result.Add(sb.ToString());

        return result;
    }

}
Alex Begun
  • 451
  • 3
  • 7
  • 1
    This does not deal with line breaks within quoted strings which is valid in a CSV file. – John Leidegren Jan 20 '18 at 13:50
  • Alex, what John is trying to say is that RFC 4180 (https://www.ietf.org/rfc/rfc4180.txt -- See section 2 and item 6) allows for a column to have a CR LF in the middle of a column effectively spreading it over 2 lines in a file. Your solution will probably work well in most cases (especially if the CSV files was created by saving out of Excel), but it doesn't cover this edge case. CsvHelper, mentioned above, is supposed to take this case into consideration. – David Yates Jan 25 '18 at 20:35
  • Yes, this is true, but if you have CR LF in your CSV, chances you should probably not be using CSV, but something more appropriate like, json, or xml, or fixed length format. – Alex Begun Mar 08 '18 at 05:53
1

Some time ago I had wrote simple class for CSV read/write based on Microsoft.VisualBasic library. Using this simple class you will be able to work with CSV like with 2 dimensions array. You can find my class by the following link: https://github.com/ukushu/DataExporter

Simple example of usage:

Csv csv = new Csv("\t");//delimiter symbol

csv.FileOpen("c:\\file1.csv");

var row1Cell6Value = csv.Rows[0][5];

csv.AddRow("asdf","asdffffff","5")

csv.FileSave("c:\\file2.csv");

For reading header only you need is to read csv.Rows[0] cells :)

Andrew_STOP_RU_WAR_IN_UA
  • 9,318
  • 5
  • 65
  • 101
1

This code reads csv to DataTable:

public static DataTable ReadCsv(string path)
{
    DataTable result = new DataTable("SomeData");
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        bool isFirstRow = true;
        //IList<string> headers = new List<string>();

        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (isFirstRow)
            {
                foreach (string field in fields)
                {
                    result.Columns.Add(new DataColumn(field, typeof(string)));
                }
                isFirstRow = false;
            }
            else
            {
                int i = 0;
                DataRow row = result.NewRow();
                foreach (string field in fields)
                {
                    row[i++] = field;
                }
                result.Rows.Add(row);
            }
        }
    }
    return result;
}
polina-c
  • 6,245
  • 5
  • 25
  • 36
1

Another one to this list, Cinchoo ETL - an open source library to read and write multiple file formats (CSV, flat file, Xml, JSON etc)

Sample below shows how to read CSV file quickly (No POCO object required)

string csv = @"Id, Name
1, Carl
2, Tom
3, Mark";

using (var p = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    )
{
    foreach (var rec in p)
    {
        Console.WriteLine($"Id: {rec.Id}");
        Console.WriteLine($"Name: {rec.Name}");
    }
}

Sample below shows how to read CSV file using POCO object

public partial class EmployeeRec
{
    public int Id { get; set; }
    public string Name { get; set; }
}

static void CSVTest()
{
    string csv = @"Id, Name
1, Carl
2, Tom
3, Mark";

    using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
        .WithFirstLineHeader()
        )
    {
        foreach (var rec in p)
        {
            Console.WriteLine($"Id: {rec.Id}");
            Console.WriteLine($"Name: {rec.Name}");
        }
    }
}

Please check out articles at CodeProject on how to use it.

Cinchoo
  • 6,088
  • 2
  • 19
  • 34
1

Single source file solution for straightforward parsing needs, useful. Deals with all the nasty edge cases. Such as new line normalization and handling new lines in quoted string literals. Your welcome!

If you CSV file has a header you just read out the column names (and compute column indexes) from the first row. Simple as that.

Note that Dump is a LINQPad method, you might want to remove that if you are not using LINQPad.

void Main()
{
    var file1 = "a,b,c\r\nx,y,z";
    CSV.ParseText(file1).Dump();

    var file2 = "a,\"b\",c\r\nx,\"y,z\"";
    CSV.ParseText(file2).Dump();

    var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
    CSV.ParseText(file3).Dump();

    var file4 = "\"\"\"\"";
    CSV.ParseText(file4).Dump();
}

static class CSV
{
    public struct Record
    {
        public readonly string[] Row;

        public string this[int index] => Row[index];

        public Record(string[] row)
        {
            Row = row;
        }
    }

    public static List<Record> ParseText(string text)
    {
        return Parse(new StringReader(text));
    }

    public static List<Record> ParseFile(string fn)
    {
        using (var reader = File.OpenText(fn))
        {
            return Parse(reader);
        }
    }

    public static List<Record> Parse(TextReader reader)
    {
        var data = new List<Record>();

        var col = new StringBuilder();
        var row = new List<string>();
        for (; ; )
        {
            var ln = reader.ReadLine();
            if (ln == null) break;
            if (Tokenize(ln, col, row))
            {
                data.Add(new Record(row.ToArray()));
                row.Clear();
            }
        }

        return data;
    }

    public static bool Tokenize(string s, StringBuilder col, List<string> row)
    {
        int i = 0;

        if (col.Length > 0)
        {
            col.AppendLine(); // continuation

            if (!TokenizeQuote(s, ref i, col, row))
            {
                return false;
            }
        }

        while (i < s.Length)
        {
            var ch = s[i];
            if (ch == ',')
            {
                row.Add(col.ToString().Trim());
                col.Length = 0;
                i++;
            }
            else if (ch == '"')
            {
                i++;
                if (!TokenizeQuote(s, ref i, col, row))
                {
                    return false;
                }
            }
            else
            {
                col.Append(ch);
                i++;
            }
        }

        if (col.Length > 0)
        {
            row.Add(col.ToString().Trim());
            col.Length = 0;
        }

        return true;
    }

    public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
    {
        while (i < s.Length)
        {
            var ch = s[i];
            if (ch == '"')
            {
                // escape sequence
                if (i + 1 < s.Length && s[i + 1] == '"')
                {
                    col.Append('"');
                    i++;
                    i++;
                    continue;
                }
                i++;
                return true;
            }
            else
            {
                col.Append(ch);
                i++;
            }
        }
        return false;
    }
}
John Leidegren
  • 59,920
  • 20
  • 131
  • 152
1

This parser supports nested commas and quotes in a column:

static class CSVParser
{
    public static string[] ParseLine(string line)
    {
        List<string> cols = new List<string>();
        string value = null;

        for(int i = 0; i < line.Length; i++)
        {
            switch(line[i])
            {
                case ',':
                    cols.Add(value);
                    value = null;
                    if(i == line.Length - 1)
                    {// It ends with comma
                        cols.Add(null);
                    }
                    break;
                case '"':
                    cols.Add(ParseEnclosedColumn(line, ref i));
                    i++;
                    break;
                default:
                    value += line[i];
                    if (i == line.Length - 1)
                    {// Last character
                        cols.Add(value);                           
                    }
                    break;
            }
        }

        return cols.ToArray();
    }//ParseLine

    static string ParseEnclosedColumn(string line, ref int index)
    {// Example: "b"",bb"
        string value = null;
        int numberQuotes = 1;
        int index2 = index;

        for (int i = index + 1; i < line.Length; i++)
        {
            index2 = i;
            switch (line[i])
            {
                case '"':
                    numberQuotes++;
                    if (numberQuotes % 2 == 0)
                    {
                        if (i < line.Length - 1 && line[i + 1] == ',')
                        {
                            index = i;
                            return value;
                        }
                    }
                    else if (i > index + 1 && line[i - 1] == '"')
                    {
                        value += '"';
                    }
                    break;
                default:
                    value += line[i];
                    break;
            }
        }

        index = index2;
        return value;
    }//ParseEnclosedColumn 
}//class CSVParser
0

Based on unlimit's post on How to properly split a CSV using C# split() function? :

string[] tokens = System.Text.RegularExpressions.Regex.Split(paramString, ",");

NOTE: this doesn't handle escaped / nested commas, etc., and therefore is only suitable for certain simple CSV lists.

Community
  • 1
  • 1
radsdau
  • 495
  • 5
  • 16
  • 2
    This is very bad and likely slow :) – EKS Sep 20 '16 at 14:07
  • 1
    Probably, but it works perfectly and simply for a small set of parameters, therefore is a valid and helpful solution. Why downvote it? "Very Bad" is a little extreme, don't you think? – radsdau Sep 30 '16 at 04:37
  • 1
    It doesn't handle escaped / nested commas, etc. Will work in some cases but definitely won't work for all csv files – NStuke Oct 10 '16 at 23:56
  • Your are right; I'll edit the reply to reflect that. Thanks. But it does still have its place. – radsdau Oct 11 '16 at 06:26
  • This worked perfectly for my use case where I'm building a sql server clr dll and can't use any of these other external packages. I just needed to parse a simple csv file with a filename and row count. – dubvfan87 Feb 11 '20 at 18:59
0

If anyone wants a snippet they can plop into their code without having to bind a library or download a package. Here is a version I wrote:

    public static string FormatCSV(List<string> parts)
    {
        string result = "";

        foreach (string s in parts)
        {
            if (result.Length > 0)
            {
                result += ",";

                if (s.Length == 0)
                    continue;
            }

            if (s.Length > 0)
            {
                result += "\"" + s.Replace("\"", "\"\"") + "\"";
            }
            else
            {
                // cannot output double quotes since its considered an escape for a quote
                result += ",";
            }
        }

        return result;
    }

    enum CSVMode
    {
        CLOSED = 0,
        OPENED_RAW = 1,
        OPENED_QUOTE = 2
    }

    public static List<string> ParseCSV(string input)
    {
        List<string> results;

        CSVMode mode;

        char[] letters;

        string content;


        mode = CSVMode.CLOSED;

        content = "";
        results = new List<string>();
        letters = input.ToCharArray();

        for (int i = 0; i < letters.Length; i++)
        {
            char letter = letters[i];
            char nextLetter = '\0';

            if (i < letters.Length - 1)
                nextLetter = letters[i + 1];

            // If its a quote character
            if (letter == '"')
            {
                // If that next letter is a quote
                if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
                {
                    // Then this quote is escaped and should be added to the content

                    content += letter;

                    // Skip the escape character
                    i++;
                    continue;
                }
                else
                {
                    // otherwise its not an escaped quote and is an opening or closing one
                    // Character is skipped

                    // If it was open, then close it
                    if (mode == CSVMode.OPENED_QUOTE)
                    {
                        results.Add(content);

                        // reset the content
                        content = "";

                        mode = CSVMode.CLOSED;

                        // If there is a next letter available
                        if (nextLetter != '\0')
                        {
                            // If it is a comma
                            if (nextLetter == ',')
                            {
                                i++;
                                continue;
                            }
                            else
                            {
                                throw new Exception("Expected comma. Found: " + nextLetter);
                            }
                        }
                    }
                    else if (mode == CSVMode.OPENED_RAW)
                    {
                        // If it was opened raw, then just add the quote 
                        content += letter;
                    }
                    else if (mode == CSVMode.CLOSED)
                    {
                        // Otherwise open it as a quote 

                        mode = CSVMode.OPENED_QUOTE;
                    }
                }
            }
            // If its a comma seperator
            else if (letter == ',')
            {
                // If in quote mode
                if (mode == CSVMode.OPENED_QUOTE)
                {
                    // Just read it
                    content += letter;
                }
                // If raw, then close the content
                else if (mode == CSVMode.OPENED_RAW)
                {
                    results.Add(content);

                    content = "";

                    mode = CSVMode.CLOSED;
                }
                // If it was closed, then open it raw
                else if (mode == CSVMode.CLOSED)
                {
                    mode = CSVMode.OPENED_RAW;

                    results.Add(content);

                    content = "";
                }
            }
            else
            {
                // If opened quote, just read it
                if (mode == CSVMode.OPENED_QUOTE)
                {
                    content += letter;
                }
                // If opened raw, then read it
                else if (mode == CSVMode.OPENED_RAW)
                {
                    content += letter;
                }
                // It closed, then open raw
                else if (mode == CSVMode.CLOSED)
                {
                    mode = CSVMode.OPENED_RAW;

                    content += letter;
                }
            }
        }

        // If it was still reading when the buffer finished
        if (mode != CSVMode.CLOSED)
        {
            results.Add(content);
        }

        return results;
    }
John
  • 5,942
  • 3
  • 42
  • 79
0

For smaller input CSV data LINQ is fully enough. For example for the following CSV file content:

schema_name,description,utype
"IX_HE","High-Energy data","x"
"III_spectro","Spectrosopic data","d"
"VI_misc","Miscellaneous","f"
"vcds1","Catalogs only available in CDS","d"
"J_other","Publications from other journals","b"

when we read the whole content into single string called data, then

using System;
using System.IO;
using System.Linq;

var data = File.ReadAllText(Path2CSV);

// helper split characters
var newline = Environment.NewLine.ToCharArray();
var comma = ",".ToCharArray();
var quote = "\"".ToCharArray();

// split input string data to lines
var lines = data.Split(newline);

// first line is header, take the header fields
foreach (var col in lines.First().Split(comma)) {
    // do something with "col"
}

// we skip the first line, all the rest are real data lines/fields
foreach (var line in lines.Skip(1)) {
    // first we split the data line by comma character
    // next we remove double qoutes from each splitted element using Trim()
    // finally we make an array
    var fields = line.Split(comma)
        .Select(_ => { _ = _.Trim(quote); return _; })
        .ToArray();
    // do something with the "fields" array
}
Ladislav
  • 320
  • 3
  • 10
  • 1
    Note, we also can read the file content to string array, means line by line using: var lines = File.ReadAllLines(Path2CSV); – Ladislav Jul 28 '22 at 08:27