0

I am trying to parse a CSV file with commas in the data. All fields do not have quotes around them, but SOME of the dollar amounts do when there is a comma in them. So for example the file looks something like this:

0000-0C,Sales-Data Entry,10/31/2016,000000,VJ,Comapny,,312.3
0000-0C,Sales-Data Entry,10/31/2016,000000,VJ,Company,,"1,420.97"

So as you can see there is a comma in the 1,420.97 but it does have quotes around this. Is there a way to reliably read this file?

Here is my code:

var path = @"glid.csv";

TextFieldParser parser = new TextFieldParser(path);

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");

while (!parser.EndOfData)
{
    parser.ReadLine();
    fields = parser.ReadFields();

    //do something
}

parser.Close();
Tophandour
  • 297
  • 5
  • 15
rohanharrison
  • 233
  • 4
  • 10
  • can you change the delimiter from csv file to another signal, like ; ? – Marlon Vidal Jan 18 '17 at 13:26
  • 2
    Possible duplicate of [split a comma-separated string with both quoted and unquoted strings](http://stackoverflow.com/questions/3776458/split-a-comma-separated-string-with-both-quoted-and-unquoted-strings) – PaulF Jan 18 '17 at 13:27
  • @MarlonVidal That was my initial thought, and I have an email typed out and ready to send for the person that is responsible for exporting the data, but I wanted to at least try and make this work before I do that. – rohanharrison Jan 18 '17 at 13:27
  • Just set [`TextFieldParser.HasFieldsEnclosedInQuotes`](https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.hasfieldsenclosedinquotes(v=vs.110).aspx) to true – juharr Jan 18 '17 at 13:28
  • @juharr Did you bother to read the code? It is... – rohanharrison Jan 18 '17 at 13:29
  • @rohanharrison And you're saying it still separates `"1,420.97"` into `"1` and `420.97"`? What exactly are the results you get and what you expect? – juharr Jan 18 '17 at 13:31
  • If it is an option for the originator of the data to change the delimiter then I would suggest using ascii characters 29,30 and 31 which are specifically designed to delimit data without clashing with the data. – Crowcoder Jan 18 '17 at 13:32
  • @juharr Yes that is exactly what happens. I want it to just read as 1420.97. – rohanharrison Jan 18 '17 at 13:43
  • @rohanharrison You should not be calling `ReadLine` like that as it will read in a line, meaning your current code is skipping every other line. But I tried your code and it did correctly parse that value as `1,420.97` – juharr Jan 18 '17 at 13:49
  • @juharr Wow, you were right. I can't believe I missed that. Thank you. – rohanharrison Jan 18 '17 at 14:04

3 Answers3

3

CSV is a rebellious bird, that none can tame and therefore NEVER try to parse it with own rules.

I strongly recommend to use the OleDbDataAdapter to fill a Dataset instead of the Microsoft.VisualBasic.FileIO.TextFieldParser , this saves you a lot of time and headaches.

    string fileName = @"glid.csv";
    DataSet ds = new DataSet("csvData");
    string dir = Path.GetDirectoryName(fileName);
    string connstr = String.Format("Provider = Microsoft.Jet.OleDb.4.0; Data Source={0}; Extended Properties = \"Text;HDR=YES;FMT=Delimited\"",dir);

    using (System.Data.OleDb.OleDbConnection conn = new System.Data.OleDb.OleDbConnection())
    {
        conn.Open();
        OleDbDataAdapter adapter = new OleDbDataAdapter("SELECT * FROM " + Path.GetFileName(fileName), conn);
        adapter.Fill(ds);
    }

    var p = ds; //<-- here is your data;
Caspar Kleijne
  • 21,552
  • 13
  • 72
  • 102
  • 2
    That's not "handmade code" they are using the [`TextFieldParser`](https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser(v=vs.110).aspx) class – juharr Jan 18 '17 at 13:35
  • Any idea why it's telling me "Input String was not in a correct format"? My path is: @"C:\Users\myname\Desktop\glid.csv" – rohanharrison Jan 18 '17 at 13:49
  • @rohanharrison 'fixed Source= { 0 }' to 'Source={0}' – Caspar Kleijne Jan 18 '17 at 14:14
3

Are you sure you want to use both

parser.ReadLine();

and

fields = parser.ReadFields();

As per the documentation https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser(v=vs.110).aspx this will skip a line, process the next line over and over until the end of the file.

You may also wish to check your encoding type, TextFieldParser constructor accepts an Encoding type: https://msdn.microsoft.com/en-us/library/ms128085(v=vs.110).aspx

try the following encodings: https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx

Ben
  • 514
  • 2
  • 10
  • This is the correct answer. It was my stupidity that missed this. Thank you. I moved by readline to before my while loop. – rohanharrison Jan 18 '17 at 14:05
0

In all honesty, I'd reccommend using this nuget package. It's really helpful in both reading and writing csv and even allows you to directly map onto a class creating an object model from your records. It will handle all of the escaping and complicated stuff like that for you. Put simply, it just works.

It's really good stuff and very easy to use!

ScottishTapWater
  • 3,656
  • 4
  • 38
  • 81