2

I'm attempting to find an elegant way to read a cvs string via 4.0 linq and have been somewhat unsuccessful due to embedded commas between quotes. Here is an example of 3 columns and 3 rows:

Date,Years,MemoText "2011-01-01","0.5","Memo Text
Memo Text continuing
And still continuing, and then comma, yet the memo is in quotes"
"2010-01-01","0.5","Memo Text, Memo without line breaks"
"2009-01-01","1.0","Plain memo text"

So far I've come up with the following faulty code as the pulling together other stack exchange bits. This doesn't work since carriage line feeds in memo text since carriage return line feeds break up memo text into multiple fields.

using (var reader = new StreamReader(getReader))
{
    var records = reader.ReadToEnd().Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
    var enumRecords = records.Skip(1).Take(1);
    using (var dc = new DataContext())
    {
        foreach (var record in enumRecords
            .Select(x => x.Trim()
            .Split(new char[] { ',' }))
            .Select(fields => new Entity
            {
                Date = (!string.IsNullOrEmpty(record.ElementAt(0))) ? Convert.ToDateTime(record.ElementAt(0)) : default(DateTime),
                DecimalYears = record.ElementAt(1),
                MemoText = record.ElementAt(2)
            }))
        {
            //Commit DataContext
        }
    }
}

No dice when splitting on commas alone since commas exist between quoted text:

using (var reader = new StreamReader(getReader))
{
    var sdata = reader.ReadToEnd();

    using (var dc = new DataContext())
    {
        var query = sdata
            .Split(new string[] { "," }, StringSplitOptions.RemoveEmptyEntries)
            .Replace(Environment.NewLine, string.Empty)
            .Replace("\"\"", "\",\"")
            .Select((i, n) => new { i, n })
            .GroupBy(a => a.n / 3)
            .Skip(1).Take(1);

        foreach (var fields in query)
        {
            var newEntity = new Entity();
            newEntity.Date = (!string.IsNullOrEmpty(fields.ElementAt(0).i)) ? Convert.ToDateTime(fields.ElementAt(0).i) : default(DateTime);
            newEntity.DecimalYears = fields.ElementAt(1).i;
            newEntity.MemoText = fields.ElementAt(2).i;
        }
    }
}

So far what seems like a simple objective is bordering on verbose ugly code, possibly someone out there has a clean and functional way to approach this using LINQ?

EdenRockStar
  • 23
  • 1
  • 4

4 Answers4

3

The de-facto answer for .Net suggests not to do it yourself - there are a ton of third party libraries that will make this simple:

CSV File Imports in .Net

Community
  • 1
  • 1
dash
  • 89,546
  • 4
  • 51
  • 71
  • I guess I've been somewhat determined to make this work in LINQ .net 4.0, is it possible I'm off by just a line or possibly many many lines in the above code snippets? – EdenRockStar Dec 02 '11 at 12:30
  • Of course, there may be many many more conditions I have not run across yet, this is just what I know of for now. – EdenRockStar Dec 02 '11 at 12:32
  • 1
    File parsing is a topic that's broken many a project. Everything is fine until someone puts the wrong value in a field, or you employ Mr O'Connor and so on and so forth. I'd just use the lessons of people that have gone before me and suffered all the pain :-) – dash Dec 02 '11 at 12:50
  • I ended up using the TextFieldParser from the stackoverflow link you attached. Works great, thanks much! – EdenRockStar Dec 02 '11 at 20:53
2

Here is a great extension method by Eric White that will handle your CSV requirements:

  • Only a comma is valid for the separator character
  • Values can be quoted. The quotes are trimmed
  • Quoted values can have internal commas
  • Quoted values can also have internal escape sequences: backslash followed by any character, including quote (\”), backslash (\) or any other character (\a)
  • CsvSplit will throw an exception for incorrectly formatted strings

http://blogs.msdn.com/b/ericwhite/archive/2008/09/30/linq-to-text-and-linq-to-csv.aspx

Ralph Willgoss
  • 11,750
  • 4
  • 64
  • 67
0

If you can change the structure of the file, I'd recommend you to find an unique separator that isn't use somewhere else in your content (ie ';').

Then, using a third party library (like this one LINQ to CSV) would clearly ease the task.

You'd be able to use a clean syntax like that:

var memos = from p in myFile
            select new { p.Date, p.DecimalYears, p.MemoText };
Stéphane Bebrone
  • 2,713
  • 17
  • 18
  • Wow, yah, that would be great if I had control of rendering the csv. Unfortunetly it is coming from salesforce bulkapi. I could reformat the output but would probably be additional overhead. – EdenRockStar Dec 02 '11 at 12:39
  • At least you're not the only to suffer from this problem: http://success.salesforce.com/ideaview?id=08730000000BpAeAAK – Stéphane Bebrone Dec 02 '11 at 12:52
0

Here is the code I used in case this is helpful to someone in the future or someone else feels the urge to fine tune it.

using (var reader = new StreamReader(Service.GetResult(batchInfo, results.result[0])))
{
    using (var dc = new DataContext())
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.Delimiters = new string[] { "," };
            parser.TrimWhiteSpace = true;
            while (true)
            {
                string[] parts = parser.ReadFields();
                if (parts == null) { break; }
                try
                {
                    var newEntity = new Entity();
                    newEntity.ID = Guid.NewGuid();
                    newEntity.Date = (!string.IsNullOrEmpty(parts[0])) ? Convert.ToDateTime(parts[0]) : default(DateTime);
                    newEntity.Year = parts[1];
                    newEntity.MemoText = parts[2];
                    dc.Entity.InsertOnSubmit(newEntity);
                    dc.SubmitChanges();
                }
                catch (MalformedLineException mle)
                {
                    string message = mle.Message;
                    //TODO: log an error
                }
            }
        }
    }
}
EdenRockStar
  • 23
  • 1
  • 4