0

I have a small program that reads in a CSV file which contains a report delimited by commas. Within the report, one of the fields is a date, which I convert to a date/time, and only pull information that is from within a certain time frame. Here is my problem though: the report is actually a list of jobs that run on a certain system. Some of the job names however, contain commas on their own. This means that an Excel output report is extremely messy because if a job name has a comma the job name will be divided between 2 cells. I am a bit new to programming so the only way I could think of solving this was checking to see how many fields ended up in my array delimited by commas. And if it was greater than the normal, I'd concatenate two of the fields which I know will be the job name. However, the issue is that if a job name contains 2 commas, this will not work because it is only set up to deal with 1 extra comma in the data.

I should add, the CSV report that I read in is generated by another application where I can't control how it is delimited. Otherwise, I would change it to pipes or something of the sort.

Any thoughts? Below is the portion of code that deals with it:

StreamReader SR = new StreamReader(inputFile);
StreamWriter SW = new StreamWriter(outputFile);
string records;
//read headers from first line
string headers = records = SR.ReadLine();
SW.WriteLine(headers);
DateTime YesterdayAM = Convert.ToDateTime(DateTime.Now.AddDays(-1).ToShortDateString() + " 05:00:00 AM");
while ((records = SR.ReadLine()) != null)
{
    if (records.Trim().Length > 0)
    {
        string daterecord = GetDateTimeFromStringArray(records);
        if (daterecord.Length > 0)
        {
            DateTime recordDate = Convert.ToDateTime(daterecord);
            if (recordDate >= YesterdayAM)
            {
                string[] checkfields = records.Split(',');
                if (checkfields.Length > 13)
                {
                    string[] replacefields = { checkfields[0], checkfields[1] + " " + checkfields[2], checkfields[3], checkfields[4], checkfields[5], checkfields[6], checkfields[7], checkfields[8], checkfields[9], checkfields[10], checkfields[11], checkfields[12] };
                    for (int i = 0; i < replacefields.Length; i++)
                    {
                        SW.Write(replacefields[i] + ",");
                    }
                    SW.Write(Environment.NewLine);
                }
                else
                {
                    SW.WriteLine(records);
                }
            }
        }
    }
}
Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user3494110
  • 417
  • 2
  • 9
  • 25
  • I think this is a similar question: http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file – skeryl Apr 03 '14 at 14:13
  • Generally, in a CSV file, if you cell contains the delimiter, it should be enclosed in quotes. That's what Excel will do with any cells that contain commas when you save as CSV. If that's completely impossible, then with the technique you describe, you could handle multiple commas in the job name field by calling the method to concatenate fields again until the length is correct. – Matt Burland Apr 03 '14 at 14:15

1 Answers1

0

It's a bit hacky to do it this way, but if you can't fix the source and you know that extra commas will only appear in the one field you could do something like this:

            string[] checkfields = records.Split(',');
            while (checkfields.Length > 13)
            {
                 // concat [1] & [2] into a new array
                 checkfields = checkfields.Take(1)
                     .Concat(new string[] { string.Join("", checkfields.Skip(1).Take(2).ToArray()) })
                     .Concat(checkfields.Skip(3)).ToArray();
            }    // if it's still too long it will loop again

Or better yet:

            string[] checkfields = records.Split(',');
            int extraFields = checkfields.Length - 13;
            if (extraFields > 0) 
            {
                 // concat fields 1....1 + extraFields
                 checkfields = checkfields.Take(1)
                     .Concat(new string[] { string.Join("", checkfields.Skip(1).Take(extraFields).ToArray()) })
                     .Concat(checkfields.Skip(extraFields + 1)).ToArray();
            }    // avoids looping by doing it all in one go

Note: the linq statements are untested and might not be the absolute most efficient way to do it. Also all the "magic" numbers should probably be replaced with constants for maintainability.

Matt Burland
  • 44,552
  • 18
  • 99
  • 171