2

I have a file that I stream into a list object to save to a temp in my database, because I have to use the data in the temp table to join to other tables for a final result, then export this final result to a .csv file.

Everything works except for the stream read of the original file.

The file is comma-delimited and structured very specifically, and the structure never changes.

The problem I have is this:

The "AccountHolder" field has a comma amongst the characters that make up the string, so my FileStream sees this as a delimiter. How do I replace the comma in the AccountHolder string without breaking the comma-delimiter the FileStream has to adhere to?

List<object[]> fileContent = new List<object[]>();

            using (FileStream reader = File.OpenRead(ofd.FileName))
            using (TextFieldParser parser = new TextFieldParser(reader))
            {
                parser.TrimWhiteSpace = true;
                parser.Delimiters = new[] { "," };
                parser.HasFieldsEnclosedInQuotes = true;
                while (!parser.EndOfData)
                {
                    object[] line = parser.ReadFields();
                    fileContent.Add(line);
                    lstRegNo.Add(line[0].ToString().Trim());
                    lstAccHolder.Add(line[1].ToString().Trim().Replace(',', ' '));
                    lstAmount.Add(line[2].ToString().Trim().Replace(',', ' '));
                    lstAccNo.Add(line[3].ToString().Trim());
                    lstBranch.Add(line[4].ToString().Trim());
                    lstDate.Add(line[5].ToString().Trim());
                    lstCode.Add(line[6].ToString().Trim());
                    lstOrphenColumn.Add(line[7].ToString().Trim());
                }

Here's a sample of the file I'm streaming in:

000001,A WHATEVER,00000000001,0000000000001,000001,160510,17,0
000002,B WHATEVER,00000000002,0000000000002,000002,160510,17,0
000003,C, WHATEVER,00000000003,0000000000003,000003,160510,17,0
000004,D WHATEVER,00000000004,0000000000004,000004,160510,17,0
000005,E WHATEVER,00000000005,0000000000005,000005,160510,17,0

As you can see, on line 3, there's a comma in the AccountHolder's name. I need the value of this to be "C WHATEVER", not "C, WHATEVER" I want to eliminate that comma but still be able to stream the file into my List object splitting the fields in the file by commas.

Please note that the file's data will be different everytime I receive it, so simply looking for a static value won't cut it.

How do I do this?

MakePeaceGreatAgain
  • 35,491
  • 6
  • 60
  • 111
GrammatonCleric
  • 201
  • 3
  • 12
  • 2
    To resolve this issue you can either use unique separator (e.g. `;`) or put text inside `""` ([quote](https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules) them - *fields with embedded commas or double-quote characters must be quoted"*). – Sinatr May 30 '16 at 12:05
  • Whoever provided the CSV file broke the rules of the CSV file, you should ask them to do as @Sinatr stated. Coding around such mishaps can be done, but think about all the possible issues you could have. Which columns do you join? What happens if the extra comma appears in column 7 instead of 2? It's a difficult thing to work around at this moment in time – Draken May 30 '16 at 12:13
  • @Sinatr: not sure he can change the input file (to have the field wrapped in double quotes) – Veverke May 30 '16 at 12:13
  • Then it's the responsibility of the CSV provider to fix @Veverke, it's far too difficult to work out where the comma should belong in the CSV file and is programmatically quite difficult – Draken May 30 '16 at 12:15
  • This looks like a human typed the input. Humans can make many different types of typing errors and it is very difficult to write a generic program that will find and fix all the possible human errors. – jdweng May 30 '16 at 12:15
  • Possible duplicate of [Dealing with commas in a CSV file](http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file) – Draken May 30 '16 at 12:27
  • The only other idea I have is to stream the file and use String.Format to extract the fields as they're fixed-length, populate my temp table and manipulate the data in SQL once it's it my temp table. I don't want the users who will be using this tool to touch the original file. I'm afraid of more finger-trouble... – GrammatonCleric May 30 '16 at 12:30
  • Are they fixed length though? As at the moment, (I know it's an example), `C, WHATEVER` is a different length than `B WHATEVER`. Does the person/program that produces the CSV file pad the remaining space or something similar? – Draken May 30 '16 at 12:35
  • @Draken: The file is generated by an external system once a particular process is performed. The values are in a fixed-length format which makes it slightly easier for me to manipulate the data once I populate my temp table. I had a look at the spec you linked but I'm not sure how to enclose values in quotes when they're not given to me like that. I forgot to mention I'm a SQL Developer by profession...learning C# :) – GrammatonCleric May 30 '16 at 13:24
  • I would strongly recommend that you raise that as a bug with the software provider. As detailed [here](https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules), they are breaking, what is considered, the standard contract for how a CSV should be presented. If you opened that file in Excel, it would look very odd. Who's to say the length of those fields won't change in the future, thus breaking your code? Better to fix the source of the problem – Draken May 30 '16 at 13:27
  • I despise dirty data as well but I'm not allowed to alter the data in any of the database objects in any way as we're using a 3rd party system. I usually create ETL processes to cleanup messy data but in this case it's not as simple as to just fix the source. It'll take months, if not years for the provider to look at sorting this issue out, so I have to deal with it myself – GrammatonCleric May 30 '16 at 13:39

3 Answers3

1

Well...if the fields are fixed with, then you're not setting up your parser correctly. It knows how to handle fixed-width fields which is somewhat different than a canonical CSV file.

BTW, I'd be inclined to make some constants that reflect the names of the "columns":

//--> indexes into parsed line...
const int RegNo = 0;
const int AcctHolder = 2;
const int Amount = 4;
const int AcctNo = 6;
//--> ...etc.

using Microsoft.VisualBasic.FileIO;
//...
List<object[]> fileContent = new List<object[]>();
using (FileStream reader = File.OpenRead(ofd.FileName))
using (TextFieldParser parser = new TextFieldParser(reader))
{
  parser.TextFieldType = FieldType.FixedWidth;
  parser.SetFieldWidths
  (
     6  1, //--> width of RegNo, width of ignored comma
    10, 1, //--> width of AcctHolder, width of ignored comma
    10, 1, //--> width of Amount, width of ignored comma
    13, 1, //--> etc...
     6, 1, 
     6, 1, 
     2, 1,
     1
  );
  while (!parser.EndOfData)
  {
    object[] line = parser.ReadFields();
    fileContent.Add(line);
    lstRegNo.Add( line[ RegNo ].ToString( ));
    lstAccHolder.Add(line[ AcctHolder ].ToString().Replace(',', ' '));
    lstAmount.Add(line[ Amount ].ToString().Trim().Replace(',', ' '));
    lstAccNo.Add(line[ AcctNo ].ToString().Trim());
    //--> etc...
  }
}

For more information, check this out.

Clay
  • 4,999
  • 1
  • 28
  • 45
  • Well, the file is fixed-width, but in a comma-delimited fashion...if that makes any sense? We're changing over to a new import system so the process happens as follows: 1 - Receive system-generated file 2 - Modify this file 3 - Create a new file (csv) with the data from the original file + additional data from our database The original file is separated by commas, yet it's apparent that the fields are fixed-width because it is required in that format for the older system – GrammatonCleric May 30 '16 at 17:37
  • @GrammatonCleric, Fixed width is way easier to interact with...and I assumed in the code that you'd just read the commas into fields you'd ignore. Legacy systems are *always* a pain to deal with - you can't control what's coming and there may not even be source code for anymore. Been there, and wish you much luck! – Clay May 30 '16 at 17:58
1

One idea is that whenever you read a line, you count the number of commas. So assuming your number of columns will always be 7. The code below will capture all the extra commas on the second column.

    string filename="sample.csv";
    var stream=new StreamReader(filename);
    string l=null;

    while((l = stream.ReadLine())!=null)
    {
        var ss=l.Split(',');
        int count = ss.Length - 1;//number of commas found
        int extraCommas = count - 7;

        var regNo = ss[0];
        StringBuilder accHolder = new StringBuilder();

        //loops from 1 and captures x number of extra commas
        for (int x = 1; x <= extraCommas+1; x++)
        {
            accHolder.Append(ss[x]);
        }

        var amount = ss[2 + extraCommas];
        var accNo = ss[3 + extraCommas];
        var branch = ss[4 + extraCommas];
        var date = ss[5 + extraCommas];   
        //etc....             
    }
Jeff Pang
  • 161
  • 1
  • 6
0

Ultimately what I did was the following:

List<string[]> fileContents = new List<string[]>();

            var lines = File.ReadAllLines(ofd.FileName).ToList();
            foreach (var item in lines)
            {
                string RegNo = string.Format("{0}", item.ToString().Substring(0, 19));
                string accHolder = string.Format("{0}", item.ToString().Substring(21, 30));
                string amount = string.Format("{0}", item.ToString().Substring(52, 11));
                string accNo = string.Format("{0}", item.ToString().Substring(64, 13));
                string branch = string.Format("{0}", item.ToString().Substring(78, 6));
                string date = string.Format("{0}-{1}-{2}", "20" + item.ToString().Substring(85, 2), item.ToString().Substring(87, 2), item.ToString().Substring(89, 2));
                string code = string.Format("{0}", item.ToString().Substring(92, 2));
                string orphenColumn = string.Format("{0}", item.ToString().Substring(95, 1));

                lstRegNo.Add(RegNo.Trim());
                lstAccHolder.Add(ExtensionMethods.RemoveSpecialCharacters(accHolder.Trim()));
                lstAmount.Add(amount.Trim());
                lstAccNo.Add(accNo.Trim());
                lstBranch.Add(branch.Trim());
                lstDate.Add(date);
                lstCode.Add(code.Trim());
                lstOrphenColumn.Add(orphenColumn);
            }

This seems to work for every file I've imported so far.

Thanks for all your suggestions!

GrammatonCleric
  • 201
  • 3
  • 12