0

how to process the CSV file if it already contain a contains: comma, aposthrope, semi-colonin in 1 of th column?

The end-user will receive 3 x p/week an Excel file. I can't process the excel file in dotnet, some reason. I have to use convert it to CSV file. When the user receives the excel file (s)he needs to do SAVE as and choose CSV file, then the DOTNET application (my custom app) should read this and process it.

The problem is when the CSV already contains comma the applications break. as you below can see the a column starts with XBegin and ends with Xend. between them it may contain : comma, aposthrope, semi-colon etc. so I think when you do SAVE as Microsoft put's them in double quotes..

the question is how to process this code...? I'm stuck , please advice? Below is my piece of code.

      private DataSet GetData(byte[] csvcontent)
    {
        try
        {

            //for the header (Column HEADING)
            string strLine;
            string[] strArray;
            char[] charArray = new char[] { ',' };
            DataSet ds = new DataSet();
            DataTable dt = ds.Tables.Add("TheData");
            MemoryStream reader = new MemoryStream(csvcontent);
            StreamReader sr = new StreamReader(reader);

            //skip the first line it's always empty.
            strLine = sr.ReadLine(); 
            //this is the heading, will become column names
            strLine = sr.ReadLine();
            strArray = strLine.Split(charArray);
            // bool firstRow = true;
            for (int x = 0; x <= strArray.GetUpperBound(0); x++)
            {
                switch (x)
                {
                    case 3:
                    case 10:
                    case 16:
                    case 18:
                    case 20:
                        dt.Columns.Add(strArray[x].Trim(), typeof(DateTime));
                        break;
                    default:
                        dt.Columns.Add(strArray[x].Trim());
                        break;
                }

            }

            //PROCESS the RECORDS/DATA itself / ADD ROWS TO TABLE
            strLine = sr.ReadLine();
            while (strLine != null)
            {
                strArray = strLine.Split(charArray);
                DataRow dr = dt.NewRow();
                for (int i = 0; i <= strArray.GetUpperBound(0) - 1; i++)
                {
                    switch (i)
                    {
                        case 3:
                        case 10:
                        case 16:
                        case 18:
                        case 20:
                            if (!string.IsNullOrEmpty(strArray[i]))
                            {
                                LeKey = strArray[i].ToString();
                                dr[i] = Convert.ToDateTime(strArray[i]);
                            }
                            break;
                        default:
                            //need this to trace in case of error
                            if (i == 7)
                            {
                                LeKey = strArray[i].ToString();
                            }

                            dr[i] = strArray[i].Trim();
                            break;
                    }
                    //dr[i] = strArray[i].Trim();
                }
                dt.Rows.Add(dr);
                strLine = sr.ReadLine();
            }
            sr.Close();
            return ds;
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

732017,INV09.020500,C1,30/11/2016,"XBegin - COMMITMENT FILE FOR FACILITIES FOR THE ORGANISATION""Footbal, robotics & agenda"" DURING THE DIFFERENCE DAY IN bazar - ILM - 03-05-2016, SI2.st017 Xend ", Test.Unit.z.1,Voodo,PLCDMSA,TIN100,2016

ps: XBegin till XEnd is 1 column....

UPDATE:
-------------
the excel file will be send 3 x p/week to end-user via email, I can't ask the user to manipulate data, s(he) should only do SAVE as and choose CSV file or other text format in EXcel app... and then my app should process this generated file.

user3080110
  • 107
  • 1
  • 3
  • 9
  • [reading-a-csv-file-in-net](http://stackoverflow.com/questions/1405038/reading-a-csv-file-in-net) – MichaelMao May 02 '16 at 13:01
  • Or use a pipe "|", not the best solution, but its the quickest. – Alfons May 02 '16 at 13:07
  • @Alfons: how to use the pipe. In excel I can't choose a separator.... the user can't change its regional settings only for this. – user3080110 May 02 '16 at 13:17
  • I assumed... Arg, thats what you get for quick ideas. Well what the @user2023861 wrote, dont writer your own parser, but use an existing one, or read directly from excel as per http://stackoverflow.com/questions/15828/reading-excel-files-from-c-sharp – Alfons May 02 '16 at 15:01

2 Answers2

3

Don't roll your own CSV generator or parser. As you've found out, the format is not as easy as it sounds. Use something like CsvHelper.

As for your specific problem, the usual answer is to put your values in quotes. So instead of a single cell being abc,def, it should be "abc,def". Your code still can't handle this situation because you're spliting your string using a simple string.Split(). You could do something that enumerates through each character of the row and creates a new value only if you're not already inside a set of quotation marks, but you'd be reinventing the wheel. Use a software library for this.

user2023861
  • 8,030
  • 9
  • 57
  • 86
  • how can I put my values between in quotes? When I do SAVE AS in Excel it's doing automatically... for some columns which contains comma, etc.. – user3080110 May 02 '16 at 13:28
  • 1
    @user3080110 Excel generates the file properly. Your code isn't reading it properly. Excel is likely only using quotes when it needs to. Your code should allow for this. This is yet another reason that you should NOT be rolling your own CSV parser. – user2023861 May 02 '16 at 13:53
0

I think I fixed the problem by using the
1) adding reference to : Microsoft.VisualBasic
then next to lines
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;

I can't void because of low points: @ MichaelMao point me the link thanks

user3080110
  • 107
  • 1
  • 3
  • 9