0

Possible Duplicate:
Dealing with commas in a CSV file

I am currently parsing values from a CSV file and adding them to a datatable.

The csv file contains 5 columns and am parsing each row before adding it to the datatable.

After parsing the csv, the datatable could be visualized as the following:

|  Town/City  | Cost |
| Birmingham  | 400  |
| Manchester  | 500  |

For this data, there are no problems. However, I have some values that look like the following:

|  Town/City    | Cost |
|  London, West | 800  |

As there is a comma between a value for the one column, it is obviously parsing this as a seperate column.

The data cannot be changed, therefore I need a way to parse this as a single column rather than two.

This is my code so far that parses rows which have 5 columns. I have commented the bit where I guess the new code will need to go.

        //parse csv file and return as data table
    public System.Data.DataTable GetCsvData()
    {
        string strLine;
        char[] charArray = new char[] { ',' };

        List<string> strList = new List<string>();

        System.Data.DataTable dt = new System.Data.DataTable("csvData");
        System.IO.FileStream fileStream = null;
        System.IO.StreamReader streamReader = null;

        if (!string.IsNullOrEmpty(csvFilePath))
        {
            fileStream = new System.IO.FileStream(csvFilePath, System.IO.FileMode.Open);
            streamReader = new System.IO.StreamReader(fileStream);

            strLine = streamReader.ReadLine();

            strList = strLine.Split(charArray).ToList();

            //only add first 5 columns
            for (int i = 0; i <= 4; i++)
                dt.Columns.Add(strList[i].Trim());

            strLine = streamReader.ReadLine();

            while (strLine != null)
            {
                strList = strLine.Split(charArray).ToList();

                System.Data.DataRow dataRow = dt.NewRow();

                /*THIS CODE PARSES THE ROW'S 5 COLUMNS AND NEEDS TO PARSE COMMA
                SEPERATED VALUES AS A SINGLE VALUE*/
                for (int i = 0; i <= 4; i++)
                    dataRow[i] = strList[i].Trim();

                dt.Rows.Add(dataRow);

                strLine = streamReader.ReadLine();
            }

            streamReader.Close();
            return dt;
        }

        return null;
    }

Any help with this would be greatly appreciated as I am struggling to find answers on google.

Community
  • 1
  • 1
user971543
  • 53
  • 1
  • 5
  • 1
    Looks like the CSV is not valid. Fields that contain a comma should be escaped with quotes. – Oded Sep 29 '11 at 16:36
  • Can't you just use another delimeter in the csv? like semicolon – Oskar Kjellin Sep 29 '11 at 16:36
  • 1
    Not being able to change the data is a big bummer! This is the whole reason people use either different delimiters or they qualify each value with double quotes. – Josh Sep 29 '11 at 16:36

2 Answers2

1

I propose checking the array after the split. If you find it has N + 1 columns (where you expect N), merge the two City columns and shift the others down (strList[i] = strList[i+1]). Otherwise process as normal.

Of course this only works if you have only the one column that has a potential comma.

Bahri Gungor
  • 2,289
  • 15
  • 16
1

In addition to just checking the length of the split array as @Bahri suggests, if your data is predictable enough (as in your example), you could check column content.

If cost in your example is always a number, you could check to see if it contains only digits (or use a Regex for more complex matching). If not, then collapse the previous two columns.

Mark Peters
  • 17,205
  • 2
  • 21
  • 17