1

I have a function that reads CSV files and build a datatable. I won't claim it is my own coz it is not. I don't remember where I got it but it is combination from few different sources.

My function works fine until I need to process some CSVs which has "," commas in the value. Can anyone help me to solve this out to handle this issue ?

Thanks a lot ..

Sample CSV File

    FirstName, LastName, Comment, "address, just city",
    John, Smith, "you are very good, but ugly", London,
    Britney, Spear, "I am a singer, and beautiful", New York,

My function

    Public Function BuildDataTable() As DataTable

    Dim myTable As DataTable = New DataTable("MyTable")
    Dim i As Integer
    Dim myRow As DataRow
    Dim fieldValues As String()
    Dim myReader As IO.StreamReader
    Dim csv2xml As New csv2xml

    Try
        'Open file and read first line to determine how many fields there are.
        myReader = File.OpenText(_fileFullPath)
        fieldValues = myReader.ReadLine().Split(_seperator)
        'Create data columns accordingly
        If _hasheader = False Then
            For i = 0 To fieldValues.Length() - 1
                myTable.Columns.Add(New DataColumn("Column(" & i & ")"))
            Next
        Else
            'if the file has header, take the first row as header for datatable
            For i = 0 To fieldValues.Length() - 1
                myTable.Columns.Add(New DataColumn(fieldValues(i).Replace(" ", "")))
            Next
        End If

        'Adding the first line of data to data table
        myRow = myTable.NewRow

        'if the csv file has not got a column header. defined by radio button list on first page by user
        'if csv file has header, then not need to read the first line
        If _hasheader = False Then
            For i = 0 To fieldValues.Length() - 1
                myRow.Item(i) = fieldValues(i).ToString
            Next
            myTable.Rows.Add(myRow)
        End If

        'Now reading the rest of the data to data table
        While myReader.Peek() <> -1
            fieldValues = myReader.ReadLine().Split(_seperator)
            myRow = myTable.NewRow
            For i = 0 To fieldValues.Length() - 1
                myRow.Item(i) = fieldValues(i).Trim.ToString
            Next
            'check if there are empty rows in csv, ignore empty rows

            If Not csv2xml.AreAllColumnsEmpty(myRow) = True Then
                myTable.Rows.Add(myRow)
            End If

        End While
    Catch ex As Exception
        'MsgBox("Error building datatable: " & ex.Message)
        Dim oError As ErrorLog = New ErrorLog
        oError.LogError(_strWebsiteName, _
            loginID, _
            ex.Source.ToString, _
            ex.Message.ToString, _
            , _
            ex.StackTrace.ToString)
        oError = Nothing
        Return New DataTable("Empty")
        'Server.Transfer(CustomErrorPage)
    Finally
        csv2xml = Nothing
        myRow = Nothing
    End Try
    myReader.Close()
    Return myTable

    End Function
Laurence
  • 7,633
  • 21
  • 78
  • 129
  • 4
    Look at this question/answers http://stackoverflow.com/questions/1050112/how-to-read-a-csv-file-into-a-net-datatable?rq=1 – Steve Jul 05 '12 at 11:05
  • You seem to have accidentally omitted the part of your code that parses out the rows and fields from the file. It would seem to me that it's that part of the code that would be most pertinent to your question. – Steven Doggart Jul 05 '12 at 12:46

2 Answers2

6

Just use the VB TextFieldParser class, and set the HasFieldsEnclosedInQuotes property to True:

ssis_ssiSucks
  • 1,476
  • 1
  • 12
  • 11
2

Your problem is that you seem to be parsing the columns by simply splitting each row string by the comma as a delimiter. That method will not work on all CSV files because, if it is a string that contains commas, as you have shown, the whole field value will be surrounded by quotes. So you need to first split by quotes and then by commas on what's left, and remove the quotes from the value of the field, which is a substantially more complex algorithm. You could write something like that yourself, or use regex to do it, but at that point, I would think you'd be better off just using ADO to load the CSV file:

Public Function LoadCsvFile(filePath As String, hasHeader As Boolean) As DataTable
    Dim folderPath As String = Path.GetDirectoryName(filePath)
    Dim fileName As String = Path.GetFileName(filePath)
    Dim hdr As String = Nothing
    If hasHeader Then
        hdr = "Yes"
    Else
        hdr = "No"
    End If
    Dim connectionString As String = String.Format("Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq={0};Extended Properties=""Text;HDR={1};FMT=Delimited""", folderPath, hdr)
    Dim connection As New OdbcConnection(connectionString)
    Dim adapter As New OdbcDataAdapter("SELECT * FROM [" + fileName + "]", connection)
    Dim table As New DataTable()
    adapter.Fill(table)
    Return table
End Function
Steven Doggart
  • 43,358
  • 8
  • 68
  • 105