3

I am trying to import a large CSV file, where I am dumping each row of the input csv file into an array (vector), which is NumColumns long. I fetched some code to copy a list to a DataTable, however, I am not sure the IList (IsEnumerable?) is needed. I also haven't looked into what T is.

My gut feeling is that I can go to some other code I have to load a DataTable with row and column data from a 2-dimensional array x(,), but for some reason I think there may be a fast way to simply .add(x), i.e. add the entire row vector to the DataTable to keep the speed up. You don't want to loop through columns(?)

Below is the code which will open up any .csv.

Imports System.ComponentModel
Imports System.IO

Public Class Form1
    Dim NumColumns As Integer
    Dim ColumnNames() As String

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim filename As String = Nothing
        With OpenFileDialog1
            .FileName = "*.csv"
            .CheckFileExists = True
            .ShowReadOnly = True
            .Filter = "Comma delimited *.csv|*.csv"
            If .ShowDialog = DialogResult.OK Then
               filename = .FileName
            End If
        End With
        Dim csvreader As New StreamReader(filename)
        Dim inputLine As String = ""
        inputLine = csvreader.ReadLine()
        Dim buff() As String = Split(inputLine, ",")
        NumColumns = UBound(buff)
        ReDim ColumnNames(UBound(buff) + 1)
        For j As Integer = 0 To NumColumns
            ColumnNames(j + 1) = buff(j)
        Next
        inputLine = csvreader.ReadLine()
        Do While inputLine IsNot Nothing
            Dim rowdata = New MyDataArray(NumColumns)
            Dim csvArray() As String = Split(inputLine, ",")
            For i As Integer = 0 To NumColumns
                rowdata.x(i) = csvArray(i)
            Next
            MyDataArray.DataArray.Add(rowdata)
            inputLine = csvreader.ReadLine()
        Loop
        Dim dgv As New DataGridView
        dgv.DataSource = ToDataTable(MyDataArray.DataArray)
        dgv.Width = 1000
        dgv.Height = 1000
        Me.Controls.Add(dgv)
    End Sub

    Public Shared Function ToDataTable(Of T)(data As IList(Of T)) As DataTable
        Dim properties As PropertyDescriptorCollection = TypeDescriptor.GetProperties(GetType(T))
        Dim dt As New DataTable()
        For i As Integer = 0 To properties.Count - 1
            Dim [property] As PropertyDescriptor = properties(i)
            dt.Columns.Add([property].Name, [property].PropertyType)
        Next
        Dim values As Object() = New Object(properties.Count - 1) {}
        For Each item As T In data
            For i As Integer = 0 To values.Length - 1
                values(i) = properties(i).GetValue(item)
            Next
            dt.Rows.Add(values(1))
        Next
        Return dt
    End Function
End Class

Public Class MyDataArray
    Public Shared DataArray As New List(Of MyDataArray)()
    Public Property x() As Object
    Sub New(ByVal cols As Integer)
        ReDim x(cols)
    End Sub
End Class
  • How many lines in the csv? – djv Mar 30 '18 at 20:34
  • The experimental csv has about 500, but the upper bound would be millions, certainly much greater than Excel. –  Mar 30 '18 at 20:42
  • A problem I see is that the x() array I fill with each row's data is the only Property in the MyDataArray Class. I am thinking that I need to dynamically add a new property for each column, but somehow think there's a way to throw an entire array into a row of a DataTable(?) –  Mar 30 '18 at 20:45
  • Maybe, in which case you would want to read the entire file first, then load into the datatable. I would do that in either case. – djv Mar 30 '18 at 20:52
  • 1
    OLEDB can read CSVs into a DT and you can also tell it the column types so that there is no need for parsing or an intermediary step. Libraries like CSVHelper if awesome for getting CSVs into collections in a very, very economical way – Ňɏssa Pøngjǣrdenlarp Mar 30 '18 at 20:59
  • That sounds like a good idea: fly through the csv to get the list, then move the list to the DataTable. Should I add a Property for each ColumnName dynamically, so that the ToDataTable function can see the properties? –  Mar 30 '18 at 20:59
  • @Plutonix, thanks, but another post stated that CSVHelper is not needed, and the .NET method scales well without chewing up resources. See e.g. (https://stackoverflow.com/questions/44362395/how-to-read-huge-csv-file-with-29-million-rows-of-data-using-net) –  Mar 30 '18 at 21:02
  • 1
    There are a very great number of people who would find fault with the linked answer on many grounds. First, parsing all but the simplest CSV yourself is a bad idea for many reasons - un typed data in your approach is one. The code to move data from a list to a DataTable is silly cause if that is the destination, you can read it into a (typed!) DT ***directly*** using OleDB . Second, the reason that the linked asker was having trouble was because s/he was trying to load all the records en masse. Used judiciously, CSV helper loads, read and parses one row/record as you need it. – Ňɏssa Pøngjǣrdenlarp Mar 31 '18 at 01:47
  • CSV Helper is nice and I've had really good experience using a library called the Generic Parser: https://www.codeproject.com/Articles/11698/A-Portable-and-Efficient-Generic-Parser-for-Flat-F – NoAlias Apr 05 '18 at 13:14
  • @wrtsvkrfm are you saving the data to DB? You can read the file using CSVHelper, create an XML in the VB code. Then pass the xml to the stored procedure. Use xquery in the stored procedure to insert data into the database. I did this with 80k lines of csv. Took about 2 minutes in all. – novice_dev Apr 05 '18 at 15:17
  • You may have valid reasons for wanting both a 2D record array and a DataTable, but I can not think of one. Please explain this need. – TnTinMn Apr 05 '18 at 17:02
  • "How to load millions of rows from a CSV into a Winforms DataGridView?" Is this the real question? – Simon Mourier Apr 06 '18 at 06:31
  • 1
    It should be noted that many DB provide the means to import directly from a disk CSV into the DB. As is, the code in the post is doing most everything in the hardest and most inefficient method possible. – Ňɏssa Pøngjǣrdenlarp Apr 10 '18 at 03:57
  • OleDB is very capable of handling this. see example form @adatapost https://stackoverflow.com/a/11120351/3989439 – Steven de Beer Apr 11 '18 at 21:01

2 Answers2

1

Maybe this code will help? You can use OleDB to convert the CSV to a Database and then put it into a datatable. All you need is a DataGridView and form (If you want to print it). Then you can use teh code below to accomplish what you need to do.

Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim file As String = "test.txt"
        Dim path As String = "C:\Test\"
        Dim ds As New DataSet
        Try
            If IO.File.Exists(IO.Path.Combine(path, file)) Then
                Dim ConStr As String = _
                "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & _
                path & ";Extended Properties=""Text;HDR=No;FMT=Delimited\"""
                Dim conn As New OleDb.OleDbConnection(ConStr)
                Dim da As New OleDb.OleDbDataAdapter("Select * from " & _
                file, conn)
                da.Fill(ds, "TextFile")
            End If
        Catch ex As Exception
            MessageBox.Show(ex.ToString)
        End Try
        DataGridView1.DataSource = ds.Tables(0)
    End Sub
End Class

However, you could always convert it to an xml and work from there

Sharad Khanna
  • 352
  • 3
  • 11
0
     Imports System.IO
Module Module3
    Public Function _simpleCSV2tbl(CSVfile As String) As DataTable
        Dim watch As Stopwatch = Stopwatch.StartNew()
        watch.Start()
        'A,B,C,D,E
        '00001,4,1,2,3560
        '00002,4,12,1,2000
        '00003,1,4,2,4500
        '00004,4,12,1,2538.63
        '00005,1,1,2,3400
        '00006,2,5,2,2996.48

        Dim dTable As New DataTable(CSVfile)
        Using reader As New StreamReader(CSVfile)

            Dim CSV1stLine As String = reader.ReadLine
            Dim getCols = (From s In CSV1stLine.Split(",") Select s).ToList()
            Dim setTblColumns = (From c In getCols Select dTable.Columns.Add(c, GetType(String))).ToList

            Dim ReadToEnd As String = reader.ReadToEnd()
            Dim getRows = (From s In ReadToEnd.Split(vbLf) Select s).ToArray

            _setTblRows(getRows, dTable)

            Console.WriteLine(String.Format("Elapsed: {0}", Format(watch.Elapsed.TotalMilliseconds, "F"), dTable.Rows.Count))

            reader.Close()
            reader.Dispose()
        End Using

        _ShowTbl(dTable, 10)

    End Function

    Public Function _setTblRows(getRows As String(), dTable As DataTable) As IEnumerable
        _setTblRows = getRows.Select(Function(r) dTable.LoadDataRow(r.Split(","), False)).ToArray()
    End Function
    Public Sub _ShowTbl(ByVal dTable As DataTable, Optional ByVal rPad As Short = 0)
        If dTable.TableName = Nothing Then dTable.TableName = "NoName"
        If rPad = 0 Then
            Console.WriteLine(String.Format("->Unformatted Table: {0}, Count={1}", dTable.TableName, dTable.Rows.Count))
        Else
            Console.WriteLine(String.Format("->Formatted Table: {0}, Count={1}", dTable.TableName, dTable.Rows.Count))
        End If

        _ShowTblColumns(dTable.Columns, rPad)
        _ShowTblRows(dTable.Rows, rPad)

    End Sub
    Public Function _ShowTblColumns(ByVal TblColumns As DataColumnCollection, Optional ByVal rPad As Short = 0)
        Dim getTblColumns = (From c As DataColumn In TblColumns Select c.ColumnName).ToList()
        Console.WriteLine(String.Join(",", getTblColumns.Select(Function(s) String.Format(s.PadLeft(rPad, vbNullChar)).ToString).ToArray))
    End Function
    Public Function _ShowTblRow(ByVal Row As DataRow, Optional ByVal rPad As Short = 0)
        Dim getRowFields = (From r In Row.ItemArray Select r).ToList
        Console.WriteLine(String.Join(",", getRowFields.Select(Function(s) String.Format(s.PadLeft(rPad, vbNullChar)).ToString).ToArray))
    End Function
    Public Function _ShowTblRows(ByVal Rows As DataRowCollection, Optional ByVal rPad As Short = 0)
        Dim rCount As Integer
        For Each row As DataRow In Rows

            _ShowTblRow(row, rPad)

            rCount += 1
            If rCount Mod 20 = 0 Then
                If NewEscape(String.Format(" {0} out of {1}", rCount.ToString, (Rows.Count).ToString)) Then Exit Function
            End If
        Next row
        Console.WriteLine()
    End Function

    Public Function _NewEscape(ByVal Message As String) As Boolean
        Console.Write("{0} / Press any key to continue or Esc to quit {1}", Message, vbCrLf)
        Dim rChar As Char = Console.ReadKey(True).KeyChar
        Dim rEscape As Boolean
        If rChar = Chr(27) Then rEscape = True
        Return rEscape
    End Function

End Module
user3105453
  • 1,881
  • 5
  • 32
  • 55
ATeDe
  • 11
  • 6
  • 1
    Presented above code compiles and runs under Visual Studio 17. rPad in function _ShowTbl(dTable, rPad) is Optional. When is omitted, printout is unformatted. – ATeDe Apr 18 '18 at 01:02
  • Thanks - looks quite fast, will need to give it a try. Thx! –  Apr 19 '18 at 01:40
  • 1
    This simple module works several times faster than TextFieldParser and other mentioned in this post ready to use utilitities. With a huge and extremely large CSV files, only a custom made functions can deal in fully controlled manner... – ATeDe Apr 19 '18 at 10:45
  • 1
    Just to conclude this topic I'd like to bring your attention that reading your CSV data to List using LINQ would be even more faster. DataTable is somehow more convenient , because once you define your column Table.Columns.Add(Name, GetType, eventually everything is under control. When comes to List I found a really very smart article http://www.obelink.com/index.php/2016/04/11/query-a-csv-file-with-linq-and-visual-basic-net/ which I recomend read. If you you combine List with Dictionary actually I think the final effect would be the same, but it gonna be still faster than using DataTbl – ATeDe Apr 20 '18 at 11:55