
VBA newbie here: I have a large CSV file. I would like to read only 5000 lines from it and import them into Excel using VBA.

I tried the following code. It opens the file, but I can't seem to find a way to import only 5000 rows:

Sub importcsvfile()
    Dim WS As Worksheet, strFile As String
    Set WS = ActiveWorkbook.Sheets("sheet1") 'set to current worksheet name

    'Open .csv file'
    strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", , "Please select text file...")

    With WS.QueryTables.Add(Connection:="TEXT;" & strFile, Destination:=WS.Range("A1"))
     .TextFileParseType = xlDelimited
     .TextFileCommaDelimiter = True
     .Refresh
    End With

    'delete the querytable if there is one
    On Error GoTo nothingtodelete
    WS.QueryTables(1).SaveData = False
    WS.QueryTables.Item(1).Delete
    nothingtodelete:
End Sub
Code reference - https://stackoverflow.com/questions/12197274/is-there-a-way-to-import-data-from-csv-to-active-excel-sheet

Also, how do I save this file as .xlsm using VBA code? Any help is appreciated!


1 Answer


I do not know whether the QueryTable method can be limited to a specific number of rows, but please try this piece of code instead. It should work and be fast enough:

Private Sub importcsvfile()
    Dim WS As Worksheet, strFile As String, arrCSV, cols, dataCSV
    Dim i As Long, c As Long, nrRows As Long
    Dim st As Long, lEnd As Long
    
    Set WS = ActiveSheet
    nrRows = 5000
    
    'Open .csv file
    strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", , "Please select text file...")

    If strFile = "False" Then Exit Sub 'in case of pressing 'Cancel' button
    
    'read the whole file at once and split it on line feed (this file uses vbLf, not vbCrLf)
    arrCSV = Split(CreateObject("Scripting.FileSystemObject").OpenTextFile(strFile, 1).ReadAll, vbLf) 'more than three minutes for this line...
    cols = Split(arrCSV(0), ",")
    ReDim dataCSV(0 To nrRows - 1, 0 To UBound(cols))
         
    For i = 0 To nrRows - 1
        'replace the commas inside a quoted field with "/" so the split below does not break the field apart
        st = InStr(1, arrCSV(i), """"): lEnd = InStr(st + 1, arrCSV(i), """")
        If st > 0 Then
            arrCSV(i) = Replace(arrCSV(i), Mid(arrCSV(i), st, lEnd - st + 1), _
                        Replace(Mid(arrCSV(i), st + 1, lEnd - st - 1), ",", "/"))
        End If
        cols = Split(arrCSV(i), ",")
        For c = 0 To UBound(cols)
            dataCSV(i, c) = cols(c)
        Next
    Next i
    
    WS.Cells.Clear
    WS.Range("A1").Resize(nrRows, UBound(dataCSV, 2) + 1).Value = dataCSV
End Sub

Edited:

Please check the next code, which does not read the whole file string at once. I couldn't imagine that the file was really that huge... This version is very fast; it takes only seconds. I will also leave the first version in place, for learning purposes. That should be the main purpose of our community, I think:

Private Sub importcsvfileRLines()
    Dim WS As Worksheet, strFile As String, cols, dataCSV
    Dim nL As Long, c As Long, nrRows As Long
    Dim st As Long, lEnd As Long, myCSV As Object, txtLine As String
    
    Set WS = ActiveSheet 'use here the sheet you need
    nrRows = 5000        'set here the number of rows to be returned
    
    'Open .csv file
    strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", , "Please select text file...")

    If strFile = "False" Then Exit Sub  'in case of pressing 'Cancel' button
    
    nL = 1 'initialize the first (array) row to be filled
    ReDim dataCSV(1 To nrRows, 1 To 11)      'redim the necessary array (for big speed); 11 = number of columns in this csv
    Set myCSV = CreateObject("Scripting.FileSystemObject").OpenTextFile(strFile, 1)
        Do While myCSV.AtEndOfStream <> True 'iterate between all existing lines
            txtLine = myCSV.ReadLine         'put the text line in a variable
            'solve the string problem, having comma delimiter between the double quotes:
            st = InStr(1, txtLine, """")     'position of the first double quotes character
            If st > 0 Then
                lEnd = InStr(st + 1, txtLine, """") 'position of the last double quotes character
                txtLine = Replace(txtLine, Mid(txtLine, st, lEnd - st + 1), _
                            Replace(Mid(txtLine, st + 1, lEnd - st - 1), ",", "/"))
            End If
            cols = Split(txtLine, ",")       'split the text on comma separator
            For c = 0 To UBound(cols)
                    dataCSV(nL, c + 1) = cols(c) 'fill the array nL row
            Next
            nL = nL + 1
            If nL = nrRows + 1 Then Exit Do      'if the max set number of rows has been reached
        Loop
    myCSV.Close
    
    WS.Cells.Clear 'clear the old data
    'drop the array value at once:
    WS.Range("A1").Resize(nrRows, UBound(dataCSV, 2)).Value = dataCSV
    MsgBox "Ready...", vbInformation, "Job finished"
End Sub
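
The second part of the question, saving the result as an .xlsm file, is not covered by the code above. A minimal sketch, assuming the active workbook should simply be saved as a macro-enabled workbook (the target folder and file name are only placeholders):

Sub SaveAsXlsm()
    'save the active workbook as a macro-enabled workbook (.xlsm)
    'the folder and file name below are placeholders - adjust them as needed
    Application.DisplayAlerts = False 'suppress the overwrite prompt if the file already exists
    ActiveWorkbook.SaveAs Filename:=Environ("USERPROFILE") & "\Desktop\ImportedData.xlsm", _
                          FileFormat:=xlOpenXMLWorkbookMacroEnabled
    Application.DisplayAlerts = True
End Sub

A workbook that contains VBA code has to be kept in a macro-enabled format (.xlsm or .xlsb); saving it as .xlsx would strip the macros.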
  • @DiyaShetty: Didn't you find some time to test the above code? If tested, doesn't it do what you need? – FaneDuru Jan 06 '21 at 07:32
  • Thank you and I appreciate your help! It looks like the csv file does not want me to import its contents. The above code is freezing my Excel workbook. – DiyaShetty Jan 06 '21 at 15:44
  • @DiyaShetty: Please, try running the above code line by line, pressing F8 (in VBE) and see where the problem is. Does the code reach the iteration area? I mean, after the code line `For i = 0 To nrRows - 1`. I tested the code and it worked very well, but I do not have such a large file. How many columns does the csv file in discussion have? Maybe it needs a lot of resources for such an iteration. But please, try what I recommended before and let me know what is happening. – FaneDuru Jan 06 '21 at 17:53
  • Just for testing's sake, https://www.kaggle.com/arashnic/covid19-case-surveillance-public-use-dataset . Could you please use this file? I know I will be losing data, but I only want the first 5000 rows. Every time I try to run the code, my Excel freezes just after I select the file. Wondering if it has something to do with the extremely large file. – DiyaShetty Jan 07 '21 at 14:58
  • @DiyaShetty: I tried downloading it, but they do not recognize my Facebook account mail even while logged in, and then they ask for my login data by mail; it does not look so safe... Can you put it on a transfer site? [This one](https://easyupload.io/) is free and easy to use... Now I will leave my office. I will look at it when I get home. – FaneDuru Jan 07 '21 at 15:05
  • @DiyaShetty: OK. I will download it and make some tests. – FaneDuru Jan 08 '21 at 09:05
  • @DiyaShetty: There are some issues: The file is **huge** and reading/splitting it takes about three minutes (on my laptop). Then, the file line separator is not `vbCrLf` as usual; it is `vbLf`. Also, even though the csv file uses a comma separator, the column "Race and ethnicity" contains comma-separated strings (like Black, Non-Hispanic). Splitting on the comma splits them too. I processed the string to replace that comma with "/"; the `QueryTable` method splits based on format. Now, it works. The rest of the processing for extracting 5000 rows takes 6 to 7 seconds... I will adapt the code. – FaneDuru Jan 08 '21 at 09:50
  • @DiyaShetty: But, since `QueryTable` is able to fill all the sheet rows in less time (meaning that the real number of file rows is much bigger), I will also post another variant, reading line by line. Theoretically, it should take seconds... – FaneDuru Jan 08 '21 at 10:30
  • @DiyaShetty: Please, check the new version. It is really very fast... – FaneDuru Jan 08 '21 at 10:41
  • @FaneDuru: Works like a charm! Thank you so much!! It's really fast and I will be using this code in my future projects. Thanks again! – DiyaShetty Jan 08 '21 at 22:41
  • 1
    @DiyaShetty: Glad I could help! I also learn that the reading line by line may be (much) faster. I never in my life tried processing such a huge CSV file. And I dealt with a lot of them... Theoretically, processing the string in memory it is much faster than reading and processing each line. Except the case when the file content is huge and its loading in the array takes a lot, like it happened in this specific case... And, of course, you need only a part of it. If need it all, the initial way is faster. It has much more lines than excel sheet max rows number, but `QueryTable` only fills sheet – FaneDuru Jan 09 '21 at 11:28