0

I have a table with around 100k rows and 40 columns.

I need to copy some of the rows to another workbook based an array with strings that match column values.

cond_list = ["value1", "value2", "value3" ...]

This condition can match 5k rows or more.

I tried a simple solution to use AutoFilter and copy visible cells:

' Filter source data
src_wks.ListObjects("Table1").Range.AutoFilter _
  Field:=src_wks.ListObjects("Table1").ListColumns("Column1").Index, _
  Criteria1:=cond_list, Operator:=xlFilterValues
        
' Copy and paste
src_wks.UsedRange.SpecialCells(xlCellTypeVisible).Copy
dst_wks.Range("A1").PasteSpecial Paste:=xlPasteValues

Filtering takes a fraction of a second, but then execution of this line takes more than 10 minutes. I have to run this code like 20 times so it is unacceptable.

src_wks.UsedRange.SpecialCells(xlCellTypeVisible).Copy

I tried to modify the code following this comment: https://stackoverflow.com/a/22789329/7214068

I tried to copy whole data first and then remove hidden rows:

' Copy and Paste whole table
dst_wks.UsedRange.Offset(1, 0).Value = ""
addr = src_wks.UsedRange.Address
dst_wks.Range(addr).Value = src_wks.UsedRange.Value

' Filter data
dst_wks.ListObjects("Table1").Range.AutoFilter _
  Field:=dst_wks.ListObjects("Table1").ListColumns("Column1").Index, _
  Criteria1:=cond_list, Operator:=xlFilterValues

' Remove rest
Application.DisplayAlerts = False ' Suppress "delete row?" promt
Dim i, numRows As Long
numRows = dst_wks.UsedRange.Rows.Count
For i = numRows To 1 Step -1
    If (dst_wks.Range("A" & i).EntireRow.Hidden = True) Then
        dst_wks.Range("A" & i).Delete
    End If
Next i
Application.DisplayAlerts = True

Copying whole data takes less than two seconds. But then it again hangs on for loop and takes more than 10 minutes.

Community
  • 1
  • 1
AnJ
  • 581
  • 1
  • 6
  • 29

3 Answers3

1

I am not sure about how your data looks like, but from my opinion, it is not efficienct to use filter. Here I will post a demo for your reference. Better to use SQL.

Sub filterProcess()
    Dim filterArray
    Dim conn As Object
    Set conn = CreateObject("adodb.connection")
    strPath = ThisWorkbook.FullName

    If Application.Version < 12 Then
        connString = "Provider=Microsoft.jet.OLEDB.4.0;Extended Properties=Excel 8.0;Data Source=" & strPath
    Else
        connString = "Provider=Microsoft.ACE.OLEDB.12.0; Extended Properties = 'Excel 12.0; HDR=YES;IMEX=0'; Data Source = " & strPath
    End If

    filterArray = Array("ta001", "01", "A")


    conn.Open connString
    strSQL = " SELECT * FROM [a$]  where [title1] = '" & filterArray(0) & "'" & " and [title2] = '" & filterArray(1) & "'" & "and [title3] = '" & filterArray(2) & "'"
    Set rst = conn.Execute(strSQL)
   Worksheets.Add

    For j = 0 To rst.Fields.Count - 1
        Cells(1, j + 1) = rst.Fields(j).Name

    Next

    ActiveSheet.Range("A2").CopyFromRecordset rst

    rst.Close
    conn.Close

    Set conn = Nothing


End Sub

enter image description here

Anabas
  • 346
  • 1
  • 7
1

An alternate approach (there are several ways to do this) could be to use a SQL statement to query the data from the sheet in question, then copy it to a new sheet. This might be preferable if the conditions for selecting data become more complex.

I have my data setup like this on Sheet1:

enter image description here

Code

Option Explicit
Private Const adCmdText As Long = 1
Private Const adStateOpen As Long = 1

Public Sub DisplayView(Conditions As String)
    Dim dbField       As Variant
    Dim fieldCounter  As Long
    Dim dbConnection  As Object
    Dim dbRecordset   As Object
    Dim dbCommand     As Object
    Dim OutputSheet   As Excel.Worksheet

    Set dbConnection = CreateObject("ADODB.Connection")
    Set dbRecordset = CreateObject("ADODB.Recordset")
    Set dbCommand = CreateObject("ADODB.Command")

    Set OutputSheet = ThisWorkbook.Worksheets("Sheet2")

    'Do a quick check to determine the correct connection string
    'if one of these don't work, have a look here --> https://www.connectionstrings.com/excel/
    If Left$(ThisWorkbook.FullName, 4) = "xlsm" Then
        dbConnection.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
        ThisWorkbook.FullName & ";Extended Properties='Excel 12.0 Macro;HDR=YES';"
    Else
        dbConnection.connectionstring = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
        ThisWorkbook.FullName & ";Extended Properties='Excel 12.0;HDR=YES';"
    End If

    'Open the connection and query
    dbConnection.Open
    With dbCommand
        .ActiveConnection = dbConnection
        .CommandType = adCmdText
        .CommandText = "Select * from [Sheet1$] where Column1 in (" & Conditions & ")" 'Update Sheet where applicable
        Set dbRecordset = .Execute
    End With

    'Clear the Output Sheet
    OutputSheet.Cells.Clear

    'Add Headers to output
    For Each dbField In dbRecordset.Fields
        fieldCounter = fieldCounter + 1
        OutputSheet.Cells(1, fieldCounter).Value2 = dbField.Name
    Next

    'Dump the found records
    OutputSheet.Range("A2").CopyFromRecordset dbRecordset
    If dbConnection.State = adStateOpen Then dbConnection.Close
End Sub

'Run from here
Public Sub ExampleRunner()
    Dim t As Double
    t = Timer
    DisplayView "'value1','value2','value3'" 'Send it a quoted csv of values you are looking for
    Debug.Print "Getting data took: " & Timer - t & " seconds"
End Sub

This is taking about 4-5 seconds on my machine to pull back a few thousand records from a total data set size of 100,000.

Ryan Wildry
  • 5,612
  • 1
  • 15
  • 35
  • Thank you very much. It is working! But now I thought that it would be cleaner to also replace conditions list with a SQL query. I could simply write nested SQL query instead of constructing this list "by hand" in a for loop. But when I do something like this: `Select * From [wksA$] Where [Column1] in (Select [Column1] from [wksB$] where [Column1] <> '')` I get type mismatch - even if result of the nested query is a list of strings. Maybe you know if this is something related to excel or just I'm bad at SQL? – AnJ Apr 14 '20 at 15:01
  • 1
    Looks ok, just make sure your sheet and columns exist. To debug, try executing your inner select as a separate recordset to see what you get back. You might just need to dump to a string instead. How is the speed? – Ryan Wildry Apr 14 '20 at 15:07
  • I did run inner select and it returns only one column with string values as expected. Speed it great, from around 3 seconds to 10 seconds. Worst case scenario with huge conditions list it takes 30 seconds but it is still ultra fast. – AnJ Apr 14 '20 at 15:54
  • for better speed, you can try dict. – Anabas Apr 14 '20 at 20:46
0

you could try :

  • the technique from accepted solution of the SO question you linked

    i.e.: loop through Areas and work with Value properties

  • reference src_wks.ListObjects("Table1").Range also for copy/paste values operation

as follows:

Dim area As Range
With src_wks.ListObjects("Table1").Range ' reference your table Range
    ' Filter referenced range
    .AutoFilter _
        Field:=src_wks.ListObjects("Table1").ListColumns("Column1").Index, _
        Criteria1:=cond_list, Operator:=xlFilterValues

    ' Copy and paste values from each single referenced range "visible" area
    For Each area In .SpecialCells(xlCellTypeVisible).Areas
        With area
            dst_wks.Cells(Rows.Count, 1).End(xlUp).Offset(1).Resize(.Rows.Count, .Columns.Count).Value = .Value
        End With
    Next
End With

and if you could also Sort your table, that could dramatically speed it up more

HTH
  • 2,031
  • 1
  • 4
  • 10
  • Should `dst_wks.Cells(Rows.Count...` be `dst_wks.Cells(.Rows.Count...`? – Ryan Wildry Apr 14 '20 at 12:15
  • 1
    @RyanWildry, No. Since, given the `With src_wks.ListObjects("Table1").Range … End With` it's in, it would reference the referenced range `.Rows.Count`. It should more appropriately be `dst_wks.Cells(dst_wks.Rows.Count…`, but I assumed (at my own risk…) that the source and destination sheet were in the same workbook, or, at least, they were in different workbook of the same Excel version so as to have the same maximum rows number – HTH Apr 14 '20 at 12:29
  • Hmm, so it will reference the active sheet? – Ryan Wildry Apr 14 '20 at 12:29
  • @RyanWildry, yes. But to the only purpose of returning its maximum rows number, which is the same throughout all the workbook worksheets. And which would be the same of any other workbook worksheets provided they are of the same Excel version – HTH Apr 14 '20 at 12:30
  • @RyanWildry yeah, I thought the same but without the dot this code put everything in the first row. – AnJ Apr 14 '20 at 12:31
  • @HTH sadly it still takes forever. I'm trying to sort my table but without success. – AnJ Apr 14 '20 at 12:32
  • what does _"without"_ success actually mean? – HTH Apr 14 '20 at 12:32
  • @Gumaa, _" at RyanWildry yeah, I thought the same but without the dot this code put everything in the first row. "_. You must be referring to the second part of the same statement (i.e. `... .Resize(.Rows.Count, .Columns.Count).Value`), while Ryan was referring to its first part. As for the latter I already gave explanations. As for the former, those dots must be there because they are referriing to current `area` in order to appropriately size the destination range – HTH Apr 14 '20 at 12:36
  • @HTH I was referring the same part as Ryan but I misspoke so my bad. I meant "with the dot" not "without". I was debugging it and it seemed like in that case (with a dot) everything was written to only first row. – AnJ Apr 14 '20 at 12:44
  • And about the sorting I have trouble defining CustomList used for sorting. `Application.AddCustomList ListArray:=cond_list` throws `method 'addcustomlist' of object '_application' failed`. But I will try to google something more. – AnJ Apr 14 '20 at 12:45
  • @Gumaa, as for your second to last comment: ok. And I already explained the reason why the dot must not be there, hence my code didn't have it. – HTH Apr 14 '20 at 12:50
  • 1
    @Gumaa, as for your last comment, you'd better take a look at the other answers here using SQL first, which are promising to be very fast – HTH Apr 14 '20 at 12:51