0

I have a large number of rows and columns (e.g. 500k rows and 20 columns) all filled with numbers.

I'm trying to delete all data in column I that has a certain value (e.g. less than or equal to 8), but when I try to use autofilter to delete the values, it freezes up Excel and doesn't delete.

It works quickly for data in column A. I remade similar data in a new sheet to make sure all cells were filled, no columns/rows were hidden etc.

Why is it freezing up for column I?

Sub DeleteRow()

    Application.DisplayAlerts = False
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    Dim ws As Worksheet
    Dim rng As Range
    Dim lastRow As Long

    Set ws = ActiveWorkbook.Sheets("Sheet1")

    'filter and delete all but header row which is in row 3
    lastRow = ws.Range("I" & ws.Rows.count).End(xlUp).row
    MsgBox lastRow
    Set rng = ws.Range("I3:I" & lastRow)

    ' filter and delete all but header row
    With rng
         .AutoFilter Field:=1, Criteria1:="<=8"
         .SpecialCells(xlCellTypeVisible).EntireRow.Delete
    End With

    ' turn off the filters
    If ActiveSheet.FilterMode Then
        ActiveSheet.ShowAllData
    End If

    Application.DisplayAlerts = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub
Community
  • 1
  • 1
newtovba
  • 33
  • 6
  • Of the 500k rows, approximately how many would you expect to be deleted? – chris neilsen Oct 19 '19 at 06:15
  • 1
    The fastest way to delete that amount of data, by my experience, is create an auxiliar column that will report blank or "" if the row should be deleted and any other non blank value if you want to keep it, and then use a loop with `Cells(Counter, Aux_Column).SpecialCells(xlCellTypeBlanks).EntireRow.Delete` that will delete all the blank cells rows. – David García Bodego Oct 19 '19 at 06:19
  • @chrisneilsen - I expect around 130k rows to be deleted from column I. Deletion of column A (~80k rows) gets deleted within a few seconds whereas column I freezes for over 20+ minutes – newtovba Oct 19 '19 at 06:32
  • 1
    Check this [post](https://stackoverflow.com/questions/57651919/optimization-on-row-deletion). On that file, with 6400 rows: xlCellTypeBlanks took 53 seconds, filtering 74 seconds, matrix 94 seconds and looping by cell, as your proposal, 64 seconds. Depends on your computer, you can reduce that times, but in order to delete, the time consumed by strategy is: Matrix > Filter > Loop /Delete by cell > Delete by blank. Delete by blank it is the fastest. – David García Bodego Oct 19 '19 at 06:33
  • 1
    @newtovba there are a lot of posts on SO about deleting rows, some good, some not so good. Two common ones are the Autofilter (which you are using) and building a range with Union (one of which David has linked you to). For a data set of this size and this many deletions, you may find Union method not that fast either. The reason is that Union doesn't scale well, as the Union'ed range gets bigger (to be precise, as the number of non contiguous Union'ed ranges gets bigger) the time taken to add each extra row increases. So by the time you get to the 130k'th row the Union will be very slow – chris neilsen Oct 19 '19 at 06:58
  • @DavidGarcíaBodego Thanks for the tip! I'm currently doing a timer for my data right now to see how long it takes. Can I use a table and do the excel formula (not vba) =if(i3<=8,"",i3) to use with Mikku method 1 in your post? – newtovba Oct 19 '19 at 07:02
  • ... a solution is to keep track of how many rows you've Union'ed and delete them when you get to a threshold. I find about 1000 works well enough – chris neilsen Oct 19 '19 at 07:03
  • @chrisneilsen Very good point - I've tried a lot of the different SO methods and all seemed to freeze up my computer on column I but not for column A which made me think it was an error in my data in column I, but when I put dummy values in a new spreadsheet, it would also freeze up – newtovba Oct 19 '19 at 07:03
  • I just realized that based on super rough scaling of @DavidGarcíaBodego's run time, my code should take about 75+ minutes to run on a good day for my amount of data, so perhaps it's actually working but has to delete too many rows. I'll post back in about an hour – newtovba Oct 19 '19 at 07:10
  • @newtovba... Use it. From my experience, it is much better to use a formula that introduce in the loop. It will freeze it. – David García Bodego Oct 19 '19 at 07:10
  • 1
    By percentage... how many columns you need to delete? Because maybe it will be faster just place the filter, copy the visible data to an auxiliar sheet, delete the original sheet and rename the auxiliar sheet. – David García Bodego Oct 19 '19 at 07:12
  • @DavidGarcíaBodego Got it. Will try it when this current execution runs. Oh that's a good point - I hadn't thought of that. Roughly 1/3 of the data needs to be deleted – newtovba Oct 19 '19 at 07:14
  • Fun fact - took 1336.98 seconds for the autofilter method to delete all the rows. I need to log off soon, so I'll try these two methods within the next 24 hours: 1) use the Mikku Method 1 later and 2) copy the visible data to another sheet to see how long those take – newtovba Oct 19 '19 at 07:17
  • 1
    Here's another method that [may be of interest](https://stackoverflow.com/a/14245591/445425) – chris neilsen Oct 19 '19 at 07:45
  • FYI I ran a optimised version of the Union method. It ran in only slighlty better time than the AutoFilter method for this large a data set – chris neilsen Oct 19 '19 at 08:16
  • @newtovba I know you are asking for vba code, but for this amount of data, you may take a look at power query – Ricardo Diaz Oct 19 '19 at 11:44
  • @RicardoDiaz Agreed - hadn't thought about that. Need this a bit more urgently and haven't learned power query (yet) so looking for a VBA solution and will probably look into power query later on when I do more of this stuff – newtovba Oct 19 '19 at 12:22

4 Answers4

4

There are a lot of posts on SO about deleting rows, some good, some not so good.

Two common ones are the Autofilter (which you are using) and building a range with Union (one of which David has linked you to).

For a data set of this size and this many deletions, you will find any method that uses references to Excel worksheet methods (such as AutoFilter, Find, Sort, Union, Formula's etc) slow. Some will be better than others, depending on the exact nature of your data.

There is another method that may work for you. That is to not actually Delete the rows, but to overwrite the data with a modified version.

Note that this only work if you DO NOT have any formulas (either on this sheet or any other) that refer to individual cells in the data being processed (whole column references should be OK, but YMMV)

I ran this code on a sample data set 500k rows, 20 columns of random numbers 1..32 (so about 25% or rows deleted)

This ran in ~10s

Sub DeleteRows2()
    Dim ws As Worksheet
    Dim rng As Range
    Dim i As Long, j As Long
    Dim NewI As Long
    Dim dat, NewDat
    
    Dim TestCol As Long
    Dim Threashold As Long
    Dim LastRow  As Long, LastCol As Long
    Dim t1 As Single, t2 As Single
    
    t1 = Timer()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    
    TestCol = 9
    Threashold = 8
    
    Set ws = Sheet1
    With ws
        Set rng = .Range(.Cells(.Rows.Count, 1).End(xlUp), .Cells(1, .Columns.Count).End(xlToLeft))
    End With
    dat = rng.Value2
    ReDim NewDat(1 To UBound(dat, 1), 1 To UBound(dat, 2))
    
    LastRow = UBound(dat, 1)
    LastCol = UBound(dat, 2)
    
    NewI = 0
    For i = 1 To LastRow
        If dat(i, TestCol) > Threashold Then
            NewI = NewI + 1
            For j = 1 To LastCol
                NewDat(NewI, j) = dat(i, j)
            Next
        End If
    Next
    
    rng = NewDat
    
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    
    t2 = Timer()
    MsgBox "deleted in " & t2 - t1 & "s"
End Sub
chris neilsen
  • 52,446
  • 10
  • 84
  • 123
  • Tweaked the code for table header/etc and it worked like a charm. Super fast (205x faster than the autofilter method of discontinuous ranges). Curious to see how the sorted autofilter you mentioned in your comment stands up to this - guessing it'll be slower, but still much faster – newtovba Oct 19 '19 at 12:19
  • i like the way you set rng and lastcol, lastrow. – Patrick Lepelletier Oct 20 '19 at 07:26
1

I would use Range.Resize in VBA. Deletes 200.000 rows in a few seconds. See example below:

With wb3.Sheets("Changed Confirmations")
Set HistoryTable = .ListObjects("ChangedConfirmations")

With HistoryTable
    .ShowAutoFilter = True

    .Range.AutoFilter Field:=1, Criteria1:="<=" & EndDate, Operator:=xlOr, Criteria2:="="
    
    On Error Resume Next
    Application.DisplayAlerts = False
        .Range.Resize(.Range.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeVisible).Delete
    Application.DisplayAlerts = True
    On Error goto 0

    .ShowAutoFilter = False

    End With
End With
cigien
  • 57,834
  • 11
  • 73
  • 112
Espen
  • 11
  • 2
0

first off, with 100ks records you'd better switch to some database oriented software

sticking to Excel, if you don't mind reordering records, this is quite fast:

Option Explicit

Sub DeleteRows()

    Application.DisplayAlerts = False
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    Dim rng As Range

    With ActiveWorkbook.Sheets("Sheet1")
        Set rng = .Range("I2", .Cells(.Rows.Count, "I").End(xlUp))
        With .Sort
            .SortFields.Clear
            .SortFields.Add Key:=rng(1), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal

            .SetRange rng.CurrentRegion
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With

        With rng
            .AutoFilter Field:=1, Criteria1:="<=8"
            .Offset(1).Resize(.Rows.Count - 1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With
        .AutoFilterMode = False
    End With


    Application.DisplayAlerts = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

if you mind records order it can be simply twicked to keep it

DisplayName
  • 13,283
  • 2
  • 11
  • 19
  • Ah yes - Order record does matter in this case - this is more of a one/two time thing that I'm trying to figure out on the side, so I'll definitely consider switching to something more database-y if/when I need a more permanent solution. – newtovba Oct 19 '19 at 12:20
0

Trying putting the spreadsheet in csv format into Notepad++ - from there you should be able to remove empty lines and other multiple entries very quickly and easily = you will have to download and install notepad++ but it completely free