2

I am running a script to find and delete rows that contain data from after 2018. I am searching through around 650000 rows. Every time I run the script after 5 seconds my cursor becomes the spinning circle and the excel program becomes unresponsive. Here is the code I am using.

Option Explicit
Option Base 1 'row and column index will match array index

Sub removeWrongYear()

Dim i As Long, yearA As Long, rowsCnt As Long
Dim rowsToDelete As Range
Dim vData As Variant

With ActiveSheet

    '1st to 635475 row, 20th column
    vData = Range(.Cells(1, 20), .Cells(635475, 20))

    For i = UBound(vData) To 2 Step -1
       If Val(Right(vData(i,1),2)) > 17 Then
        Debug.Print Val(Right(vData(i,1),2))
            rowsCnt = rowsCnt + 1

            If rowsCnt > 1 Then
                Set rowsToDelete = Union(rowsToDelete, .Rows(i))
            ElseIf rowsCnt = 1 Then
                Set rowsToDelete = .Rows(i)
            End If

        End If
    Next i

End With

If rowsCnt > 0 Then
    Application.ScreenUpdating = False
    rowsToDelete.EntireRow.Delete
    Application.ScreenUpdating = True
End If

End Sub
user9730643
  • 89
  • 1
  • 8
  • 8
    I think whatever you do it will be slow unless you do it all in arrays. Autofilter would be quicker than this method I think. – SJR May 17 '18 at 18:49
  • 1
    also you call the application.screenupdating = false after having already gone through the loop. better to call it first. – learnAsWeGo May 17 '18 at 18:52
  • If you don't have to worry about changes happening to the Excel sheet by users, you could add `DoEvents` right before the `Next i` line. It'll keep Excel from not responding; however, it will also allow some (possible very) delayed interaction with your workbook/worksheets. – Mistella May 17 '18 at 18:58
  • @Mistella except you do, because of that unqualified `Range` call in the `vData` assignment. – Mathieu Guindon May 17 '18 at 18:58
  • @MathieuGuindon I see it. Also, the `With ActiveSheet` could also be quite problematic in that scenario, as well. – Mistella May 17 '18 at 19:00
  • I *think* that reference would be held by the `With` block throughout the loop execution, regardless of what sheet gets activated in the body of the loop. – Mathieu Guindon May 17 '18 at 19:01
  • I don't know if it's a comparable situation, but if you step through code, the reference is held by the With (ignoring a different sheet being executed after that line). – SJR May 17 '18 at 19:04
  • In other news, `rowsCnt` isn't needed. Just verify whether `rowsToDelete Is Nothing` instead; if it's `Nothing`, then you're looking at the first row to delete. – Mathieu Guindon May 17 '18 at 19:12
  • You have two nice answers. Also remember that if you are deleting cells, and the cells you delete have formulas referencing them(indirectly. e.g VLOOKUP("val", yourRangeBeingPartiallyDeleted, 3, 0), then that will slow down as well, so think about setting formulas to manual, and back(if it was automatic). – MacroMarc May 17 '18 at 22:37

5 Answers5

5

Every time I run the script after 5 seconds my cursor becomes the spinning circle and the excel program becomes unresponsive.

That's normal. VBA is running on the single available UI thread, the same one Excel runs on. While it's busy running your loop, it's not able to respond to other stimuli, and tells you that by putting "(not responding)" in the title bar, until it completes the work and is able to resume doing everything else it needs to do (i.e. listen for mouse & keyboard messages, etc.).

You could add a little DoEvents in the body of that loop to allow Excel to breathe and process pending messages between iterations, but then there's a catch: first, your code will take even longer to complete, and second, if the user is able to select/activate another sheet in the middle of that loop, then this unqualified Range call:

vData = Range(.Cells(1, 20), .Cells(635475, 20))

...will be the source of a run-time error 1004, since you can't do Sheet1.Range(Sheet2.Cells(1,20), Sheet2.Cells(635475,20)) and expect Excel to know what to do with that (assuming Sheet2 was active when the loop started, and the user activated Sheet1 in the middle of it).

This answer provides what appears to be the most efficient approach to conditionally deleting lines when a lot of rows are involved. If you can, add a helper column to calculate your criteria (e.g. make it return TRUE for rows to keep and FALSE for rows to delete), then use Worksheet.Replace and Worksheet.SpecialCells to perform the filtering and deletion:

.Columns("Z:Z").Replace What:=False, _
                        Replacement:="", _
                        LookAt:=xlPart, _
                        SearchOrder:=xlByRows, _
                        MatchCase:=False, _
                        SearchFormat:=False, _
                        ReplaceFormat:=False
.Columns("Z:Z").SpecialCells(xlCellTypeBlanks).EntireRow.Delete

Then you don't need a loop, and it might actually complete before you get to count to 5 seconds.

Other than that, long-running operations are just that: long-running operations. Own it:

Application.StatusBar = "Please wait..."
Application.Cursor = xlWait
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False

'..code..

Application.Cursor = xlDefault
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
Application.StatusBar = False
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
  • Wouldn't two arrays be quicker? – SJR May 17 '18 at 19:22
  • @SJR depends on how many rows are involved. At that scale, even arrays wouldn't cut it. Dave's `SpecialCells` approach ran in 3 seconds for the same 38K rows my array-based approach completed in 10 times that - then again, that might be just me having done something stupid... I have a post somewhere that compares various approaches, not sure if it was about deleting rows though, and dang I can't find it. – Mathieu Guindon May 17 '18 at 19:26
  • 1
    I hesitate to demur but I just tried on a range of 600k rows and 3 columns, transferring to a second array based on checking one column and it took 3-4 seconds. – SJR May 17 '18 at 19:30
  • 2
    @SJR post an answer! I've been hesitant to suggest the array solution since Dave's answer put mine to shame =) – Mathieu Guindon May 17 '18 at 19:34
  • I just post a two array approach that should outperform Dave's `SpecialCells ` approach. It should be noted, the more rows that you delete the faster the arrays will perform and the slower the `SpecialCells ` technique, –  May 17 '18 at 23:04
  • @Mathieu regarding the `Range(.Cells(...` code, this is actually safe. When both parameters to Range are qualified, it's not mandatory to also qualify the Range itself. There is a post somewhere on SO that deals with this. – chris neilsen May 18 '18 at 00:30
  • [Here it is](https://stackoverflow.com/questions/36368220/is-the-in-range-necessary-when-defined-by-cells/36369094#36369094) – chris neilsen May 18 '18 at 00:36
  • @Chris interesting. I actually repro'd the error in the immediate pane before posting that, so... I'd stick to the safe side and qualify everything. – Mathieu Guindon May 18 '18 at 00:36
  • @Mathieu when I answered that Q I recall doing quite a bit of testing to be sure of the impact of the code context. I wonder what context the immediate plane runs in? – chris neilsen May 18 '18 at 00:41
  • @chrisneilsen pretty sure it's the same as a standard module, i.e. unqualified range call is a member call on the `[_Global]` hidden module.. In a worksheet module unqualified Range is an implicit member call against `Me`. – Mathieu Guindon May 18 '18 at 00:50
  • @Mathieu just did some more tests, including in the immediate window. Result are consistent with the linked Q, ie `Range(.Cells ...` is ok as is. – chris neilsen May 18 '18 at 01:36
4

This seems pretty quick. It puts results in U1 and down so you'd probably want to amend that. This extracts the values you want into a second array.

Sub removeWrongYear()

Dim i As Long, vData As Variant, v2(), j As Long

vData = Range(Cells(1, 20), Cells(635475, 20))
ReDim v2(1 To UBound(vData, 1), 1 To 1)

For i = UBound(vData) To 2 Step -1
    If Val(Right(vData(i, 1), 2)) <= 17 Then
        j = j + 1
        v2(j, 1) = vData(i, 1)
    End If
Next i

Range("U1").Resize(j, 1) = v2

End Sub
SJR
  • 22,986
  • 6
  • 18
  • 26
2

This code process 635475 Rows x 20 Columns in 12.48 seconds on my fast computer and 33.32 seconds on my old computer (0.84 and 2.06 seconds for 38k x 20).

Option Explicit

Sub removeWrongYear2()
    Const DATE_COLUMN = 20
    Dim StartTime As Double: StartTime = Timer

    Dim data() As Variant, results() As Variant
    Dim c As Long, r As Long, r2 As Long
    With ActiveSheet
        data = .UsedRange.Value
        ReDim results(1 To UBound(data), 1 To UBound(data, 2))

        For r = 2 To UBound(data)
            If Val(Right(data(r, DATE_COLUMN), 2)) <= 17 Then
                r2 = r2 + 1
                For c = 1 To UBound(data, 2)
                    results(r2, c) = data(r, c)
                Next
            End If
        Next
        If r2 > 0 Then
            Application.ScreenUpdating = False
            Application.Calculation = xlCalculationManual
            .UsedRange.Offset(1).Value = results
            Application.Calculation = xlCalculationAutomatic
            Application.ScreenUpdating = True
        End If
    End With
    Debug.Print Round(Timer - StartTime, 2)
End Sub

Sub Setup()
    Dim data, r, c As Long
    Const LASTROW = 635475
    Cells.Clear
    data = Range(Cells(1, 1), Cells(LASTROW, 20)).Value

    For r = 1 To UBound(data)
        For c = 1 To 19
            data(r, c) = Int((LASTROW * Rnd) + 100)
        Next
        data(r, 20) = Int((10 * Rnd) + 10)
    Next
    Application.ScreenUpdating = False
    Range(Cells(1, 1), Cells(LASTROW, 20)).Value = data
    Application.ScreenUpdating = True
End Sub
2

This uses an AutoFilter - the more rows to delete, the faster it gets

Rows: 1,048,575 (Deleted: 524,286), Cols: 21   (70 Mb xlsb file)

Time: 6.90 sec, 7.49 sec, 7.21 sec   (3 tests)

Test data shown in images bellow


How it works

  • It generates a temporary helper column with formula "=RIGHT(T1, 2)" (first empty column)
  • Applies a filter for the years to keep ("<18") in the temp column
  • Copies all visible rows to a new sheet (not including the temp column)
  • Removes the initial sheet
  • Renames the new sheet to the initial sheet name

Option Explicit

Public Sub RemoveYearsAfter18()
    Dim ws As Worksheet, wsName As String, lr As Long, lc As Long
    Dim ur As Range, filterCol As Range, newWs As Worksheet

    Set ws = Sheet1     'Or: Set ws = ThisWorkbook.Worksheets("Sheet1")
    wsName = ws.Name

    lr = ws.Cells(ws.Rows.Count, "T").End(xlUp).Row         'Last Row in col T (or 635475)
    lc = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column 'Last Col in row 1

    Set ur = ws.Range(ws.Cells(1, 1), ws.Cells(lr, lc))
    Set filterCol = ws.Range(ws.Cells(1, lc + 1), ws.Cells(lr, lc + 1)) 'Exclude Headers

    OptimizeApp True
    Set newWs = ThisWorkbook.Worksheets.Add(After:=ws)  'Add new sheet
    With filterCol
        .Formula = "=RIGHT(T1, 2)"
        .Cells(1) = "FilterCol"                     'Column header
        .Value2 = .Value2                           'Convert formulas to values for filter
    End With
    filterCol.AutoFilter Field:=1, Criteria1:="<18" 'Reverse filter

    ur.Copy                                         'Copy visible data
    With newWs.Cells
        .PasteSpecial xlPasteColumnWidths
        .PasteSpecial xlPasteAll                    'Paste data on new sheet
        .Cells(1).Select
    End With

    ws.Delete                                       'Delete old sheet
    newWs.Name = wsName
    OptimizeApp False
End Sub

Private Sub OptimizeApp(ByVal speedUp As Boolean)
    Application.Calculation = IIf(speedUp, xlCalculationManual, xlCalculationAutomatic)
    Application.ScreenUpdating = Not speedUp
    Application.DisplayAlerts = Not speedUp
    Application.EnableEvents = Not speedUp
End Sub

Before

Before

After

After


paul bica
  • 10,557
  • 4
  • 23
  • 42
1

Sort() & AutoFilter() are always a good pair:

Sub nn()
    Dim sortRng As Range

    With ActiveSheet.UsedRange ' reference all data in active sheet
        With .Offset(, .Columns.Count).Resize(, 1) ' get a helper column right outside data
            .Formula = "=ROW()" ' fill it with sequential numbers from top to down
            .Value = .Value ' get rid of formulas
            Set sortRng = .Cells ' store the helper range
        End With

        With .Resize(, .Columns.Count + 1) ' consider data and the helper range
            .Sort key1:=.Cells(1, 20), order1:=xlAscending, Header:=xlNo ' sort it by data in column 20 
            .AutoFilter Field:=20, Criteria1:=">=01/01/2018" ' filter it for data greater than 2017
            .Resize(.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeVisible).EntireRow.Delete ' delete filtered data
            .Parent.AutoFilterMode = False ' remove filter
            .Sort key1:=sortRng(1, 1), order1:=xlAscending, Header:=xlNo ' sort things back by means of helper column
            .Columns(.Columns.Count).ClearContents ' clear helper column
        End With
    End With
End Sub

in my test a 768k row by 21 columns data took 11 seconds

DisplayName
  • 13,283
  • 2
  • 11
  • 19