2

I have the following VBA code within excel. It's goal is to remove a row if the given text is found, as well as remove the row directly below it. It needs to scan roughly 700k rows and is taking roughly an hour to do 100k rows. Does anyone see any optimization?

Sub RemovePageHeaders()
    Application.ScreenUpdating = False
    Dim objRange As Range
    Set objRange = Cells.Find("HeaderText")
    While objRange <> ""
        objRange.Offset(1, 0).Rows(1).EntireRow.Delete
        objRange.Rows(1).EntireRow.Delete
        Set objRange = Cells.Find("HeaderText")
    Wend
    MsgBox ("I'm done removing page headers!")
End Sub

Thanks in advance!

Nick Heidke
  • 2,787
  • 2
  • 34
  • 58

7 Answers7

2

I know this is late, but if I understand your problem, then you are deleting rows based on a "HeaderText" in column C. So, since i didn't look at your data, i created my own. I created 700,000 rows and every 9th row contained the "HeaderText" string. It deleted ~233k rows ("HeaderText" row + row before + row after) and ran in 2.2 seconds on my computer. Give it a try!!

Private Declare Function timeGetTime Lib "winmm.dll" () As Long
Sub DeleteHeaders()
Dim LastRow As Long
Dim I As Long
Dim WkSheet As Excel.Worksheet
Dim VArray As Variant
Dim NewArray() As String
Dim BooleanArray() As Boolean
Dim NewArrayCount As Long
Dim J As Long
Dim T As Double
Dim DeleteRowCount As Long

T = timeGetTime

With Application
    .ScreenUpdating = False
    .EnableEvents = False
    .Calculation = xlCalculationManual
End With

Set WkSheet = ThisWorkbook.Sheets("Sheet1")

With WkSheet.UsedRange
    LastRow = .Rows.Count
    VArray = .Value
End With
ReDim BooleanArray(0 To UBound(VArray, 1) - 1), NewArray(UBound(VArray, 1) - 1, 0 To UBound(VArray, 2))

For I = 1 To UBound(VArray, 1)
    If InStrB(1, VArray(I, 3), "HeaderText", vbBinaryCompare) <> 0 Then
        BooleanArray(I - 1) = Not BooleanArray(I - 1)
        BooleanArray(I) = Not BooleanArray(I)
        BooleanArray(I + 1) = Not BooleanArray(I + 1)
    End If
Next I

For I = LBound(BooleanArray, 1) To UBound(BooleanArray, 1)
    If BooleanArray(I) = False Then
        For J = LBound(VArray, 2) To UBound(VArray, 2)
            NewArray(NewArrayCount, J - 1) = VArray(I + 1, J)
        Next J
        NewArrayCount = NewArrayCount + 1
    Else
        DeleteRowCount = DeleteRowCount + 1
    End If
Next I

With WkSheet
    .Cells.Delete
    .Range("a1:c" & NewArrayCount).Value = NewArray
End With

With Application
    .ScreenUpdating = True
    .EnableEvents = True
    .Calculation = xlCalculationAutomatic
End With

Erase NewArray, BooleanArray, VArray

MsgBox "Deleted " & DeleteRowCount & " rows." & vbNewLine & vbNewLine & _
"Run time: " & Round((timeGetTime - T) / 1000, 3) & " seconds.", vbOKOnly, "RunTime"

End Sub
JoeB
  • 21
  • 1
2

Try the following sub. It loops from the bottomm-most row to the top, checking column 3 for "HeaderText". If that's found, it delete the row and the one below it. On a C2D E8500 with 2 gigs of RAM it takes just over a minute per 100,000 rows on a sheet with 1 million rows.

Sub RemoveHeaders()
    Dim i As Long

    Application.ScreenUpdating = False
    Debug.Print "Started: " & Now
    For i = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
        If ActiveSheet.Cells(i, 3) = "HeaderText" Then
            ActiveSheet.Range(i & ":" & i + 1).EntireRow.Delete
        End If
    Next i
    Application.ScreenUpdating = True
    Debug.Print "Finished: " & Now
End Sub

EDIT For a slightly ghetto but possibly much faster solution try this:

  1. Change the constant in the below code to the number of the first column that's blank in every row. For example if your data takes up columns A-F, you want the constant to be 7 (column G).
  2. Run the code, it will put the row number next to every entry. Should take around 30 seconds.
  3. Sort the ENTIRE data by column C; this should take less than a minute.
  4. Find "HeaderText" visually, select and delete all the rows.
  5. Sort by your row-numbered column ("G" in my example).
  6. Delete the row-numbered column (again, "G" in my example).

    Sub NumberColumns()
        Const BLANK_COLUMN = 7
        Dim i As Long
    
        For i = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
            ActiveSheet.Cells(i, BLANK_COLUMN) = i
        Next i
        Debug.Print "done"
    

    End Sub

Banjoe
  • 1,768
  • 12
  • 13
  • nice one and +1. this was probably the `Find` statement that was slow – JMax Aug 31 '11 at 19:25
  • i was thinking about a filter to delete the rows but OP also wants to delete the next row. i'm afraid your edit solution won't work (or maybe i missed something) – JMax Aug 31 '11 at 19:45
  • Wow, duh, I forgot that requirement. – Banjoe Aug 31 '11 at 20:19
  • This took about 25 minutes on my machine (I used the original solution mentioned above). Worked flawlessly, thanks for the time! – Nick Heidke Aug 31 '11 at 20:39
  • I would highly suggest you not looping through each cell as each call to Excel is a hit in perfomance. You can use a varray (like my answer) to do it in a fraction of the time. – Gaijinhunter Sep 01 '11 at 01:52
  • Have you tried your method on a million row sheet? I pasted your code into my sample sheet and it's been hung at 50% CPU utilization for 30 minutes so far. For reference I have "HeaderText" every 5 rows, your method may perform much better with a smaller set to delete. – Banjoe Sep 01 '11 at 01:56
2

Even if it doesn't fully answer the question, it may help any reader so...

There are several tips on the web about optimizing vba. In particular, you can do:

'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False   'is very efficient if you have ANY event associated with what your macro is going to do

'code goes here

'at the end, don't forget to restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True

See here for more information

JMax
  • 26,109
  • 12
  • 69
  • 88
  • Application.ScreenUpdating = False is the most important. – AMissico Aug 31 '11 at 20:12
  • Also see [this answer](http://stackoverflow.com/questions/118863/when-to-use-a-class-in-vba/143395#143395) to do it with RAII, so that the correct environment is restored in case an error occurs. – Alexandre C. Aug 31 '11 at 21:31
2

Putting this entry in a little late. It should be about 2X faster than the accepted solution. I used my XP Excel 2003 computer with 1 gig to figure it out.

Sub DeleteHeaderText()

    Dim bUnion As Boolean
    Dim d1 As Double
    Dim l As Long
    Dim rDelete As Range
    Dim wks As Worksheet
    Dim vData As Variant

    d1 = Timer
    Application.EnableEvents = False
    Application.ScreenUpdating = False

    bUnion = False
    Set wks = ActiveSheet
    lEnd = ActiveSheet.UsedRange.Rows.Count

    vData = wks.Range("C1:C" & lEnd).Value2

    For l = 1 To lEnd
        If vData(l, 1) = "HeaderText" Then
            If bUnion Then
                Set rDelete = Union(rDelete, wks.Range("A" & l, "A" & l + 1))
            Else
                Set rDelete = wks.Range("A" & l, "A" & l + 1)
                bUnion = True
            End If
            l = l + 1
        End If
    Next l

    Debug.Print Timer() - d1

    rDelete.EntireRow.Delete

    Debug.Print Timer() - d1

End Sub
Jon49
  • 4,444
  • 4
  • 36
  • 73
1

Here's a solution that will run on 100k rows in about 5-20 seconds depending on how many occurances of 'HeaderText' you have. As you requested, it will delete both the row with HeaderText in the C column as well as the row directly above it.

Update: As it's been pointed out, this works on smaller data sets up to about 100k, but on larger sets it's really doesn't. Back to the drawing board :)

 Sub DeleteHeaders()

Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant

lastRow = Range("C" & Rows.Count).End(xlUp).Row

On Error Resume Next
varray = Range("C1:C" & lastRow).Value
For i = UBound(varray, 1) To 1 Step -1
    If varray(i, 1) = "HeaderText" Then
        Range("C" & i - 1, Range("C" & i)).EntireRow.Delete
        i = i - 1
    End If
Next

Application.ScreenUpdating = True
End Sub

How it works: By dumping the entire C column into a variant array and working from it within excel, you get major speed increase. The varray is laid out like (1, 1), (2, 1), (3, 1) with the first number being the row number, so all you have to do is loop through it backwards. The key is making sure to delete both rows at the same time and decrementing i by one more.

Gaijinhunter
  • 14,587
  • 4
  • 51
  • 57
  • I did a comparison test to see if deleting all of them at once (using the union to combine the ranges first) and found that do that is twice as fast than deleting one at a time. That's for excel 2003, of course. I did the just over 65000 rows with the header text every 100. It took 65 seconds to run your code and 33 seconds to run mine. If you do a comparison test let me know if it is similar to my results or not. Also, when I ran your code, I changed it to delete the one below "HeaderText". Not that that matters. – Jon49 Sep 01 '11 at 14:58
  • One last thought on the subject. Using .Value2 is faster then .Value (the default). Value2 just grabs everything as doubles/strings. .Value checks to see if it is a different type, like is it a double, or string, or date, etc. See http://fastexcel.wordpress.com/2011/05/25/writing-efficient-vba-udfs-part-1/ – Jon49 Sep 01 '11 at 17:16
  • The difference on modern machines is minimal and it limits the flexibility of the code so I always choose to use .value :) – Gaijinhunter Sep 01 '11 at 17:38
  • Yes, you're right. I guess it would be more effective for UDFs but for macros it's not a big deal unless you have to read thousands of large ranges at different times. – Jon49 Sep 01 '11 at 19:51
0

The following is code lifted from a Bill Jelen book that is fantastic for this purpose.

Use a column (column A for my code) with some logic to determine if a row should be hidden on not.

Use the following formula in all applicable cells in that column

=IF(test TRUE to hide, 1, "keep")

Now use the VBA below

Range("A1:A10000").SpecialCells(xlCellTypeFormulas, xlNumbers).EntireRow.Delete

This selects all rows with a number returned by the formula at once, which is exactly the rows you want to delete. No looping required!

RonnieDickson
  • 1,400
  • 10
  • 9
0

Here on my blog have a scripts for this:

Sample One:

Sub DelBlankRows()
    Range("D1:D" & Cells _ 
    (Rows.Count,2).End(xlUp).Row).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
End Sub

Sample two:

Sub DeleteRowsWithSpecifiedData()
    'Looks in Column D and requires Column IV to be clean
    Columns(4).EntireColumn.Insert

    With Range("D1:D" & ActiveSheet.UsedRange.Rows.Count)
        .FormulaR1C1 = "=IF(RC[1]="""","""",IF(RC[1]=""Not Needed"",NA()))"
        .Value = .Value

        On Error Resume Next

        .SpecialCells(xlCellTypeConstants, xlErrors).EntireRow.Delete
    End With

    On Error GoTo 0

    Columns(4).EntireColumn.Delete
End Sub
ChrisF
  • 134,786
  • 31
  • 255
  • 325
Andre Bernardes
  • 101
  • 1
  • 3