2

I am using the below code to:
Delete the similar rows, keeping only one and combine cells values in the range "N", separated by vbLf
it works ,but with big range (e.g. 30 thousands rows) the macro takes a very long time to finish.
After debugging the code, I find out that using union causes macro to takes a very long time to finish.

Set rngDel = Union(rngDel, ws.Range("A" & i + m))

So with the below code , How to adapt a faster method to delete that range of rows other that using union?
In advance, grateful for any helpful comments and answers.

Sub DeleteSimilarRows_combine_Last_Column_N()
 
    Dim LastRow As Long, ws As Worksheet, arrWork, rngDel As Range, i As Long, j As Long, k As Long
     Dim strVal As String, m As Long
 
      Set ws = ActiveSheet: LastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    arrWork = ws.Range("A1:A" & LastRow).Value2 'Place the range in an array to make iteration faster
 
    Application.DisplayAlerts = False: Application.ScreenUpdating = False
    For i = 2 To UBound(arrWork) - 1                'Iterate between the array elements:
        If arrWork(i, 1) = arrWork(i + 1, 1) Then
            'Determine how many consecutive similar rows exist:______
            For k = 1 To LastRow
                If i + k + 1 >= UBound(arrWork) Then Exit For
                If arrWork(i, 1) <> arrWork(i + k + 1, 1) Then Exit For
            Next k '__
 
            For j = 14 To 14                  'Build the concatenated string of cells in range "N":
                strVal = ws.Cells(i, j).Value
                For m = 1 To k
                    strVal = strVal & vbLf & ws.Cells(i + m, j).Value
                Next m
                ws.Cells(i, j).Value = strVal: strVal = ""
           Next j
 
           For m = 1 To k                    'Place the cells for rows to be deleted in a Union range, to delete at the end, at once
                If rngDel Is Nothing Then
                     Set rngDel = ws.Range("A" & i + m)
                Else
                    Set rngDel = Union(rngDel, ws.Range("A" & i + m)) 'This line causes macro takes very long time to finish.
                End If
         Next m
         i = i + k: If i >= UBound(arrWork) - 1 Then Exit For 'Increment the i variable and exiting if the resulted value exits the array size
       End If
    Next i
 
    If Not rngDel Is Nothing Then rngDel.EntireRow.Delete    'Delete the not necessary rows
    Application.DisplayAlerts = True: Application.ScreenUpdating = True
End Sub
Leedo
  • 439
  • 5
  • 18
  • How do you know that the mentioned line causes the issue? Did you measure the time passed for this line? It would be helpful if you could explain what exactly the code is supposed to do. Crediting the author and supplying the link to the post where you obtained this code would also be helpful. A screenshot of the before and after could be useful. – VBasic2008 May 06 '22 at 12:37
  • 1
    Could you build up an array of just the values you want to retain instead? That way you can just delete everything and replace with what you want to keep. – Ryan Wildry May 06 '22 at 12:48
  • @VBasic2008 ,I know by commenting the union line (then the macro took 0.2 seconds) , and I already mention the purpose of code on the question. https://stackoverflow.com/questions/69105746/how-to-combine-or-merge-cells-with-the-same-values-vertically-and-horizontally – Leedo May 06 '22 at 13:15
  • 1
    Union gets progressively slower as you add more cells/areas to the range (see numbers here: https://stackoverflow.com/a/56573408/478884). If you were working "bottom up" you could delete `rngDel` every (eg) 500 rows, but you can't take that approach since you're working top-down. I will post a different idea which may work for you. – Tim Williams May 06 '22 at 16:22
  • @Tim Willams: Are you sure that the reason is `Union`? I'm not entirely sure, but I think I measured the time once and `Union` finished in a split second, but it got very slow to delete such a combined range from hundreds of cells (rows). BTW, I got even better results with just 50 cells (rows). Looking forward to see the different idea. – VBasic2008 May 06 '22 at 16:43
  • 1
    @VBasic2008 I've previously run an experiment on this. I found that the time taken for Union to add another non-contiguous range (ie one that will result in another Area) increases with the square of the number of Areas already in the range. – chris neilsen May 06 '22 at 18:11
  • 1
    @Leedo, another approach is to not actually _delete_ rows, but overwrite the data. [See this example](https://stackoverflow.com/a/58461754) – chris neilsen May 06 '22 at 18:17

2 Answers2

5

Union gets progressively slower as you add more cells/areas to the range (see numbers here: https://stackoverflow.com/a/56573408/478884). If you were working "bottom up" you could delete rngDel every (eg) 500 rows, but you can't take that approach since you're working top-down.

Here's a different approach - adding cells to a Collection and then processing the collection "bottom-up" at the end, using a batch-delete process.

Sub TestRowDeletion()

    Dim rngRows As Range, data, rngDel As Range, i As Long
    Dim t, nRows As Long, colCells As New Collection
    
    Set rngRows = Range("A1:A10000") '10k rows for testing
    
    'Approach #1 - your existing method
    DummyData rngRows     'populate some dummy data
    data = rngRows.Value
    t = Timer
    For i = 1 To UBound(data, 1)
        'removing ~25% of cells...
        If data(i, 1) > 0.75 Then BuildRange rngDel, rngRows.Cells(i)
    Next i
    If Not rngDel Is Nothing Then rngDel.EntireRow.Delete
    Debug.Print "Regular single delete", Timer - t

    'Approach #2 - batch-deleting rows
    DummyData rngRows 'reset data
    data = rngRows.Value
    t = Timer
    For i = 1 To UBound(data, 1)
        If data(i, 1) > 0.75 Then colCells.Add rngRows.Cells(i)
    Next i
    RemoveRows colCells
    Debug.Print "Batch-deleted", Timer - t

    'Approach #3 - array of "delete" flags plus SpecialCells()
    DummyData rngRows 'reset data
    data = rngRows.Value
    t = Timer
    ReDim flags(1 To UBound(data, 1), 1 To UBound(data, 2))
    For i = 1 To UBound(data, 1)
        If data(i, 1) > 0.75 Then
            flags(i, 1) = "x"
            bDelete = True 'flag we have rows to delete
        End If
    Next i
    If bDelete Then
        With rngRows.Offset(0, 10) 'use an empty column....
            .Value = flags  'populate with flags for deletion
            .SpecialCells(xlCellTypeConstants).EntireRow.Delete
        End With
    End If
    Debug.Print "Specialcells", Timer - t

End Sub

'Delete the row for any cell in `col`
'  cells were added to `col` in a "top down" order
Sub RemoveRows(col As Collection)
    Dim rngDel As Range, n As Long
    For n = col.Count To 1 Step -1 'working from the bottom up...
        BuildRange rngDel, col(n)
        If n Mod 250 = 0 Then
            rngDel.EntireRow.Delete
            Set rngDel = Nothing
        End If
    Next n
    If Not rngDel Is Nothing Then rngDel.EntireRow.Delete
End Sub

Sub DummyData(rng As Range)
    With rng
        .Formula = "=RAND()"
        .Value = .Value
    End With
End Sub

Sub BuildRange(ByRef rngTot As Range, rngAdd As Range)
    If rngTot Is Nothing Then
        Set rngTot = rngAdd
    Else
        Set rngTot = Application.Union(rngTot, rngAdd)
    End If
End Sub

Times (sec) - note how differently the single-delete and batch-delete approaches scale as more rows are added.

# of rows deleted         ~2.5k/10k    ~5k/20k     ~7.5k/30k 
------------------------------------------------------------
1. Regular single delete     10.01         65.9       226
2. Batch-deleted             2.2           4.7        7.8
3. SpecialCells              1.6           3.1        4.7

You could also consider populating a "delete" flag in your dataset, then using the autofilter/delete visible rows approach (EDIT: added as method #3)

Tim Williams
  • 154,628
  • 8
  • 97
  • 125
3

Posting this as a working (but faster) version of your actual use case, since my other answer is really just about timing the different approaches.

Sub DeleteSimilarRowsCombineColumnN()

    Const SEP As String = ","
    Dim arrKeys, arrVals, arrFlags, rngRows As Range, rngVals As Range, i As Long, key, currKey, s As String
    Dim ws As Worksheet, ub As Long, t, n As Long
    
    t = Timer
    Set ws = ActiveSheet
    Set ws = ActiveSheet
    Set rngRows = ws.Range("A1:A" & ws.Cells(Rows.Count, 1).End(xlUp).Row)
    Set rngVals = rngRows.EntireRow.Columns("N")
    
    arrKeys = rngRows.Value
    ub = UBound(arrKeys, 1)
    arrVals = rngVals.Value
    ReDim arrFlags(1 To UBound(arrKeys, 1), 1 To 1)
 
    currKey = Chr(0)     'non-existing key...
    For i = ub To 1 Step -1                      'looping from bottom up
        key = arrKeys(i, 1)                      'this row's key
        If key <> currKey Then                   'different key from row below?
            If i < ub Then arrVals(i + 1, 1) = s 'populate the collected info for any previous key
            s = arrVals(i, 1)                    'collect this row's "N" value
            currKey = key                        'set as current key
        Else
            If i < ub Then
                arrFlags(i + 1, 1) = "x" 'flag for deletion
                n = n + 1
            End If
            s = arrVals(i, 1) & SEP & s             'concatenate the "N" value
        End If
    Next i
    arrVals(1, 1) = s                              'populate the last (first) row...
    rngVals.Value = arrVals                        'drop the concatenated values
    
    If n > 0 Then    'any rows to delete?
        Debug.Print "About to delete " & n & " of " & ub & " rows", Timer - t
        With rngRows.Offset(0, 100) 'use any empty column
            .Value = arrFlags
            .SpecialCells(xlCellTypeConstants).EntireRow.Delete
        End With
        Debug.Print "Done deleting in " & Round(Timer - t, 2) & " sec"
    End If
End Sub
Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • on dataset of 30k rows, your code completed on 14 seconds (it was consider very fast for me) but yesterday I debugged it deeper to find if it can be more faster. I found the bottle neck on this line `.SpecialCells(xlCellTypeConstants).EntireRow.Delete` (it takes 13.5 seconds) from the overall code time ( 14 seconds). – Leedo Apr 18 '23 at 07:33
  • Of the 30k rows, how many rows were deleted ? Deleting rows takes time, so you may find that clearing the rows is faster. – Tim Williams Apr 18 '23 at 14:49
  • The rows to be deleted are 21k, I measured `ClearContents` only and it toke 1.75 sec. I even tried another approach by sort the values which to be deleted and then set this rows to a range and then delete that range , I measured ( `Sorting values + Deletion of that range`) and it toke **0.12 sec** (significantly faster) , anyhow thanks for all your efforts. – Leedo Apr 19 '23 at 07:30