0

I have code that would compare each cell in column A to everything in column B and do this for the number of lines specified.

This was fine when I had a couple hundred lines, but now I am finding with 2000 lines the code is just not going to cut it. Can anyone look at my code and tell me if there are some improvements to be made or if I should scrap it and do it differently.

Sub highlight()

Dim compare As String
Dim i As Integer
Dim comprange As Range
Dim lines As Integer
i = 2
ScreenUpdating = False

Range("a2").Select
lines = Application.InputBox(Prompt:="How many lines need to be compared?", 
_
Title:="SPECIFY RANGE", Type:=1)

Do Until IsEmpty(ActiveCell)

    If i + 1 > lines Then
        Exit Do
    End If

Set comprange = Range("A" & i)
    comprange.Select
    compare = comprange.Value
    i = i + 1

    Range("B2").Select
        Do Until IsEmpty(ActiveCell.Offset(1, 0))

            If ActiveCell.Value = compare Then
                ActiveCell.Interior.ColorIndex = 37
                ActiveCell.Offset(1, 0).Select
                Exit Do
            Else
                If IsEmpty(ActiveCell.Offset(1, 0)) Then
                    Exit Do
                Else
                ActiveCell.Offset(1, 0).Select
                End If
            End If
        Loop
    Loop
    compare = ActiveCell.Value
    Set comprange = Selection
    Range("a2").Select
    Do Until IsEmpty(ActiveCell.Offset(1, 0))

            If ActiveCell.Value = compare Then
                comprange.Interior.ColorIndex = 37
                ActiveCell.Offset(1, 0).Select
                Exit Do
            Else
                If IsEmpty(ActiveCell.Offset(1, 0)) Then
                    Exit Do
                Else
                ActiveCell.Offset(1, 0).Select
                End If
            End If
        Loop
End Sub
BruceWayne
  • 22,923
  • 15
  • 65
  • 110
Jaymes
  • 37
  • 5
  • Biggest thing is to [avoid using `.Select`/`.Activate`](https://stackoverflow.com/questions/10714251/). Are you simply checking if say `A1=B1`,`A2=B2`, etc? Have you looked in to [Conditional Formatting](https://support.office.com/en-us/article/use-formulas-with-conditional-formatting-fed60dfa-1d3f-4e13-9ecb-f1951ff89d7f)? – BruceWayne Jan 22 '18 at 21:46
  • Make sure the script changes calculations to manual and turns application events off while it runs. – Jerry Jeremiah Jan 22 '18 at 21:51
  • You should do it differently. Using a 'brute force' check takes upwards of a minute on my PC when both lists contain 2000 lines. Using a dictionary takes less than a second. And that time difference is going to grow exponentially with the length of your list. – jeffreyweir Jan 23 '18 at 17:52

2 Answers2

0

Try this, it will check ALL your values in column A and if it matches in column B hightlights.

Sub ok()
    Dim i, i2 As Long
    Dim LastRow, LastRow2 As Long

    With ActiveSheet
        LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
    End With
    With ActiveSheet
        LastRow2 = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With

    For i = 1 To LastRow
        For i2 = 1 To LastRow2
            If Range("A" & i).Value = Range("B" & i2).Value Then
                Range("A" & i).Interior.ColorIndex = 37
                Range("B" & i2).Interior.ColorIndex = 37
            End If
        Next
    Next
End Sub
pokemon_Man
  • 902
  • 1
  • 9
  • 25
  • You should bail on that inner loop once a match has been found. This will radically speed things up. For further enhancements, turn off screen updating and calculation, and pull the data into VBA via a Variant Array rather than looping through the cells in the worksheet. – jeffreyweir Jan 22 '18 at 22:59
  • OP want to check EVERY cell in column B so OP should not bail out once a match is found. Yes code can always be modified as needed but this should do what he need. – pokemon_Man Jan 23 '18 at 13:53
  • Sorry, wasn't clear what I meant by 'bail'. Let's say you've got the word "Jeff" in both list 1 and list 2, and that it appears multiple times in both lists, including at the start of each list. So your very first lookup of 2000 lookups finds a match. At that point, you don't need to perform any more lookups if the item is "Jeff"...rather you just scan the 4000 cells and highlight any instance of "Jeff", and then skip any further lookups on items that are "Jeff". And do likewise for every other item. Depending on how many duplicates there are between lists, this will speed things up greatly. – jeffreyweir Jan 23 '18 at 17:41
  • Doing a 'brute force' linear lookup on 2000 items against 2000 items means Excel must perform 2000*2000 comparisons i.e. 4,000,000. And the number of lookups increases exponentially with the number of rows to look in. If a large number of items are duplicated across both lists, many of these lookups can be avoided. – jeffreyweir Jan 23 '18 at 17:43
0

Probably the most efficient way to do this is to use the VBA Dictionary object. There's a great article at https://www.experts-exchange.com/articles/3391/Using-the-Dictionary-Class-in-VBA.html that covers a lot of what you need to know.

Below is a function called DuplicatesBetweenLists that will highlight duplicates between any number of different ranges. When calling it, you can specify:

  • A range to dump a list of duplicates into (pass in an empty range if you don't want a list generated)
  • Whether or not you want the duplicate items highlighted
  • A ParamArray (Comma-separated list) of all the ranges you want to check.

So if you wanted to check all three of columns in the image below for entries that occur in each column, and wanted to output a list to cell E1 of any duplicates as well as highlight them in the data, you'd call the function like this:

Sub test()

    Dim rOutput As Range

    Set rOutput = Range("E1")
    DuplicatesBetweenLists rOutput, True, Range("A2:A11"), Range("B2:B11"), Range("C2:C11")

End Sub

...which would give you something like this:

enter image description here

But if you only wanted highlighting and didn't want the identified duplicates output to a range, you'd simply comment out the Set rOutput = Range("E1") line, and pass in an empty range as the first argument.

It is lightning fast compared to a brute force iteration approach: it handled 2 lists containing 2000 items in less than a second (vs 1 minute for the brute force approach). And it handles 2 lists of 200,000 items in just 12 seconds.

And here's the function itself, as well as another function it calls:

Function DuplicatesBetweenLists(rOutput As Range, bHighlight As Boolean, ParamArray Ranges() As Variant)

    Dim vRange      As Variant
    Dim vInput      As Variant
    Dim dic_A       As Object
    Dim dic_B       As Object
    Dim dic_Output  As Object
    Dim lOutput     As Long
    Dim lRange      As Long
    Dim cell        As Range
    Dim TimeTaken As Date

    TimeTaken = Now()

    Set dic_A = CreateObject("Scripting.Dictionary")
    Set dic_B = CreateObject("Scripting.Dictionary")
    Set dic_Output = CreateObject("Scripting.Dictionary")
    Set dic_Range = CreateObject("Scripting.Dictionary")

    lRange = 1

    For Each vRange In Ranges
         vInput = vRange
        DuplicatesBetweenLists_AddToDictionary vInput, lRange, dic_A, dic_B
    Next vRange

    If lRange Mod 2 = 1 Then
        Set dic_Output = dic_B
    Else: Set dic_Output = dic_A
    End If

    'Write any duplicate items back to the worksheet
    If Not rOutput Is Nothing Then
        If dic_Output.Count > 0 Then
            If dic_Output.Count < 65537 Then
                rOutput.Resize(dic_Output.Count) = Application.Transpose(dic_Output.Items)
            Else
                'The dictionary is too big to transfer to the workheet
                'because Application.Transfer can't handle more than 65536 items.
                'So well transfer it to an appropriately oriented variant array,
                ' then transfer that array to the worksheet WITHOUT application.transpose
                ReDim varOutput(1 To dic_Output.Count, 1 To 1)
                For Each vItem In dic_Output
                    lOutput = lOutput + 1
                    varOutput(lOutput, 1) = vItem
                Next vItem
                rOutput.Resize(dic_Output.Count) = varOutput
            End If
        End If
    End If

    'Highlight any duplicates
    If bHighlight Then
        'Highlight cells in the range that qualify
        Application.ScreenUpdating = False
        For Each vRange In Ranges
            'Set rInput = vRange
            vRange.Interior.ColorIndex = 0
            For Each cell In vRange
                 With cell
                    If dic_Output.Exists(.Value2) Then .Interior.Color = 65535
                End With
            Next cell
        Next vRange
        Application.ScreenUpdating = True
        TimeTaken = TimeTaken - Now()
        Debug.Print Format(TimeTaken, "HH:MM:SS") & "(HH:MM:SS)"
    End If


'Cleanup
Set dic_A = Nothing
Set dic_B = Nothing
Set dic_Output = Nothing



End Function





Private Function DuplicatesBetweenLists_AddToDictionary(varItems As Variant, ByRef lngRange As Long, ByVal dic_A As Object, ByVal dic_B As Object)
Dim lng As Long
Dim dic_dedup As Object
Dim varItem As Variant
Dim lPass As Long
Set dic_dedup = CreateObject("Scripting.Dictionary")

For lPass = 1 To UBound(varItems, 2)
    If lngRange = 1 Then
        'First Pass: Just add the items to dic_A
        For lng = 1 To UBound(varItems)
            If Not dic_A.Exists(varItems(lng, 1)) Then dic_A.Add varItems(lng, 1), varItems(lng, 1)
        Next
    Else:
    ' Add items from current pass to dic_Dedup so we can get rid of any duplicates within the column.
    ' Without this step, the code further below would think that intra-column duplicates were in fact
    ' duplicates ACROSS the columns processed to date

    For lng = 1 To UBound(varItems)
        If Not dic_dedup.Exists(varItems(lng, lPass)) Then dic_dedup.Add varItems(lng, lPass), varItems(lng, lPass)
    Next

    'Find out which Dictionary currently contains our identified duplicate.
    ' This changes with each pass.
    '   * On the first pass, we add the first list to dic_A
    '   * On the 2nd pass, we attempt to add each new item to dic_A.
    '       If an item already exists in dic_A then we know it's a duplicate
    '       between lists, and so we add it to dic_B.
    '       When we've processed that list, we clear dic_A
    '   * On the 3rd pass, we attempt to add each new item to dic_B,
    '       to see if it matches any of the duplicates already identified.
    '       If an item already exists in dic_B then we know it's a duplicate
    '       across all the lists we've processed to date, and so we add it to dic_A.
    '       When we've processed that list, we clear dic_B
    '   * We keep on doing this until the user presses CANCEL.

    If lngRange Mod 2 = 0 Then
        'dic_A currently contains any duplicate items we've found in our passes to date
        'Test if item appears in dic_A, and IF SO then add it to dic_B
        For Each varItem In dic_dedup
            If dic_A.Exists(varItem) Then
                If Not dic_B.Exists(varItem) Then dic_B.Add varItem, varItem
            End If
        Next
        dic_A.RemoveAll
        dic_dedup.RemoveAll

    Else 'dic_B currently contains any duplicate items we've found in our passes to date

        'Test if item appear in dic_B, and IF SO then add it to dic_A
        For Each varItem In dic_dedup
            If dic_B.Exists(varItem) Then
                If Not dic_A.Exists(varItem) Then dic_A.Add varItem, varItem
            End If
        Next
        dic_B.RemoveAll
        dic_dedup.RemoveAll
        End If
    End If
    lngRange = lngRange + 1
Next

End Function
jeffreyweir
  • 4,668
  • 1
  • 16
  • 27