2

I have a data set in my excel sheet, The data in each cell is a set of numbers separated by ";".

Below is my data set

The expected result is in 2nd cell all the four numbers are there in both column (G and H), but not in same order. In the next row, the order is same. So is there any way to check the similarity

I have tried using below code but it only seem to highlight (red) first few characters

This is the output of my code

If anyone wants to see the file then click here

The code is :

Sub highlight()
    Dim xRg1 As Range
    Dim xRg2 As Range
    Dim xTxt As String
    Dim xCell1 As Range
    Dim xCell2 As Range
    Dim I As Long
    Dim J As Integer
    Dim xLen As Integer
    Dim xDiffs As Boolean
    On Error Resume Next
    If ActiveWindow.RangeSelection.Count > 1 Then
      xTxt = ActiveWindow.RangeSelection.AddressLocal
    Else
      xTxt = ActiveSheet.UsedRange.AddressLocal
    End If
lOne:
    Set xRg1 = Application.InputBox("Range A:", "Similarity finder", xTxt, , , , , 8)
    If xRg1 Is Nothing Then Exit Sub
    If xRg1.Columns.Count > 1 Or xRg1.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lOne
    End If
lTwo:
    Set xRg2 = Application.InputBox("Range B:", "Similarity finder", "", , , , , 8)
    If xRg2 Is Nothing Then Exit Sub
    If xRg2.Columns.Count > 1 Or xRg2.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lTwo
    End If
    If xRg1.CountLarge <> xRg2.CountLarge Then
       MsgBox "Two selected ranges must have the same numbers of cells ", vbInformation, "Similarity finder"
       GoTo lTwo
    End If
    xDiffs = (MsgBox("Click Yes to highlight similarities, click No to highlight differences ", vbYesNo + vbQuestion, "Similarity finder") = vbNo)
    Application.ScreenUpdating = False
    xRg2.Font.ColorIndex = xlAutomatic
    For I = 1 To xRg1.Count
        Set xCell1 = xRg1.Cells(I)
        Set xCell2 = xRg2.Cells(I)
        If xCell1.Value2 = xCell2.Value2 Then
            If Not xDiffs Then xCell2.Font.Color = vbRed
        Else
            xLen = Len(xCell1.Value2)
            For J = 1 To xLen
                If Not xCell1.Characters(J, 1).Text = xCell2.Characters(J, 1).Text Then Exit For
            Next J
            If Not xDiffs Then
                If J <= Len(xCell2.Value2) And J > 1 Then
                    xCell2.Characters(1, J - 1).Font.Color = vbRed
                End If
            Else
                If J <= Len(xCell2.Value2) Then
                    xCell2.Characters(J, Len(xCell2.Value2) - J + 1).Font.Color = vbRed
                End If
            End If
        End If
    Next
    Application.ScreenUpdating = True
End Sub
Pavan Alur
  • 45
  • 6
  • If you want to compare the numbers in each cell then the best approach would be to use `Split()` to get an array of numbers and then loop the arrays to match the values. – Tim Williams Jan 05 '21 at 23:49
  • @TimWilliams Thanks for the reply, but I feel using split and looping will slow down the process because at a time there could be hundreds of rows – Pavan Alur Jan 06 '21 at 00:19
  • 1
    FYI going digit-by-digit using the `Characters` collection will be slow, and as you're finding out is not going to get you where you want to be. – Tim Williams Jan 06 '21 at 00:24
  • Could you post the numbers as text (not as an image). I don't think that anyone feels like typing all those numbers. – VBasic2008 Jan 06 '21 at 01:01
  • @VBasic2008 I have added a link to the excel file in the post – Pavan Alur Jan 06 '21 at 01:16
  • @TimWilliams at least if it even highlights the part where the numbers match that will work for me – Pavan Alur Jan 06 '21 at 01:17
  • Is it that you want to highlight the numbers in H2 that are also found in G2? and the same for each row? – Ron Rosenfeld Jan 06 '21 at 01:27
  • @RonRosenfeld correct, the code that I have highlights only some part (obviously broken). I want it to highlight the set of numbers (separated by ;) to highlight. I am experimenting with split function, will update if anything works, Thanks for the help :) – Pavan Alur Jan 06 '21 at 01:32
  • Please define "similarity". It might help to write a function which, when given two strings (or numbers?), returns some measure of similarity. Or, perhaps just returns `True` or `False` if you are defining similarity in a binary way (either similar or not). – John Coleman Jan 06 '21 at 01:32
  • @john by similarity I mean that within the cell if those numbers (separated by ;) exist, irrespective of the order, even if it display true /false or even highlights the numbers that match works for me. Thanks for asking :) – Pavan Alur Jan 06 '21 at 01:46
  • 1
    Why the `On Error Resume Next`? When used like this (without a good reason and without a subsequent `On Error GoTo 0`) it functions as a band-aid on bad code that simply hides the source of bugs. – John Coleman Jan 06 '21 at 01:46
  • @john I used it because there will be very little scope for error in data type or format. This function is just a part of the larger process that I'm trying to achieve. – Pavan Alur Jan 06 '21 at 01:49
  • 1
    @PavanAlur The fact remains that `On Error Resume Next` is a bad habit when used as `On Error Hide Bug`, which is what you seem to be doing here. It is the programming equivalent to taking batteries out of a smoke detector and hoping for the best. – John Coleman Jan 06 '21 at 16:10

3 Answers3

1

One way to do this is to use the first column to create a regular expression and apply it against the second column.

One advantage of using regex is that one of the data returned is the start and length of the match -- perfect for addressing the characters property of the range object.

I used early binding (see the reference to be set in the code notes), but you could use late binding if you must.

I also have the data in columns A & B, but you can alter that with the part of the code that defines the data location.

You should NOT need to use any On Error code. Much better to write the code to handle any forseeable errors. I did NOT do any error checking, and that may need to be added.

If speed is an issue, there are various other modifications which can be made.

The constructed regular expression will have the general appearance of

\b(?:nnn|nnn|nnn|nnn)\b

which means to

  • match a word boundary
  • match any of the pipe delimited substrings
  • match another word boundary.

For more information, see How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

Option Explicit
'Set reference to Microsoft VBScript Regular Expressions 5.5
Sub highLight()
    Dim R As Range, C  As Range, WS As Worksheet
    Dim RE As RegExp, MC As MatchCollection, M As Match
    Dim sSplit As String
    
'set the data range
'   one column wide
'   column 2 will be offset 1 to the left
'Obviously you can change this in many ways
'And also work in your user selected method as in your code.
'only requirement is that the ranges be single column, and you can
'easily check for that
Set WS = ThisWorkbook.Worksheets("Sheet4")
With WS
    Set R = Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp))
End With

Set RE = New RegExp
Application.ScreenUpdating = False
With RE
    .Global = True
    
    'loop through the first column
        For Each C In R.Rows
            'replace the semicolon with the pipe
            sSplit = Replace(Join(Split(C.Value, ";"), "|"), " ", "")
                
                'since data has a terminal semi-colon, need to remove it if present
                If Right(sSplit, 1) = "|" Then sSplit = Left(sSplit, Len(sSplit) - 1)
            
            'finish construction of the regex pattern
            .Pattern = "\b(?:" & sSplit & ")\b"
            
            'check for matches and change relevant characters font color
            Set MC = .Execute(C.Offset(columnoffset:=1))
            With C.Offset(-0, 1)
                .Font.Color = vbBlack
                For Each M In MC
                    .Characters(M.FirstIndex + 1, M.Length).Font.Color = vbRed
                Next M
            End With
        Next C
End With
End Sub

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
1

Similarities

  • There is a problem with your data: the values in column A end with an "; ", while the values in column B end with ";". The solution presented is to use only the string before the last semi-colon ; with the Split function.
  • To avoid complications, I integrated the code into your rather cool solution.

The Code

Option Explicit

Sub highlightOrig()
    Dim xRg1 As Range
    Dim xRg2 As Range
    Dim xTxt As String
    Dim xCell1 As Range
    Dim xCell2 As Range
    Dim I As Long
    Dim J As Integer
    Dim xLen As Integer
    Dim xDiffs As Boolean
    On Error Resume Next
    If ActiveWindow.RangeSelection.Count > 1 Then
      xTxt = ActiveWindow.RangeSelection.AddressLocal
    Else
      xTxt = ActiveSheet.UsedRange.AddressLocal
    End If
lOne:
    Set xRg1 = Application.InputBox("Range A:", "Similarity finder", xTxt, , , , , 8)
    If xRg1 Is Nothing Then Exit Sub
    If xRg1.Columns.Count > 1 Or xRg1.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lOne
    End If
lTwo:
    Set xRg2 = Application.InputBox("Range B:", "Similarity finder", "", , , , , 8)
    If xRg2 Is Nothing Then Exit Sub
    If xRg2.Columns.Count > 1 Or xRg2.Areas.Count > 1 Then
        MsgBox "Multiple ranges or columns have been selected ", vbInformation, "Similarity finder"
        GoTo lTwo
    End If
    If xRg1.CountLarge <> xRg2.CountLarge Then
       MsgBox "Two selected ranges must have the same numbers of cells ", vbInformation, "Similarity finder"
       GoTo lTwo
    End If
    xDiffs = (MsgBox("Click Yes to highlight similarities, click No to highlight differences ", vbYesNo + vbQuestion, "Similarity finder") = vbNo)
    Application.ScreenUpdating = False
    
    xRg1.Font.ColorIndex = xlAutomatic
    xRg2.Font.ColorIndex = xlAutomatic
    
    Const Delimiter As String = "; "
    Dim dLen As Long: dLen = Len(Delimiter)
    Dim s() As String
    Dim d() As String
    Dim cString As String
    Dim n As Long
    Dim cStart As Long
    
    If Not xDiffs Then
        For I = 1 To xRg1.Count
            cStart = 1
            Set xCell1 = xRg1.Cells(I)
            Set xCell2 = xRg2.Cells(I)
            cString = Left(xCell1.Value, InStrRev(xCell1.Value, ";") - 1)
            s = Split(cString, Delimiter)
            cString = Left(xCell2.Value, InStrRev(xCell2.Value, ";") - 1)
            d = Split(cString, Delimiter)
            For n = 0 To UBound(d)
                If IsNumeric(Application.Match(d(n), s, 0)) Then
                    xCell2.Characters(cStart, Len(d(n))).Font.Color = vbRed
                End If
                cStart = cStart + Len(d(n)) + dLen
            Next n
        Next I
    Else
        For I = 1 To xRg1.Count
            cStart = 1
            Set xCell1 = xRg1.Cells(I)
            Set xCell2 = xRg2.Cells(I)
            cString = Left(xCell1.Value, InStrRev(xCell1.Value, ";") - 1)
            s = Split(cString, Delimiter)
            cString = Left(xCell2.Value, InStrRev(xCell2.Value, ";") - 1)
            d = Split(cString, Delimiter)
            For n = 0 To UBound(d)
                If IsError(Application.Match(d(n), s, 0)) Then
                    xCell2.Characters(cStart, Len(d(n))).Font.Color = vbRed
                End If
                cStart = cStart + Len(d(n)) + dLen
            Next n
        Next I
    End If
    
    Application.ScreenUpdating = True

End Sub
VBasic2008
  • 44,888
  • 5
  • 17
  • 28
  • I tried your code, it works partially. In rows where there are 2 numbers (example G3, H3 and G5, H5 so on ) only one of the numbers are getting highlighted, I will try to dig in further. Thanks a ton for the help :) – Pavan Alur Jan 06 '21 at 15:51
  • The reason is the first sentence in my answer. Usually delimited data doesn't end with a delimiter e.g.: `a,b,c,d`. I would suggest you consider getting rid of the trailing (last) delimiter. – VBasic2008 Jan 06 '21 at 18:15
  • I went ahead and modified Ron's code and it worked for me, Thanks so much for all the help :). Have a great day! – Pavan Alur Jan 06 '21 at 18:58
  • Sorry for bothering, but I edited the code to work now. Would appreciate the feedback. – VBasic2008 Jan 06 '21 at 19:41
  • Works perfectly. Thanks :) – Pavan Alur Jan 06 '21 at 20:07
1

The heart of your question is how to test for your notion of similarity.

Here is a function which, when given a string consisting of semicolon-delimited items, returns True if the items are the same, though possibly in a different order, and False otherwise. The key idea is to split on ";", sort the resulting array, then rejoin. This will put the strings in a standard sorted order, which can then be directly compared for equality:

Function Sort(A As Variant) As Variant
    Dim sorted As Variant, item As Variant
    Dim lb As Long, ub As Long
    Dim i As Long
    Dim items As Object
    Set items = CreateObject("System.Collections.ArrayList")
    
    lb = LBound(A)
    ub = UBound(A)
    ReDim sorted(lb To ub)
    
    For i = lb To ub
        item = A(i) 'Somewhat oddly, seems needed
        items.Add item
    Next i
    
    items.Sort
    
    For i = lb To ub
        sorted(i) = items(i - lb)
    Next i
    
    Sort = sorted
End Function

Function Similar(s1 As String, s2 As String) As Boolean
    Similar = (Join(Sort(Split(s1, ";")), ";") = Join(Sort(Split(s2, ";")), ";"))
End Function

For example, Similar("12;34;56","34;56;12") will evaluate to True but Similar("12;34;56","34;16;12") will evaluate to False.

In your case you have trailing semicolons so they are not being used as delimiters in a standard way. It turns out not to matter: Similar("12;34;56;","34;56;12;") still works as expected.

John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • The problem is the inconsistency of the data: the strings in column `A` end with a space `" "` (or `"; "` if you will), while the strings in column `B` end with a semi-colon `;` and the delimiter is a semi-colon and a space `"; "`. The first you cannot see from the image. – VBasic2008 Jan 06 '21 at 19:08
  • @VBasic2008 I didn't catch that, but since OP's notion of similarity isn't quite what I thought it was, there is no point in tweaking the code. If OP wants to use `Similarity` as a UDF, they would just have to `Trim` it first. I mostly wrote the code since I wanted to experiment with an array-list based array sorter. The lack of a good built-in sort is one of the more annoying things in VBA. – John Coleman Jan 06 '21 at 19:13
  • Hi John, thanks for the reply, I used code by Ron, modified it a bit and it worked fine – Pavan Alur Jan 06 '21 at 20:04