11

I am new to VBA. I have job in my hand to improve performance of VBA code. To improve performance of the code, I have to read entire row and compare it with another row. Is there any way to do this in VBA?

Pseudocode:

sheet1_row1=read row1 from sheet1
sheet2_row1=read row1 from sheet2
if sheet1_row1 = sheet2_row1 then
      print "Row contains same value"
else
      print "Row contains diff value"
end if
Jean-François Corbett
  • 37,420
  • 30
  • 139
  • 188
Vicky
  • 1,215
  • 6
  • 22
  • 45
  • 1
    I can give you the code but first lets see if you can put in some efforts :) Try searching SO or Google on how to store Excel Range in an Array and work with them. After that if you are still stuck, post the code that you tried and we will take it from there. Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: [Stack Overflow question checklist](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist) – Siddharth Rout Oct 16 '13 at 05:32
  • @Siddharth- For your kind information, i have done two excel sheet comparision by checking each cell value. If you believe me, you can check my question list. I have posted the question for performance improvement(check in my questions: what is best way to compare excel sheets?). I was looking for little help. – Vicky Oct 16 '13 at 06:17

11 Answers11

35
Sub checkit()
Dim a As Application
Set a = Application
MsgBox Join(a.Transpose(a.Transpose(ActiveSheet.Rows(1).Value)), Chr(0)) = _
       Join(a.Transpose(a.Transpose(ActiveSheet.Rows(2).Value)), Chr(0))

End Sub

What's going on:

  • a is just shorthand for Application to keep the code below easier to read
  • ActiveSheet.Rows(1).Value returns a 2-D array with dimensions (1 to 1, 1 to {number of columns in a worksheet})
  • We'd like to condense the array above into a single value using Join(), so we can compare it with a different array from the second row. However, Join() only works on 1-D arrays, so we run the array twice through Application.Transpose(). Note: if you were comparing columns instead of rows then you'd only need one pass through Transpose().
  • Applying Join() to the array gives us a single string where the original cell values are separated by a "null character" (Chr(0)): we select this since it's unlikely to be present in any of the cell values themselves.
  • After this we now have two regular strings which are easily compared

Note: as pointed out by Reafidy in the comments, Transpose() can't handle arrays with more than approx. 65,000 elements, so you can't use this approach to compare two whole columns in versions of Excel where sheets have more than this number of rows (i.e. any non-ancient version).

Note 2: this method has quite bad performance compared to a loop used on a variant array of data read from the worksheet. If you're going to do a row-by-row comparison over a large number of rows, then the approach above will be much slower.

Tim Williams
  • 154,628
  • 8
  • 97
  • 125
  • @Tim-Thank you so much for your reply. Can you provide little information about this code? – Vicky Oct 16 '13 at 06:22
  • 3
    @Jean-FrançoisCorbett: `Chr(0)` is the NULL character. It is usually used as a delimiter in strings. However, it is much better to use the inbuilt constant `vbNullChar` instead of `Chr(0)` as it doesn't equire runtime evaluation. – Siddharth Rout Oct 16 '13 at 07:33
  • 2
    @Vicky - `Join` requires working with a 1-D array: the code is getting a 2-D array from the row, so two passes through `Transpose` converts it to 1-D. Chr(0) is used as a joining character because it's very unlikely to be in any of the cells on your sheet. – Tim Williams Oct 16 '13 at 16:07
  • 2
    +1, Note: beware of the limitations of working with arrays in vba and the transpose limit of 2^16 (65536 rows). – Reafidy Nov 25 '13 at 22:23
  • 4
    @Tim: Maybe you need to convert this amazing answer to a full fledged answer (Sort of FAQ) :) Explaining what exactly is happening behind the scenes. This is the 2nd time I have to refer this answer to other users :) [First Time](http://stackoverflow.com/questions/19671951/highlight-duplicate-rows-entire-row-vs-entire-row) [Second Time](http://stackoverflow.com/questions/21481753/excel-vba-compare-a-range-of-cells-to-a-range-of-cells#21481753) – Siddharth Rout Jan 31 '14 at 14:01
  • Note that if you're comparing columns, you just need one less `Transpose()`. – Geoff May 19 '14 at 19:29
  • I see this is a popular answer, but I must dissent. Piercing the barrier between VBA and Excel four times is NOT a good solution, when one trip is all that is necessary. – Excel Hero Jul 17 '15 at 04:35
  • 2
    it's just intended as a "trick" - performance does indeed suck: it's about 30-50x worse than a regular loop after picking up the values in an array. – Tim Williams Jul 17 '15 at 05:33
  • Its imaginative though - which is good to see when there is so much sameness in both questions and answers. – brettdj Sep 03 '15 at 09:51
13

For your specific example, here are two ways...

Case Insensitive:

MsgBox [and(1:1=2:2)]

Case Sensitive:

MsgBox [and(exact(1:1,2:2))]

...

Below are generalized functions to compare any two contiguous ranges.

Case Insensitive:

Public Function RangesEqual(r1 As Range, r2 As Range) As Boolean
    RangesEqual = Evaluate("and(" & r1.Address & "=" & r2.Address & ")")
End Function

Case Sensitive:

Public Function RangesEqual(r1 As Range, r2 As Range) As Boolean
    RangesEqual = Evaluate("and(exact(" & r1.Address & "," & r2.Address & "))")
End Function
Excel Hero
  • 14,253
  • 4
  • 33
  • 40
  • For those wondering what is this `[ x ]` notation, it's the same as `Evaluate(" x ")` – ZygD Dec 29 '20 at 13:37
  • The square brackets do indeed invoke the Evaluate function, but there is a big difference. With the square brackets, the content between the brackets must be a literal text string... not an expression... and the literal string cannot be enclosed in quotation marks. – Excel Hero Jun 04 '21 at 21:15
6

OK, this ought to be fairly fast: minimal interaction between Excel UI and VBA (which is where much of the slowness lives). Assumes worksheets have similar layouts from $A$1 and that we're only going to attempt to match the common area of the UsedRanges for the two sheets:

Public Sub CompareSheets(wks1 As Worksheet, wks2 As Worksheet)

Dim rowsToCompare As Long, colsToCompare As Long    
    rowsToCompare = CheckCount(wks1.UsedRange.Rows.Count, wks2.UsedRange.Rows.Count, "Row")
    colsToCompare = CheckCount(wks1.UsedRange.Columns.Count, wks2.UsedRange.Columns.Count, "Column")    
    CompareRows wks1, wks2, rowsToCompare, colsToCompare

End Sub

Private Function CheckCount(count1 As Long, count2 As Long, which As String) As Long
    If count1 <> count2 Then
        Debug.Print "UsedRange " & which & " counts differ: " _
            & count1 & " <> " & count2
    End If
    CheckCount = count2
    If count1 < count2 Then
        CheckCount = count1
    End If        
End Function

Private Sub CompareRows(wks1 As Worksheet, wks2 As Worksheet, rowCount As Long, colCount As Long)
    Debug.Print "Comparing first " & rowCount & " rows & " & colCount & " columns..."        
Dim arr1, arr2
    arr1 = wks1.Cells(1, 1).Resize(rowCount, colCount).Value
    arr2 = wks2.Cells(1, 1).Resize(rowCount, colCount).Value
Dim rIdx As Long, cIdx As Long    
    For rIdx = LBound(arr1, 1) To UBound(arr1, 1)
        For cIdx = LBound(arr1, 2) To UBound(arr1, 2)
            If arr1(rIdx, cIdx) <> arr2(rIdx, cIdx) Then
                Debug.Print "(" & rIdx & "," & cIdx & "): " & arr1(rIdx, cIdx) & " <> " & arr2(rIdx, cIdx)
            End If
        Next
    Next
End Sub
Mike Woodhouse
  • 51,832
  • 12
  • 88
  • 127
  • Yup. Picking up the range in one go makes a huge difference to the speed of your code. It's dismaying that we still have to keep pointing this out... – Nigel Heffernan Aug 24 '17 at 17:45
2

Excel 2016 has a built in function called TEXTJOIN

https://support.office.com/en-us/article/textjoin-function-357b449a-ec91-49d0-80c3-0e8fc845691c

Looking at @Tim Williams answer and using this new function (which does not have the 65536 row limit):

Sub checkit()
    MsgBox WorksheetFunction.TextJoin(Chr(0), False, ActiveSheet.Rows(1).Value) = _
           WorksheetFunction.TextJoin(Chr(0), False, ActiveSheet.Rows(2).Value)
End Sub

Written as a function:

Public Function CheckRangeValsEqual(ByVal r1 As Range, ByVal r2 As Range, Optional ByVal strJoinOn As String = vbNullString) As Boolean
    CheckRangeValsEqual = WorksheetFunction.TextJoin(strJoinOn, False, r1.Value) = _
                          WorksheetFunction.TextJoin(strJoinOn, False, r2.Value)
End Function
1
Match = True

Row1length = Worksheets("Sheet1").Cells(1, Columns.Count).End(xlToLeft).Column
Row2length = Worksheets("Sheet2").Cells(1, Columns.Count).End(xlToLeft).Column

If Row1length <> Row2length Then
    'Not equal
    Match = False
Else
    For i = 1 To Row1length
        If Worksheets("Sheet1").Cells(1, i),Value <> Worksheets("Sheet2").Cells(1, i) Then
            Match = False
            Exit For
        End If
    Next
End If

If Match = True Then
    Debug.Print "match"
Else
    Debug.Print "not match"
End If
Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
chiliNUT
  • 18,989
  • 14
  • 66
  • 106
1

Here's a bit of code that will do two vector ranges. You can run it against two rows, two columns.

Don't think it's as fast as the x2 transpose method, but it's more flexible. The column invocation takes a bit longer since there are 1M items to compare!

Option Explicit

Public Sub Test()
    'Check two columns
    Debug.Print DataAreasAreSame(Columns("a"), Columns("b"))
    'Check two rows
    Debug.Print DataAreasAreSame(Rows(1), Rows(2))
End Sub

Public Function DataAreasAreSame(ByVal DataArea1 As Range, ByVal     DataArea2 As Range) As Boolean
    Dim sFormula As String
    sFormula = "=SUM(If(EXACT(" & DataArea1.Address & "," &       DataArea2.Address & ")=TRUE,0,1))"
    If Application.Evaluate(sFormula) = 0 Then DataAreasAreSame = True
End Function
freedomn-m
  • 27,664
  • 8
  • 35
  • 57
PaulG
  • 1,051
  • 7
  • 9
0

=EXACT(B2;D2) formula and drag down, best option for me.

q3kep
  • 389
  • 3
  • 4
  • But, it does between only specific cell of a row with another! Not **to read entire row and compare it with another row** as mentioned in question! @q3kep – Nawshad Rehan Rasha Sep 23 '18 at 09:33
0

I'll put in a sledgehammer-to-crack-a-nut answer here, for completeness, because the question 'Are these two ranges identical?' is turning up as an unexamined component of everyone else's 'compare my ranges and then do this complicated thing...' questions.

Your question is a simple question about small ranges. My answer is for large ones; but the question is a good one, and a good place for a more general answer, because it's simple and clear: and 'Do these ranges differ?' and 'Has someone tampered with my data?' are relevant to most commercial Excel users.

Most of the answers to the typical 'compare my rows' questions are cell-by-cell reads and comparisons in VBA. The simplicity of these answers is commendable, but this approach performs very slowly on a large data sets because:

  1. Reading a range one cell at a time is very slow;
  2. Comparing values pair-by-pair is inefficient, especially for strings, when the number of values gets into the tens of thousands,
Point(1) is the important one: it takes the same amount of time for VBA to pick up a single cell using var = Range("A1") as it does to pick up the entire range in one go using var = Range("A1:Z1024")...

...And every interaction with the sheet takes four times as much time as a string comparison in VBA, and twenty times longer than an comparison between floating-point decimals; and that, in turn, is three times longer than an integer comparison.

So your code will probably be four times faster, and possibly a hundred times faster, if you read the entire range in one go, and work on the Range.Value2 array in VBA.

That's in Office 2010 and 2013 (I tested them); for older version of Excel, you'll see quoted times between 1/50th and 1/500th of a second, for each VBA interaction with a cell or range of cells. That'll be way slower because, in both old and new versions of Excel, the VBA actions will still be in single-digit numbers of microseconds: your code will run at least a hundred times faster, and probably thousands of times faster, if you avoid cell-by-cell reads from the sheet in older versions of Excel.


arr1  = Range1.Values
arr2  = Range2.Values
' Consider checking that the two ranges are the same size ' And definitely check that they aren't single-cell ranges, ' which return a scalar variable, not an array, from .Value2
' WARNING: THIS CODE WILL FAIL IF YOUR RANGE CONTAINS AN ERROR VALUE
For i = LBound(arr1, 1) To Ubound(arr1, 2)
For j = LBound(arr1, 2) To Ubound(arr1, 2)
If arr1(i, j) <> arr2(i, j) Then bMatchFail = True Exit For End If
Next j
If bMatchFail Then Exit For
Next i
Erase arr1 Erase arr2

You'll notice that this code sample is generic, for two ranges of the same size taken from anywhere - even from separate workbooks. If you're comparing two adjacent columns, loading a single array of two columns and comparing IF arrX(i, 1) <> arrX(i,2) Then is going to halve the runtime.

Your next challenge is only relevant if you're picking up tens of thousands of values from large ranges: there's no performance gain in this extended answer for anything smaller than that.

What we're doing is:

Using a hash function to compare the values of two large ranges

The idea is very simple, although the underlying mathematics is quite challenging for non-mathematicians: rather than comparing one value at a time, we run a mathematical function that 'hashes' the values into a short identifier for easy comparison.

If you're repeatedly comparing ranges against a 'reference' copy, you can store the 'reference' hash, and this halves the workload.

There are some fast and reliable hashing functions out there, and they are available in Windows as part of the security and cryptography API. There is a slight problem in that they run on strings, and we have an array to work on; but you can easily find a fast 'Join2D' function that gets a string from the 2D arrays returned by a range's .Value2 property.

So a fast comparison function for two large ranges will look like this:

Public Function RangeCompare(Range1 as Excel.Range, Range2 As Excel.Range) AS Boolean
' Returns TRUE if the ranges are identical.
' This function is case-sensitive.
' For ranges with fewer than ~1000 cells, cell-by-cell comparison is faster
' WARNING: This function will fail if your range contains error values.
RangeCompare = False
If Range1.Cells.Count &LT;&GT; Range2.Cells.Count Then RangeCompare = False ElseIf Range1.Cells.Count = 1 then RangeCompare = Range1.Value2 = Range2.Value2 Else RangeCompare = MD5(Join2D(Range1.Value2)) = MD5(Join2D(Range2.Value2)) Endif
End Function

I've wrapped the Windows System.Security MD5 hash in this VBA function:

Public Function MD5(arrBytes() As Byte) As String
' Return an MD5 hash for any string
' Author: Nigel Heffernan Excellerando.Blogspot.com
' Note the type pun: you can pass in a string, there's no type conversion or cast ' because a string is stored as a Byte array and VBA recognises this.
oMD5 As Object 'Set a reference to mscorlib 4.0 to use early binding

Dim HashBytes() As Byte Dim i As Integer

Set oMD5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider") HashBytes = oMD5.ComputeHash_2((arrBytes))
For i = LBound(HashBytes) To UBound(HashBytes) MD5 = MD5 & Right("00" & Hex(HashBytes(i)), 2) Next i

Set oMD5 = Nothing ' if you're doing this repeatedly, declare at module level and persist Erase HashBytes

End Function
There are other VBA implementations out there, but nobody seems to know about the Byte Array / String type pun - they are not equivalent, they are identical - so everyone codes up unnecessary type conversions.

A fast and simple Join2D function was posted by Dick Kusleika on Daily Dose of Excel in 2015:

Public Function Join2D(ByVal vArray As Variant, Optional ByVal sWordDelim As String = " ", Optional ByVal sLineDelim As String = vbNewLine) As String

    Dim i As Long, j As Long
    Dim aReturn() As String
    Dim aLine() As String

    ReDim aReturn(LBound(vArray, 1) To UBound(vArray, 1))
    ReDim aLine(LBound(vArray, 2) To UBound(vArray, 2))

    For i = LBound(vArray, 1) To UBound(vArray, 1)
        For j = LBound(vArray, 2) To UBound(vArray, 2)
            'Put the current line into a 1d array
            aLine(j) = vArray(i, j)
        Next j
        'Join the current line into a 1d array
        aReturn(i) = Join(aLine, sWordDelim)
    Next i

    Join2D = Join(aReturn, sLineDelim)

End Function

If you need to excise blank rows before you make the comparison, you'll need the Join2D function I posted in StackOverflow back in 2012.

The most common application of this type of hash comparison is for spreadsheet control - change monitoring - and you'll see Range1.Formula used instead of Range1.Value2: but your question is about comparing values, not formulae.

Footnote: I've posted a very similar answer elsewhere. I'd've posted it here first if I'd seen this question earlier.

Nigel Heffernan
  • 4,636
  • 37
  • 41
0

If you want to do this in MS excel, you can do the following.

For example, you have column range of each row from "A" to "F" and have to compare between Row 2 and Row 3. To check entire row and compare it with another row we can specify this in formula in a new Result column and instead of pressing Enter after typing the formula, press Ctrl + Shift + Enter.

=AND(EXACT(A2:F2,A3:F3))

The result will be TRUE if they match and FALSE if they don't. You'll see curly braces around your formula if you've correctly entered it as an array formula. After this, drag down every row so that each cell of this Result Column will have comparison result between this row and the following!

0

I know there are already answers here, but here is a simple VBA-only function that compares the values in any two ranges, returning TRUE if they match, or the first non-matching item number if they don't. (It returns FALSE if the ranges do not have the same number of cells.)

Function RangesEqualItemNo(Range1 As Range, Range2 As Range) As Variant

    Dim CellCount As Long

    If Range1.Count = Range2.Count Then

        For CellCount = 1 To Range1.Cells.Count
            If Range1.Cells.item(CellCount).Value <> Range2.Cells.item(CellCount).Value Then
                RangesEqualItemNo = CellCount
                Exit Function
            End If
        Next CellCount

        RangesEqualItemNo = True

    Else
        RangesEqualItemNo = False

    End If

End Function

Or as a simple boolean function:

Function RangesEqual(Range1 As Range, Range2 As Range) As Boolean

    Dim CellCount As Long

    If Range1.Count = Range2.Count Then

        For CellCount = 1 To Range1.Cells.Count
            If Range1.Cells.item(CellCount).Value <> Range2.Cells.item(CellCount).Value Then
                RangesEqual = False
                Exit Function
            End If
        Next CellCount

        RangesEqual = True

    Else
        RangesEqual = False

    End If

End Function

Although this may not be fancy, this sort of brute-force approach is often the fastest.

This compares values, so it will automatically transpose between columns and rows, which may or may not be what you want.

To take this to the logical next step, the following function will return an array of each item number that is different.

Function RangeDiffItems(Range1 As Range, Range2 As Range, Optional DiffSizes As Boolean = False) As Long()

    Dim CellCount As Long
    Dim DiffItems() As Long
    Dim DiffCount As Long

    ReDim DiffItems(1 To Range1.Count)

    DiffCount = 0

    If Range1.Count = Range2.Count Or DiffSizes Then

        For CellCount = 1 To Range1.Cells.Count
            If Range1.Cells.item(CellCount).Value <> Range2.Cells.item(CellCount).Value Then
                DiffCount = DiffCount + 1
                DiffItems(DiffCount) = CellCount
            End If
        Next CellCount

        If DiffCount = 0 Then DiffItems(1) = 0

    Else
        DiffItems(1) = -1
    End If

    If DiffCount = 0 Then ReDim Preserve DiffItems(1 To 1) Else ReDim Preserve DiffItems(1 To DiffCount)

    RangeDiffItems = DiffItems

End Function

If there are no differences, it returns a 0 in the first array slot, or if the arrays are of different sizes, it returns a -1 for the first array spot. To allow it to compare arrays of different sizes, optionally enter TRUE for the third parameter.

There are also a few more answers to this question elsewhere.

sdanse
  • 151
  • 4
-1

In my version, I do not declare anything (Dim). It might be wrong but for all the years I have used Excel VBA - mostly it never stopped the code from working. In this case, the range is given as an area. If not, then you should use Dim Range1 As Range, etc. In my code, the area must be the same witch it mostly is anyway, the same rows and columns. This here is in my opinion the shortest way:

Set Range1 = Range("A1:B5")
Set Range2 = Range("D1:E5")
'Range1
For Each rng In Range1 'Selection
st1 = st1 & rng & ","
Next rng
'Range2
For Each rng In Range2 'Selection
st2 = st2 & rng & ","
Next rng
    'compare
    If st1 = st2 Then
        MsgBox "the same"
    Else
        MsgBox "different"
    End If

This other example is a bit longer, it reads the rows and columns and uses this for both areas. If you set a range then range(1) will still be the first cell in this range from the top. Here is my code:

Sub COMPARE()
    Set Range1 = Range("A1:B5")
    Set Range2 = Range("D1:E5")
    'area
    coly = Range1.Columns.Count 'columns
    rowy = Range1.Rows.Count 'rows
    
    For i = 1 To coly
        For j = 1 To rowy
            st1 = st1 & Range1(i, j)
            st2 = st2 & Range2(i, j)
        Next j
    Next i
    'compare
    If st1 = st2 Then
        MsgBox "the same"
    Else
        MsgBox "different"
    End If
End Sub
  • 2
    _"I do not declare anything (Dim). It might be wrong but for all the years I have used Excel VBA - mostly it never stopped the code from working"_ You're either _very_ lucky or you've never written a script with more than a few dozen lines of code. First of all, if you ever make a typo in the name of one of your variables while using it, it can go unnoticed for a long time, producing unexpected results; and if you ever notice, it can be very difficult to find the source of the bug in a long script. Not to mention that all your variables are of type `Variant` (which has a performance overhead). – 41686d6564 stands w. Palestine Nov 11 '21 at 10:33
  • On the other hand, if you declare your variables (and have `Option Explicit` enabled), the compiler will scream at you as soon as you attempt to compile the code. Think about how much time that could save you. – 41686d6564 stands w. Palestine Nov 11 '21 at 10:37
  • Please note that this is very inefficient as it reads the cell values one-by-one. Much faster is to read the data into a `Variant` array and loop the array, as has been already demonstrated in this thread. – BigBen Nov 11 '21 at 19:01
  • I have written really difficult projects in excel. This is the faster way for a small project. I do not use Option Explicit - then you can do it like this. Yes, I know how OE is useful for not making some errors, but it also makes you a lot more work and time to declare everything. So - this may not be the correct way but it is also not the wrong way. If you learn VBA then all the time you use Dim to declare variables - it does not mean that you must use it. Is my code slower? Yes, if there is a lot of data. But works great on small data. – Andre Allese Nov 12 '21 at 12:26
  • No it's still the slower way, even for a small project. Also not using `Option Explicit` is a terrible recommendation. In my opinion, this answer adds nothing of value and should be deleted. – BigBen Nov 12 '21 at 15:07
  • So here is no room for alternative ideas? Only by the book? That´s sad... – Andre Allese Nov 13 '21 at 11:03