2

I have a spreadsheet with a column called NumberID that has about 50k records. I am aware that there are duplicates however with scrolling up/down it takes forever to find anything plus often times excel is being somewhat slow. I'm trying to write a quick snippet of code to be able to find and count the number of duplicates.

I'm trying to write a quick way of doing it, basically my data is from rows 20 to 48210 and I'm trying to find a number total duplicate records.

Dim lastRow As Long
Dim matchFoundIndex As Long
Dim iCntr As Long
Dim count As Long
count = 0
lastRow = Range("B48210").End(xlUp).Row
For iCntr = 1 To lastRow
    If Cells(iCntr, 1) <> "" Then
       matchFoundIndex = WorksheetFunction.Match(Cells(iCntr, 1), Range("B20:B" & lastRow), 0)
        If iCntr <> matchFoundIndex Then
            count = count + 1
        End If
     End If
Next

MsgBox count

Here im getting an error on = WorkSheetFunction.Match - i found that this property can be used to accomplish what I'm trying to do. The error says

Unable to get the match property for the worksheetfunction class.

Someone have an idea? My vba has been rusty

brettdj
  • 54,857
  • 16
  • 114
  • 177
BobSki
  • 1,531
  • 2
  • 25
  • 61
  • If your only problem is that matching error, this may be a duplicate of http://stackoverflow.com/questions/17751443/excel-vba-cant-get-a-match-error-unable-to-get-the-match-property-of-the-wor – TJ Rockefeller Nov 29 '16 at 17:03
  • @TJRockefeller - does the code look sound otherwise – BobSki Nov 29 '16 at 17:05
  • 1
    everything you show you are using column B, but your reference in the first criterion of the match is using column A. I would suggest changing both `Cells(iCntr, 1)` to `Cells(iCntr, 2)` – Scott Craner Nov 29 '16 at 17:08
  • @Bobski I'm not sure what you are saying in that last comment, but yes the use case is different, but the answer to the question that I put a link I think is essentially the answer that you are looking for. You need to use Application.Match so that you can handle cases where a match is not found. – TJ Rockefeller Nov 29 '16 at 17:08
  • I don't kno how ot incorporate it into my code - the question shows a solution to a same problem but different scenario – BobSki Nov 29 '16 at 17:08
  • Now it's showing me 19 as LASTROW but it's supposed to be 48210 ahhhhhh – BobSki Nov 29 '16 at 17:19

3 Answers3

3

Using Match for this is incredibly inefficient that many rows. I'd fill a Dictionary with found items and just test to see if you've seen them before:

'Add a reference to Microsoft Scripting Runtime.
Public Sub DupCount()
    Dim count As Long
    With New Scripting.Dictionary
        Dim lastRow As Long
        lastRow = Range("B48210").End(xlUp).Row
        Dim i As Long
        For i = 1 To lastRow
            Dim test As Variant
            test = Cells(i, 2).Value
            If IsError(test) Then
            ElseIf test <> vbNullString Then
                If .Exists(test) Then
                    count = count + 1
                Else
                    .Add test, vbNull
                End If
            End If
        Next
    End With
    MsgBox count
End Sub
Comintern
  • 21,855
  • 5
  • 33
  • 80
  • For some reason lastrow =19 but there are a lot of rows the data actually starts at row 20 and goes to 48210 – BobSki Nov 29 '16 at 18:03
  • @Bobski - `Range` and `Cells` are unqualified in this example. If you're running this from a module they may not be referring to the correct worksheet, so you should fully qualify them. Otherwise, see [Error in finding last used cell in VBA](http://stackoverflow.com/q/11169445/4088852). – Comintern Nov 29 '16 at 18:19
2

since you want to "count the number of duplicates", a very fast way of doing that is exploiting RemoveDuplicates() method of Range object, like follows:

Option Explicit

Sub main()
    Dim helperCol As Range
    Dim count As Long

    With Worksheets("IDs") '<--| reference your relevant sheet (change "IDs" to youtr actual sheet name)
        Set helperCol = .UsedRange.Resize(, 1).Offset(, .UsedRange.Columns.count) '<--| set a "helper" range where to store unique identifiers
        With .Range("A1", .Cells(.Rows.count, 1).End(xlUp)) '<-- reference "IDs" column from row 1 (header) to last not empty cell
            helperCol.Value = .Value '<--| copy identifiers to "helper" range
            helperCol.RemoveDuplicates Columns:=1, Header:=xlYes '<--| remove duplicates in copied identifiers
            count = .SpecialCells(xlCellTypeConstants).count - helperCol.SpecialCells(xlCellTypeConstants).count '<--| count duplicates as the difference between original IDs number and unique ones
        End With
        helperCol.ClearContents '<--| clear "helper" range
    End With
    MsgBox count & " duplicates"
End Sub
user3598756
  • 28,893
  • 4
  • 18
  • 28
  • @Bobski, did you try this? – user3598756 Nov 29 '16 at 19:03
  • yes it gave me a very big number something in the 96k range - i think it took count of all the records * 2 – BobSki Nov 29 '16 at 20:22
  • Well,I had tested it with some 50k rows in column A with a predefined number of duplicates (just repeated many times a 10 cell pattern) and it worked. Try stepping through code and see what gets done in the worksheet along with querying immediate window (?helperCol.Address. Or ?helperCol.Count) . – user3598756 Nov 30 '16 at 13:24
2

You can use my Duplicate Masteer addin to do this.

It offers a fast array method to deal with duplicates.

  • counting
  • deleting
  • selecting

It goes beyond the built-in features of Excel as it allows duplicate matching on a

  1. case insentitive basis
  2. ignoring whitespace
  3. even RegexP matching
  4. runs over multiple sheets

enter image description here

brettdj
  • 54,857
  • 16
  • 114
  • 177