0

Want to find duplicates in a column in excel and want to popup a msgbox upon finding even 1 duplicate and it shouldn't keep on popping messages if it finds more than one duplicate.

Also, if i can use two column cell values and use that together to find duplicates, this would be also helpful.

  Sub ColumnDuplicates()
    Dim lastRow As Long
    Dim matchFoundIndex As Long
    Dim iCntr As Long
    lastRow = Range("A65000").End(xlUp).Row

    For iCntr = 1 To lastRow
    If Cells(iCntr, 1) <> "" Then
        matchFoundIndex = WorksheetFunction.Match(Cells(iCntr, 1), Range("A1:A" & lastRow), 0)
        If iCntr <> matchFoundIndex Then
            MsgBox ("There are duplicates in Column A")
        End If
    End If
    Next
    MsgBox ("No Duplicates in Column A")
End Sub

Expecting to print message saying that column A has duplicates or does not have duplicates

Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
Zubair
  • 19
  • 1
  • 6
  • Not a direct duplicate, but [this answer](https://stackoverflow.com/questions/47099413/find-duplicates-in-a-column?rq=1) talks about finding duplications in a column – realr Jul 26 '19 at 15:23
  • 1
    @calestini: I like the approach `.RemoveDuplicates` taken by the OP in that link. I would go ahead with that approach as it doesn't involv any loops. However instead of using a column in the same sheet, I will use a temp worksheet. – Siddharth Rout Jul 26 '19 at 15:34
  • 1
    But then that also depends if the user is using Excel 2007+... Looking at `"A65000"`, It is possible the user is still using Excel 2003. – Siddharth Rout Jul 26 '19 at 15:35
  • Which Excel version are you using? – Siddharth Rout Jul 26 '19 at 16:00
  • @SiddharthRout IMO `Rows.Count` (vs. hard-coding the "last row" ...which was 65,535 IIRC, not 65,000) should be non-negociable, regardless of how many rows your Excel sheets have ;-) – Mathieu Guindon Jul 26 '19 at 16:01
  • 1
    @MathieuGuindon I agree with you :) I have covered that in this [post](https://stackoverflow.com/questions/11169445/error-in-finding-last-used-cell-in-excel-with-vba/11169920#11169920) – Siddharth Rout Jul 26 '19 at 16:03
  • It may not be the best approach, but you could throw together a basic bubble sort and anytime a compare occurs just have an if statement to catch 2 values that are =. – SomeNerdAtWork Jul 26 '19 at 16:04

3 Answers3

2

What about the use of EVALUATE?

Public Sub Test()

With ThisWorkbook.Sheets("Sheet1")
    lr = .Cells(.Rows.Count, "A").End(xlUp).Row
    If .Evaluate("=Max(countif(A1:A" & lr & ",A1:A" & lr & "))") > 1 Then
        MsgBox "Duplicates!"
    Else
        MsgBox "No Duplicates!"
    End If
End With

End Sub

Or, parameterized:

Public Sub Test(ByVal sheet As Worksheet, ByVal columnHeading As String)

With sheet
    lr = .Cells(.Rows.Count, columnHeading).End(xlUp).Row
    If .Evaluate("=Max(countif(" & columnHeading & "1:" & columnHeading & lr & "," & columnHeading & "1:" & columnHeading & lr & "))") > 1 Then
        MsgBox "Duplicates!"
    Else
        MsgBox "No Duplicates!"
    End If
End With

End Sub

Now you can invoke it like this:

Test Sheet1, "A" ' find dupes in ThisWorkbook/Sheet1 in column A
Test Sheet2, "B" ' find dupes in ThisWorkbook/Sheet2 in column B
Test ActiveWorkbook.Worksheets("SomeSheet"), "Z" ' find dupes in "SomeSheet" worksheet of whatever workbook is currently active, in column Z
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • 1
    ++ Beautiful... I see one hitch though.. As the number of rows increase the array formula will become slower :) Your code goes very slow for 160k rows... – Siddharth Rout Jul 26 '19 at 15:48
  • Thanks a ton for the answer! – Zubair Jul 26 '19 at 15:50
  • 1
    I think you can qualify the `Evaluate` with the `With` block worksheet, so that `Evaluate` operates in the context of that sheet (i.e. call `Worksheet.Evaluate` rather than `[Application.]Evaluate`), which would allow safely changing `Sheet1!A1:A` to simply `A1:A`. Not a fan of `Evaluate` though. – Mathieu Guindon Jul 26 '19 at 15:51
  • @SiddharthRout, I should have thought about that, time for coffee! – JvdV Jul 26 '19 at 15:51
  • Is there a way to check 4 columns(b,c,e,f) and take its values and compare it together across different rows in the same columns and generating a Msgbox for duplicate?@JudV – Zubair Jul 26 '19 at 15:58
  • 1
    This is not what your actual quesiton is. Please do not change the quesiton midway when lot of people have already suggested you solutions based on the original question @Zubair :) – Siddharth Rout Jul 26 '19 at 16:02
  • @SiddharthRout The answer does satisfy my original question, but if you look at my question above you can see that requirement also exists. Thanks! – Zubair Jul 26 '19 at 16:03
  • @Zubair make the procedure take a `ByVal columnHeading As String` parameter, and replace `A1:A" & lr` with `columnHeading & "1:" & columnHeading & lr` and `"A"` with `columnHeading`, then invoke it with the column heading you want to work with, e.g. `Test "A"` then `Test "B"` or `Test "XYZ"`. – Mathieu Guindon Jul 26 '19 at 16:06
  • @Zubair Thats for 2 columns. Nothing about checking across rows. The moment rows are involved the code will change drastically :) – Siddharth Rout Jul 26 '19 at 16:06
  • Thanks @MathieuGuindon – Zubair Jul 26 '19 at 16:08
  • @Zubair, maybe simplest is to check out @SiddhartRout his answer, you can copy `ws.Columns("B:E").Copy` and then remove duplicates on `Columns:=Array(1, 2, 3, 4)`. – JvdV Jul 26 '19 at 16:08
  • @JvdV: My code will not be applicable anymore as the user wants to do a horizontal check and not a vrtical check.... – Siddharth Rout Jul 26 '19 at 16:09
  • @Zubair I'd suggest marking an answer as accepted, and asking a new/separate question if you run into any specific problem making it work for your needs - Sid, I don't think OP meant a horizontal check is also needed, just that they have more than one single column to look for dupes in. – Mathieu Guindon Jul 26 '19 at 16:11
  • @MathieuGuindon: I am just going by this `Is there a way to check 4 columns(b,c,e,f) and take its values and compare it together across different rows in the same columns and generating a Msgbox for duplicate?@JudV – Zubair 12 mins ago` :) – Siddharth Rout Jul 26 '19 at 16:11
  • @SiddharthRout damn... in that case I'd just add a helper column that concatenates the 4 values, and look for dupes in the helper column. – Mathieu Guindon Jul 26 '19 at 16:12
  • @SiddharthRout, to me that actually worked on your code. OP want to combine the 4 columns and see if there are duplicate rows. Not sure what I'm missing when you say it wont – JvdV Jul 26 '19 at 16:13
  • @MathieuGuindon I am a beginner to VBA can you please show how it would fit in the code area. – Zubair Jul 26 '19 at 16:20
  • @Zubair see edit - I warmly encourage you to try and *understand* what the code does rather than just copy-pasting though. – Mathieu Guindon Jul 26 '19 at 16:26
  • Will do @MathieuGuindon! Thanks! – Zubair Jul 26 '19 at 17:53
1

Throw your values in a dictionary

Sub ColumnDuplicates()
Dim lastRow As Long
Dim matchFoundIndex As Long
Dim iCntr As Long

lastRow = Range("A65000").End(xlUp).Row
Set oDictionary = CreateObject("Scripting.Dictionary")
For iCntr = 1 To lastRow
    If Cells(iCntr, 1) <> "" Then
        If oDictionary.Exists(Cells(iCntr, 1).Value) Then
            MsgBox ("There are duplicates in Column A")
            Exit Sub
        Else 
            oDictionary.Add Cells(iCntr, 1).Value, Cells(iCntr, 1).Value
        End If
    End If
Next
MsgBox ("No Duplicates in Column A")
End Sub
Mathieu Guindon
  • 69,817
  • 8
  • 107
  • 235
Tim
  • 2,701
  • 3
  • 26
  • 47
  • Thanks for the answer tim. But it does not work if there are duplicates in Column A – Zubair Jul 26 '19 at 15:33
  • @Zubair care to elaborate what "does not work" means? `Dictionary` keys are, by definition, unique. – Mathieu Guindon Jul 26 '19 at 15:36
  • @MathieuGuindon When you put same values in multiple rows in column A, it does not detect duplicates. – Zubair Jul 26 '19 at 15:37
  • 1
    @Zubair try it again, this time storing the *values* rather than the cell objects themselves. Works here. – Mathieu Guindon Jul 26 '19 at 15:46
  • 1
    FWIW this under-appreciated answer is the only one that bails out as soon as a duplicate is known to exist, without traversing the entire set of rows; one million rows with a duplicate at row 12 would only process 12 rows; the `Evaluate`/`WorksheetFunction` solutions traverse the entire set *then* determine whether there are dupes. – Mathieu Guindon Jul 26 '19 at 15:56
1

If you have Excel 2007+ then this will be faster. This code ran in 1 sec for 200k rows

Sub Sample()
    Debug.Print Now

    Dim ws As Worksheet
    Dim wsTemp As Worksheet

    Set ws = Sheet1

    Set wsTemp = ThisWorkbook.Sheets.Add

    ws.Columns(1).Copy wsTemp.Columns(1)

    wsTemp.Columns(1).RemoveDuplicates Columns:=1, Header:=xlNo

    If Application.WorksheetFunction.CountA(ws.Columns(1)) <> _
       Application.WorksheetFunction.CountA(wsTemp.Columns(1)) Then
        Debug.Print "There are duplicates in Col A"
    Else
        Debug.Print "duplicates found in Col A"
    End If

    Application.DisplayAlerts = False
    wsTemp.Delete
    Application.DisplayAlerts = True

    Debug.Print Now
End Sub

I used the below code to generate 200k records in Col A

Sub GenerateSampleData()
    Range("A1:A200000").Formula = "=Row()"
    Range("A1:A200000").Value = Range("A1:A200000").Value
    Range("A10000:A20000").Value = Range("A20000:A30000").Value
End Sub

Code execution

enter image description here

Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250