
This is a follow-up question to How to remove duplicates that are case SENSITIVE in Excel (for 100k records or more)?

Since the code in that procedure manipulates the data of column A only, I'd also like to delete the entire row of data when a case-sensitive duplicate is found.

Case-sensitive meaning that:

  1. Case1
  2. case1
  3. cASE1

are all unique records.
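A minimal illustration of this distinction in VBA itself: `StrComp` with `vbBinaryCompare` is the comparison mode that treats these three as different values, while the default text comparison does not.

```vba
Sub CompareModesExample()
    'vbTextCompare treats "Case1" and "case1" as equal;
    'vbBinaryCompare compares the underlying character codes
    Debug.Print StrComp("Case1", "case1", vbTextCompare)   'prints 0  -> equal
    Debug.Print StrComp("Case1", "case1", vbBinaryCompare) 'prints -1 -> different
End Sub
```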

  • You can loop through the range, and when it finds a case sensitive duplicate, do say `Cells(i,1).EntireRow.Delete`. Note: You'll want to loop backwards, i.e. `For i = lastRow to 1 Step -1` – BruceWayne Apr 03 '17 at 16:50
  • 1
    @BruceWayne, surely not a good idea for 100k+ records? – CallumDA Apr 03 '17 at 16:52
  • @CallumDA good catch, I keep hitting this memory limit error (https://i.stack.imgur.com/PNRZN.png). So I think the only way is to use a dictionary or are there other ways? – compski Apr 03 '17 at 17:02
  • @compski, dictionary is probably a good way to go about it – CallumDA Apr 03 '17 at 17:04
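For reference, BruceWayne's backward-delete suggestion could be sketched as below (Sheet1 and column A are assumptions). As the comments note, one row deletion per duplicate makes this far too slow for 100k+ records, which is why the accepted answer avoids it; note also that looping bottom-up keeps the *last* occurrence of each value rather than the first.

```vba
Sub DeleteDuplicatesSlow()
    Dim dict As New Dictionary 'requires Microsoft Scripting Runtime
    dict.CompareMode = BinaryCompare 'case-sensitive keys

    Dim lastRow As Long, i As Long
    With ThisWorkbook.Worksheets("Sheet1")
        lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row
        For i = lastRow To 2 Step -1 'loop backwards so deletions don't shift unvisited rows
            If dict.Exists(.Cells(i, 1).Value) Then
                .Rows(i).Delete 'one delete per duplicate - this is what makes it slow
            Else
                dict.Add .Cells(i, 1).Value, vbNullString
            End If
        Next i
    End With
End Sub
```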

1 Answer


You can use a Dictionary to check for binary (case-sensitive) uniqueness, and variant arrays to speed things up. To use the Dictionary you will need to add a reference to the Microsoft Scripting Runtime library

(Tools > References > Microsoft Scripting Runtime library)

I've tested this with 100,000 rows which takes on average 0.25 seconds on my laptop.

Sub RemoveDuplicateRows()
    Dim data As Range
    Set data = ThisWorkbook.Worksheets("Sheet1").UsedRange

    'Read the whole range into a variant array to avoid slow cell-by-cell access
    Dim v As Variant, tags As Variant
    v = data.Value
    ReDim tags(1 To UBound(v, 1), 1 To 1)
    tags(1, 1) = 0 'tag the header row, keeping the tag column contiguous for End(xlDown) below

    'BinaryCompare makes the key lookups case-sensitive
    Dim dict As Dictionary
    Set dict = New Dictionary
    dict.CompareMode = BinaryCompare

    'Tag the first occurrence of each value with its row number;
    'duplicates keep an empty tag and will sort to the bottom
    Dim i As Long
    For i = 2 To UBound(v, 1) 'start at 2 to skip the header row
        If Not dict.Exists(v(i, 1)) Then 'comparing the values in the first column
            tags(i, 1) = i
            dict.Add Key:=v(i, 1), Item:=vbNullString
        End If
    Next i

    'Write the tags into the first empty column to the right of the data
    Dim rngTags As Range
    Set rngTags = data.Columns(data.Columns.Count + 1)
    rngTags.Value = tags

    'Sort by tag: unique rows rise to the top in their original order,
    'duplicates (blank tags) fall to the bottom
    Union(data, rngTags).Sort key1:=rngTags, Orientation:=xlTopToBottom, Header:=xlYes

    'Last tagged row = number of rows to keep
    Dim count As Long
    count = rngTags.End(xlDown).Row

    rngTags.EntireColumn.Delete
    'Delete all the duplicate rows below the unique rows in one go
    data.Resize(UBound(v, 1) - count + 1).Offset(count).EntireRow.Delete
End Sub

Based on the brilliant answer from this question

  • Wow, unbelievably fast, any idea how to keep everything in an array? In other words, delete everything which is not in the array? – Lowpar Apr 03 '17 at 19:32
  • @Lowpar, sorry I don't quite understand? – CallumDA Apr 03 '17 at 20:20
  • for Dim dict As Dictionary. I got "User-defined type not defined". Hold on while I find out why I got this error – compski Apr 04 '17 at 02:03
  • In VBA, Tools > Reference > I had to enable Microsoft Scripting Runtime to get your code to work. Thanks your code worked wonderfully! – compski Apr 04 '17 at 02:37
  • @CallumDA the code would need to be modified to first insert the unique instances you would like to keep from the data into a dictionary; then all the lines would be compared to see if they contain an identifier you would like to keep and, if not, the row deleted. I will try to modify the code above to do that. Currently I use the autofilter method, and although it is quick, when the data is large it can be relatively slow. – Lowpar Apr 04 '17 at 06:28
  • @compski completely forgot to mention that you'd need to include the reference sorry! I'll update the answer so that there's no confusion for future visitors. Glad it worked for you :) – CallumDA Apr 04 '17 at 06:51
  • @Lowpar when you say "all the lines would be compared to see if these lines contain the identifier " it sounds like it would be *a lot* slower than this method. Crucially, this method only accesses the worksheet a few times >> 1. to pull in all the data on the sheet 2. to output the keys array as a column to the right of the data 3. to sort by key (meaning all unique rows at the top) 4. to delete all rows below the unique rows (in one go). Either way I'd be interested to see any potentially quicker method so please don't let me stop you if you're onto something great! – CallumDA Apr 04 '17 at 07:09
  • @CallumDA I get you, interestingly, now that we are discussing it, I wonder if a SQL left join on the list of unique identifiers you would like to include in the data would be faster? In essence, at the moment, I would filter the data by the unique identifiers I want in the data, I then copy and paste that information out and rename the sheet. It takes a couple of seconds, I was just wondering if there was a faster way. – Lowpar Apr 04 '17 at 12:35
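Lowpar's "keep only listed identifiers" variation could be sketched by inverting the tagging step of the answer above: instead of tagging first occurrences, tag rows whose column-A value appears in a prebuilt keep-list dictionary, then sort and delete exactly as before. The sheet name and the example identifiers below are assumptions, not part of the original answer.

```vba
Sub KeepOnlyListedIds()
    'Hypothetical keep-list: only rows whose column-A value is a key here survive
    Dim keepDict As New Dictionary 'requires Microsoft Scripting Runtime
    keepDict.CompareMode = BinaryCompare
    keepDict("Case1") = vbNullString 'example identifiers to keep
    keepDict("cASE1") = vbNullString

    Dim data As Range
    Set data = ThisWorkbook.Worksheets("Sheet1").UsedRange

    Dim v As Variant, tags As Variant, i As Long
    v = data.Value
    ReDim tags(1 To UBound(v, 1), 1 To 1)
    tags(1, 1) = 0 'keep the header
    For i = 2 To UBound(v, 1)
        If keepDict.Exists(v(i, 1)) Then tags(i, 1) = i 'tag rows to keep
    Next i

    'Same tag-sort-delete pattern as the answer above
    Dim rngTags As Range
    Set rngTags = data.Columns(data.Columns.Count + 1)
    rngTags.Value = tags
    Union(data, rngTags).Sort key1:=rngTags, Orientation:=xlTopToBottom, Header:=xlYes

    Dim count As Long
    count = rngTags.End(xlDown).Row
    rngTags.EntireColumn.Delete
    data.Resize(UBound(v, 1) - count + 1).Offset(count).EntireRow.Delete
End Sub
```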