0

I have large datasets that require duplicates be removed. The data has a column that contains an ID number - I want to loop through this column for duplicate IDs. If a duplicate is present I want code to delete the duplicate.

The datasets I am using always have the same columns - but the number of rows changes there for I will use:

Do While Cells(b,4).Value <> ""

Then within this loop I want an If loop that finds duplicates and deletes them - how would be best to do this?

Turner
  • 15
  • 5
  • Sounds like you're reinventing the wheel. https://learn.microsoft.com/en-us/office/vba/api/excel.range.removeduplicates – BigBen Aug 17 '20 at 14:41
  • 2
    use remove duplicates, its on the data tab. – Scott Craner Aug 17 '20 at 14:41
  • @BigBen Very true I think this could work - however, as I can't define the range would you suggest I still use a Do While loop? – Turner Aug 17 '20 at 14:57
  • 1
    My guess is that you need to [find the last row](https://stackoverflow.com/questions/11169445/error-in-finding-last-used-cell-in-excel-with-vba). – BigBen Aug 17 '20 at 14:58
  • @ScottCraner Maybe I should have been clearer this is going to be part of a tool - the user will press a button and the code will clean the data (part of that clean is removing duplicates) – Turner Aug 17 '20 at 14:58

1 Answers1

1

As mentioned by Scott Craner, there's a basic Excel feature handling this.

My Excel sheet looked as follows:

Col_D Col_E
    1     1
    1     1
    1     2
    1     2
    1     3
    2     1
    2     2
    2     2
    2     2
    2     3

Recording the "remove duplicates" from the "Data" tab, yields following VBA command:

ActiveSheet.Range("$D$1:$E$11").RemoveDuplicates Columns:=Array(1, 2), Header :=xlYes

Meaning the following:

Range("$D$1:$E$11")  : Remove the duplicates from that range
Columns:=Array(1, 2) : Both column 1 (D) and 2 (E) need to be taken into account
                         (the duplicates of the combination of both columns)
Header :=xlYes       : A header row is present

The result is:

Col_A Col_B
    1     1
    1     2
    1     3
    2     1
    2     2
    2     3
Dominique
  • 16,450
  • 15
  • 56
  • 112
  • Cheers - I will be using the Remove Duplicate function within my code the only issue i'm trying to over come is how to apply this function to a dataset where the final row is not defined in the code. I'm thinking I need to use: i = .Range("E" & .Rows.Count).End(xlUp).Row ActiveSheet.Range("A" & i) – Turner Aug 17 '20 at 15:10