0

Look to amend a brilliant answer from here posted below. However the answer below compares only the values in the first column for each row to tag and then delete.

However I want to look if the first column is identical and if so check all the other columns are identical and then tag it if the whole row exists.

 tried amending the 
       IF Not .Exists(v(i,1)) Then to 
       IF Not .Exists(v(i,1)) and IF Not .Exists(v(i,2)) Then

did not work also tried

   IF Not .Exists(v(i,1)) Then
    IF Not .Exists(v(i,2)) Then

Sub RemoveDuplicateRows()

Dim data As Range
Set data = ThisWorkbook.Worksheets("Sheet3").UsedRange

Dim v As Variant, tags As Variant
v = data
ReDim tags(1 To UBound(v), 1 To 1)
tags(1, 1) = 0 'keep the header

Dim dict As Dictionary
Set dict = New Dictionary
dict.CompareMode = BinaryCompare

Dim i As Long
For i = LBound(v, 1) To UBound(v, 1)
    With dict
        If Not .Exists(v(i, 1 And 2)) Then 'v(i,1) comparing the values in the first column
              tags(i, 1) = i
            .Add Key:=v(i, 1), Item:=vbNullString
         End If
      End With
Next i

Dim rngTags As Range
Set rngTags = data.Columns(data.Columns.count + 1)
rngTags.Value = tags

Union(data, rngTags).Sort key1:=rngTags, Orientation:=xlTopToBottom, Header:=xlYes

Dim count As Long
count = rngTags.End(xlDown).Row

rngTags.EntireColumn.Delete
data.Resize(UBound(v, 1) - count + 1).Offset(count).EntireRow.Delete

End Sub
BigBen
  • 46,229
  • 7
  • 24
  • 40
  • Fails to run, subscript out of range on If Not .Exists(v(i, 1 And 2)) because 1 And 2 results in zero (0). – donPablo Oct 25 '19 at 15:48

1 Answers1

0

In a first attempt I thought the solution was to use SQL statements to return only the DISTINCT rows.

But, by not supporting COLLATION, an SQL sentence in VBA to approximately simulate case sensitive behavior wouldn't be as efficient as I wish.

Said that, the only alternative in VBA(as far as my knowledge goes) is iterating through the dataset.

Try using the following subprocess and tell me how it goes:

Code:

Sub remove_duplicates(ByVal wk_sheet As Worksheet, ByVal rng As Range)

'   +----------------------------------------------------------+
'   | DESCRIPTION:                                             |
'   |   Removes all duplicate whole rows in a range.           |
'   |   Case sensitive.                                        |
'   |                                                          |
'   | VARIABLES:                                               |
'   |   wk_sheet = Worksheet object where our data is stored.  |
'   |   rng = Range object where our data is stored.           |
'   |   arr = array to store the matrix.                       |
'   |   a = variables to store rows for comparison.            |
'   |   b = variables to store rows for comparison.            |
'   |   dirrng = string to store the references of rows        |
'   |            to delete.                                    |
'   |   rngc1 = string storing first column reference of       |
'   |           range.                                         |
'   |   rngc2 = string storing last column reference of        |
'   |           range.                                         |
'   |                                                          |
'   +----------------------------------------------------------+

    Dim arr As Variant, a As Variant, b As Variant
    Dim dirrng As String, rngc1 As String, rngc2 As String

    With rng
        arr = .Value
        rngc1 = Split(Mid(.Cells(1, 1).Address, 2), "$")(0)
        rngc2 = Split(Mid(.Cells(1, .columns.Count).Address, 2), "$")(0)
    End With

    For i = 1 To UBound(arr)
        a = Join(Application.WorksheetFunction.Index(arr, i, 0), "|")
        For r = 1 To UBound(arr)
            If i <> r And _
            (dirrng = "" Or _
             Not InStr(1, dirrng, _
                       rngc1 & i & ":" & rngc2 & i, vbBinaryCompare) > 0) Then
                b = Join(Application.WorksheetFunction.Index(arr, r, 0), "|")
                If a = b Then
                    If Len(dirrng) > 0 Then
                        dirrng = dirrng & "," & rngc1 & r & ":" & rngc2 & r
                    Else
                        dirrng = rngc1 & r & ":" & rngc2 & r
                    End If
                End If
            End If
        Next r
    Next i

    'Deleting all rows at once is more efficient than deleting one at time
    If Len(dirrng) > 0 Then rng.Range(dirrng).Delete Shift:=xlUp

End Sub

Testing:

I made a test with the following dataset:

enter image description here

IMPORTANT: Notice that we have here a named range "TABLE_CONTENT" which contains our datafields. You must adapt your version of the code if you would like to use the entire table as the interaction range and keep the headers.

Then I used the following to call remove_duplicates subprocess, passing the proper parameters:

Sub test()
    Call remove_duplicates(ActiveSheet, ActiveSheet.Range("TABLE_CONTENT"))
End Sub

Result:

enter image description here

Hope it helps.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Ferd
  • 1,273
  • 14
  • 16
  • hi thanks for this in an SQL query these are actually distinct rows due to the joining fields and lets say cost codes, eg an invoice with multiple cost codes for each line, i actually in this case only want to see one instance of that invoice . – Leighholling Oct 29 '19 at 09:26
  • Haven't you solved your question yet? If so, please include a couple of images in your question: one where you show an example of the initial data and another one where the expected final result is displayed. That way I can clarify my interpretation and would have no problem elaborating a code that solves your situation. – Ferd Oct 31 '19 at 11:45
  • Hi no did not solve and went for an SQL solution instead – Leighholling Feb 21 '20 at 11:05