2

I have some trouble deleting duplicate rows, since the way I have to do it is a kind of hard. Let me explain.

This is what I have (actually I have more than 90,000 rows!)

+-----------+------------------+
|    Ref    |       Sup        |
+-----------+------------------+
| 10000-001 | S_LA_LLZ_INOR    |
| 10000-001 | S_LA_RADAR_STNFN |
| 10000-001 | S_LA_VOR_LRO     |
| 10000-001 | S_LA_DME_LRO     |
| 10000-001 | S_LA_DME_INOR    |
| 1000-001  | S_LA_GP_INOR     |
| 1000-001  | S_LA_LLZ_ITF     |
| 1000-001  | S_ZS_LLZ_ITF     |
| 1000-002  | S_LA_GP_INOR     |
| 1000-002  | S_LA_LLZ_ITF     |
+-----------+------------------+

What I have to do is search in column A for duplicates. Then I have to check in column B if the chain of characters after S_LA_ or S_ZS_ are the same. If they are the same. I have to delete the row with the S_LA_

So, in the rows above I would have to delete the row with 1000-001|S_LA_LLZ_ITF.

I have written a code. It works, but it's painfully slow when working with 10,000+ rows.

Dim LastRowcheck As Long
Dim str1 As String
Dim str2 As String
Dim str3 As String
Dim str4 As String
Dim str5 As String
Dim str6 As String
Dim prueba As Integer
Dim prueba1 As Integer
Dim n1 As Long
Dim n3 As Long
Dim colNum As Integer
Dim colNum1 As Integer
Dim iCntr As Long

colNum = WorksheetFunction.Match("Ref", ActiveSheet.Range("1:1"), 0)
colNum1 = WorksheetFunction.Match("Sup",ActiveSheet.Range("1:1"), 0)

With ActiveSheet
  LastRowcheck = .Range("A" & .Rows.Count).End(xlUp).Row
  For n1 = 2 To LastRowcheck
      str1 = Cells(n1, colNum).Value
      For n3 = n1 + 1 To LastRowcheck + 1
          str2 = Cells(n3, colNum).Value
          prueba = StrComp(num1, num2)
          If prueba = 0 Then
              str3 = Cells(n1, colNum1).Value
              str4 = Cells(n3, colNum1).Value
              str5 = Right(str3, Len(str3) - 5)
              str6 = Right(str4, Len(str4) - 5)
              prueba1 = StrComp(str5, str6)
                  If prueba1 = 0 Then
                      If StrComp(num3, num4) = 1 Then
                          Cells(n3, colNum).Interior.ColorIndex = 3
                      ElseIf StrComp(num3, num4) = -1 Then
                          Cells(n1, colNum).Interior.ColorIndex = 3
                      End If
                  End If
              End If
          Next n3
      Next n1

  For iCntr = LastRowcheck To 2 Step -1
      If Cells(iCntr, colNum).Interior.ColorIndex = 3 Then
          Rows(iCntr).Delete
      End If
  Next iCntr
End With

I would appreciate any help or guidance you could give me.

kaybee99
  • 4,566
  • 2
  • 32
  • 42
  • How can I use the remove duplicates in the way described above? I can't control which duplicates to delete (or at least I don't know how) – Francisco Rodríguez González Jun 24 '15 at 12:52
  • 1
    @Raystafarian Because he is only comparing the last part of two different strings. OP I would read up on array's, accessing the sheet is a very costly thing in terms of CPU time, an array would cut your time dramatically. e.g. read sheet range into array - process - clear sheet - read array back to sheet – 99moorem Jun 24 '15 at 12:52
  • That sounds promising, but how could I do it? – Francisco Rodríguez González Jun 24 '15 at 12:53
  • @99moorem I missed that. Perhaps text to columns the second column and remove the prefixes then highlight duplicates, move and remove and then recompile – Raystafarian Jun 24 '15 at 12:56
  • A couple of suggestions: definitely use arrays instead of the range, like @99moorem suggested. Secondly, start by sorting the first column, then copy entire range to a variable array and keep looping over the same values in column 1 - when val in col 1 changes you're done with that particular set of duplicates and the next one begins – paul bica Jun 24 '15 at 13:00
  • @paulbica I'm a newbie with VBA. Could you help me with some code? Thank you – Francisco Rodríguez González Jun 24 '15 at 13:13

2 Answers2

0

A non-VBA solution: Insert a new column C Assuming that the data starts in row 1, in C1 enter:

=CONCATENATE(A1,MID(B1,5,LEN(B1)-4))

Copy the formula down column C. Then use the remove duplicates feaure keyed to column C.

John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • Wow! This works, but I don't know if it deletes the duplicate with the S_ZS_ or the one with the S_LA_ – Francisco Rodríguez González Jun 24 '15 at 13:10
  • Sort the data in descending order according to column B. This will put all of the S_ZS_ above all of the S_LA_. Then when you remove the duplicates the first ones (The S_ZS_) will be retained. If you need the original ordering you can re-sort (if the original order is somewhat ad hoc you can create a new column with the numbers 1,2,3, in order and then re-sort according to that column). This assumes that there is at most 1 S_ZS_ that you want to retain (for each duplicate). If there is more than 1 -- you would need to use VBA after all (I think). – John Coleman Jun 24 '15 at 14:11
0

I believe this is nearly there - MAKE SURE TO TAKE A BACKUP OF YOUR DATA BEFORE RUNNING asthis will overwrite the data

Sub test()
Dim IN_arr()
Dim OUT_arr()

IN_arr = ActiveSheet.UsedRange.Value2
Count = 1
ReDim OUT_arr(UBound(IN_arr, 2) - 1, Count)
Found = 1

For i = 1 To UBound(IN_arr, 1)
    Found = 1
    For c = 1 To UBound(IN_arr, 1)
        Comp1 = Right(IN_arr(i, 2), Len(IN_arr(i, 2)) - InStr(1, IN_arr(i, 2), "S_LA") - 3) 'Compare last section
        Comp2 = Right(IN_arr(c, 2), Len(IN_arr(c, 2)) - InStr(1, IN_arr(c, 2), "S_ZS") - 3)

        Comp3 = IN_arr(i, 1) 'Compare first section
        Comp4 = IN_arr(c, 1)

        If Comp1 = Comp2 And i <> c And Comp3 = Comp4 Then
            Found = 0
        End If
    Next
    If Found = 0 Then
        'do not keep row
    Else
        'keep row
        If OUT_arr(UBound(IN_arr, 2) - 1, Count - 1) <> "" Then
            Count = Count + 1
            ReDim Preserve OUT_arr(UBound(IN_arr, 2) - 1, Count)
        End If

        For cols = 0 To UBound(IN_arr, 2) - 1
            OUT_arr(cols, Count - 1) = IN_arr(i, cols + 1)
        Next


    End If
Next

ActiveSheet.UsedRange.ClearContents
ActiveSheet.Range("A1").Resize(Count, UBound(OUT_arr, 1) + 1).Value = Application.Transpose(OUT_arr)

End Sub

Please note made some small changes to code

99moorem
  • 1,955
  • 1
  • 15
  • 27