0

I have to massive Excel sheets (rows 7500 and 16000). I need to see what items that are in list one are NOT in list two... and what items are in list two that are NOT in list one, and then paste those results on a third sheet.

I decided to store both lists in two Collections. So far that works well. When I try to loop through the Collections to find what doesn't match my computer freezes as the file is too big.

How can I change my code so that it is quicker? I feel like there must be a better way to do this instead of looping through every i in list one and every z in list two.

Thanks!

    Sub FullListCompareFSvDF()
Worksheets("FundserveFL").Activate
'Open New Collection and define every variable
Dim FSTrades As New Collection
Dim c As Long
Dim i As Long
Dim z As Long
Dim searchFor As String

'enter the items into the list. There are blank rows and so the first IF Statement is to ignore these.
' The Else Statement shows an account number as the item and an account number & balance (FS.Offset(0,6).Value) as the key

Dim FS As Range
 For Each FS In Sheet1.Range("L:L")
    If FS = "" Then
    Else: FSTrades.Add CStr(FS.Value & " " & FS.Offset(0, 6).Value)
    End If
 Next

Worksheets("DatafileFL").Activate
Dim DFTrades As New Collection

'enter the items into the list. There are blank rows as well as random numbers  and so the first IF Statement is to ignore these (all account numbers are greater than 10000
'"Matching" is displayed for all errors - during an error read the account number from two columns over.
' The Else Statement shows an account number as the item and an account number & balance (FS.Offset(0,6).Value) as the key

Dim DF As Range
    For Each DF In Sheet2.Range("H:H")
    If DF = "" Or Not IsNumeric(DF.Offset(0, 2)) Or DF < 10000 Then
    ElseIf DF.Offset(0, -4) = "MATCHING" Then
    DFTrades.Add CStr(DF.Offset(0, 2).Value & " " & DF.Value)
    Else:
    DFTrades.Add CStr(DF.Value & " " & DF.Offset(0, -2).Value)
    End If
Next

'loop through the first collection. Find the first item and try to match it with the items in the second collection.
'Collection 1 Item 1... is it in Collection 2 Item 1? No - then is it in Collection 2 Item 2? When a match is found, move on to Collection 1 Item 2... If no match is found send the item to "ForInvestigation" worksheet

For i = 1 To FSTrades.Count
searchFor = FSTrades(i)
z = 0
    Do
        z = z + 1
        If z > DFTrades.Count Then
        c = c + 1
        Worksheets("ForInvestigation").Activate
        Cells(c, 1).Value = DFTrades(i)
        Exit Do
            Else:
                If DFTrades(z) = searchFor Then
                    Exit Do
                End If
        End If
    Loop
Next

'Clear Collections
Set FSTrades = Nothing
Set DFTrades = Nothing

End Sub
Dirk Reichel
  • 7,989
  • 1
  • 15
  • 31
Jonh
  • 111
  • 13
  • 3
    For one thing, why are you looping over the entire `H:H` range? Won't that go through all million+ rows of column H, most of which are blank? Same with `L:L`. – Marc Feb 25 '16 at 20:11
  • Hi Marc - Yes I am looping over the entire range. I'm not sure how to avoid this. Do you have any ideas? Keep in mind that the list is not constant. It has intermittent blank rows and every time the macro is used the size of the list is different. – Jonh Feb 25 '16 at 21:01
  • 1
    http://stackoverflow.com/questions/11169445/error-in-finding-last-used-cell-in-vba – David Zemens Feb 25 '16 at 21:02
  • Thanks David - I'm reading through the post now. I will implement along side Ron's answer below if possible. – Jonh Feb 25 '16 at 21:14
  • The `End(xlUp)` method that Ron used will likely work as long as there is no other data below these values. – David Zemens Feb 25 '16 at 21:16
  • 1
    @DavidZemens Good point. There could be data lower than the last entry in column `L` in some other column. – Ron Rosenfeld Feb 26 '16 at 18:17

4 Answers4

3
  • Don't Activate
  • Read all the relevant cells into a variant array in one step. eg:

Dim V As Variant
With Worksheets("FundserveFL")
    V = .Range("L1", .Cells(.Rows.Count, "L").End(xlUp)).Resize(columnsize:=6)
End With

  • Create a key for your collection that can be used to see if there is a duplicate.

On Error Resume Next
 For i = 1 To UBound(V, 1)
    If V(i, 1) <> "" Then
        FSTrades.Add Item:=CStr(V(i, 1) & " " & V(i, 6)), Key:=CStr(V(i, 1) & " " & V(i, 6))
    End If
 Next i
 On Error Resume Next

If you similarly handle the data on your second worksheet, creating an array, adding it to the same collection after creating a key which will "error" if you try to add a duplicate, you will wind up with a collection that contains no duplicates. Populate an array with that collection, and write it to your third worksheet.

I would guess that using the above technique will increase your speed by at least a factor of ten, if not more.

EDIT:

If you want to do something other than a unique list, it is merely a matter of understanding the logic. For example, if, as in your comment, you have two arrays 1,2,3,4 and 1,3,4,5, you could do something like the following. Understand, of course, that one assumption is that there are no duplicates within either array: (If there are, that can be handled also, would just require a different logic)


Sub foo()
    Dim V1, V2
    Dim COL As Collection
    Dim I As Long

V1 = Array(1, 2, 3, 4)
V2 = Array(1, 3, 4, 5)

Set COL = New Collection
For I = 0 To UBound(V1)
    COL.Add V1(I), CStr(V1(I))
Next I

On Error Resume Next
For I = 0 To UBound(V2)
    COL.Add V2(I), CStr(V2(I))
    Select Case Err.Number
        Case 457  'This is a duplicate, so will remove
            Err.Clear
            COL.Remove CStr(V2(I))
        Case Is <> 0
            MsgBox "Error No. " & Err.Number & vbTab & Err.Description
    End Select
Next I

Stop

End Sub

When the routine stops, if you examine COL you will see it only contains 2 and 5

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
  • Hi Ron - Thanks for this, it's fantastic! One thing: I'm not looking for a collection that contains no duplicates. I'm looking for a collection that exists in 1 list but that does not exist in the second, and vice versa. It seems that your answer is making a single "master" list. If I have list 1 , 2 , 3 , 4 and list 1, 3, 4, 5. I need a third list that shows 2 and 5. Perhaps there is an "On Error Delete List Item"? – Jonh Feb 25 '16 at 21:49
  • 1
    @Jonh You just manipulate the collection addition routine and errors so as to remove the item if there is a duplicate. See the edit to my post – Ron Rosenfeld Feb 25 '16 at 22:56
  • @Jonh If my response has helped you enough, please mark it as the answer. Thank you. – Ron Rosenfeld Feb 26 '16 at 17:31
  • It was a great answer - it has taught me a lot about the power of arrays. Thank you. – Jonh Feb 26 '16 at 17:41
  • Hi Ron, Sorry after testing I see that it doesn't work. Consider in your above examples the following arrays - V1 = Array(1, 2, 3, 4, 5) V2 = Array(1, 3, 4, 5, 6). It should delete 5 but it doesn't. The reason for this is that your method only works if the duplication in list 2 is at the same I interval. The duplicate in the list may be 100 iterations down from it's match. Is there a good way to test for this? – Jonh Feb 26 '16 at 18:08
  • @Jonh I cannot reproduce the error you claim you have found. When I replace the arrays with `V1 = Array(1, 2, 3, 4, 5)` & `V2 = Array(1, 3, 4, 5, 6)` I find that the collection contains 2 and 6, as I would expect. Are you looking at the `Locals` window? If so, perhaps you are mistaking the Value for `I` as being in the Collection? There is no reason why the values in V2 need to be any particular order relative to V1. – Ron Rosenfeld Feb 26 '16 at 18:22
  • Ah you're right about this. My fault. The final list I am getting when I test with my data doesn't seem to working properly. I'll test it further and try to figure it out. – Jonh Feb 26 '16 at 18:24
  • @Jonh There is one error in the code for the manually created arrays. That loop should start at zero (0) and not at one (1). I will correct that. That is only in the second example, looping through the arrays. `Array` creates a zero-based array in the default situation. But setting a variant array equal to a range will produce a 1-based 2D array. – Ron Rosenfeld Feb 26 '16 at 18:26
  • Ok I see that now also. I think my problem is in defining the array. In the first example you used the .Range function to build the Collection but this doesn't seem to be working in your second example for Arrays. – Jonh Feb 26 '16 at 18:32
  • @Jonh One other point. When creating the arrays, is there data below the last row in the column being tested, perhaps in a different column. For example, if you want to collect data in A:L, and the lowest entry in column A is in Row 100, but the lowest entry in column L is Row 200, you would have to use a different method than what I showed to ensure getting all the data. – Ron Rosenfeld Feb 26 '16 at 18:32
  • @Jonh There are no ranges in the second example. – Ron Rosenfeld Feb 26 '16 at 18:34
  • There is no other data that of importance. There is written information in some bottom cells but I don't need them (random symbols taken from a PDF). It is counting the rows properly in the first example. – Jonh Feb 26 '16 at 18:34
  • Exactly - I'm not sure how to put them in. So your example properly tests Array(1,2,3,4,5). But I need it to test Array(.Range("L1", .Cells(.Rows.Count, "L").End(xlUp)).Resize(columnsize:=6)). Which then doesn't seem to work. I end up wrapping the Array in a With Statement, but still it fails. – Jonh Feb 26 '16 at 18:36
  • @Jonh It won't. The second example was merely to demonstrate how to use collections to create the type of list you wanted. You should either modify the second example to handle a range-originated variant array, or modify the first to create the type of list you want. – Ron Rosenfeld Feb 26 '16 at 18:37
2

I have a similarly sized list of stuff, and I frequently need to create a unique list of values. I'm not sure why you want to work with two collections at once though. It is much simpler to load the data from one sheet into the collection, then loop through the other sheet to see if it already exists in the collection. Here's some of my code to help you write yours.

Dim colUniqueSNs As New Collection
On Error Resume Next
    For r = 2 To Sheets("Inventory").UsedRange.Rows.Count
        strSN = Sheets("Inventory").Cells(r, 6).Text
        strHost = Sheets("Inventory").Cells(r, 2).Text
        If Not InCollection(colUniqueSNs, strSN) Then colUniqueSNs.Add strHost, strSN
    Next
On Error GoTo 0

Public Function InCollection(col As Collection, key As Variant) As Boolean
    Dim obj As Variant
    On Error GoTo err
    InCollection = True
    obj = col(key)
    Exit Function
err:
    InCollection = False
End Function
Tim
  • 2,701
  • 3
  • 26
  • 47
  • Hi Tim - Thanks for this. I'm currently testing out your answer, Ron's answer and Dirk's answer. In your answer, how can you get the collection to be pasted onto a worksheet? I ask because using Ron's answer I'm able to get the collection I need but now I don't know how to copy it onto a worksheet. – Jonh Feb 26 '16 at 15:57
  • 1
    Use a `For` loop. E.G. `For x=1 to colUniqueSNs.Count` (next line) `Sheet1.Cells(x+1,1).Value = colUniqueSNs.Item(x)` (next line) `Next` – Tim Feb 26 '16 at 16:51
  • Ah ok that seems simple enough. I didn't realize I could send the item into the cell like that. – Jonh Feb 26 '16 at 17:46
  • 1
    `Item(x)` has many properties, but value is its default property. _Technicaly_ I should have written `Item(x).Value` but meh. Ron's answer is good too. We're basically doing the same thing. Be advised that if you run into memory issues or if it is still very slow you may want forego setting that range variant. Variants are slow data types and memory hogs to boot, but they are very useful when you don't know (or care to know) the data types you're working with. – Tim Feb 26 '16 at 18:45
1

You are starting with ranges and you are ending with them. How about skipping the Collections at all?

Pls try this:

Sub FullListCompareFSvDF()

  Dim Ran1Val As Variant, Ran1ValOffset As Variant, Ran2Val As Variant
  Ran1Val = Intersect(Sheet1.Columns(12), Sheet1.UsedRange).Value
  Ran2Val = Intersect(Sheet1.Columns(18), Sheet1.UsedRange).Value

  Dim i As Long, j As Long
  For i = 1 To UBound(ranval1)
    If Len(Ran1Val(i, 1)) Then Ran1Val(i, 1) = Ran1Val(i, 1) & " " & Ran2Val(i, 1)
  Next

  Ran2Val = Intersect(Sheet2.Range("D:J"), Sheet2.UsedRange).Value
  Dim OutputVal() As Variant
  ReDim OutputVal(1 To UBound(Ran1Val) + UBound(Ran2Val), 1 To 1)

  For i = 1 To UBound(Ran2Val)
    If Ran2Val(i, 5) <> "" And IsNumeric(Ran2Val(i, 7)) And Ran2Val(i, 5) > 10000 Then
      If Ran2Val(i, 1) = "MATCHING" Then
        Ran2Val(i, 1) = CStr(Ran2Val(i, 7) & " " & Ran2Val(i, 5))
      Else
        Ran2Val(i, 1) = CStr(Ran2Val(i, 5) & " " & Ran2Val(i, 3))
      End If

      If IsNumeric(Application.Match(Ran2Val(i, 1), Ran1Val, 0)) Then
        j = j + 1
        OutputVal(j, 1) = Ran2Val(i, 1)
      End If

    Else
      Ran2Val(i, 1) = ""
    End If
  Next

  ReDim Preserve Ran2Val(1 To UBound(Ran2Val), 1 To 1)

  Dim runNer As Variant
  For Each runNer In Ran1Val
    If Len(runNer) Then
      If IsNumeric(Application.Match(runNer, Ran2Val, 0)) Then
        j = j + 1
        OutputVal(j, 1) = runNer
      End If
    End If
  Next

  If j > 0 Then
    Worksheets("ForInvestigation").Range("A1:A" & j).Value = OutputVal
  End If

End Sub

I simply gets the Range.Value inside an array. Deleting all unused values and having one dimension as (1 To 1) allowes us to use Application.Match which is one of the fastest functions in excel.

when building up the second array, we already can check for the first one and push uniques directly to the output-array.
resizing the second array (with preserve) allowes us to use this with Match too.

Finally checking the entrys of the first array against the second one and push them also inside our output-array.

now we can directly copy the values to your destination (in one step)

Note:
- You may delete the "output-range" first (a smaller list later on will not overwrite oler values.)
- I'm not able to run real checks (you may need to report errors via comment I missed out)
- this code does not check for doubles inside one list (having 1 item 2 times in list 1 but not in list 2, will print it 2 times at the end / if you need this check, then just write a comment)

Dirk Reichel
  • 7,989
  • 1
  • 15
  • 31
  • Hi Dirk - This is a really cool way of running the code. I've never used the Application.Match function before. With that said, to be honest, it still takes a long time to run. Also it spits out a list without any duplicates but I'm looking for the items that exists in 1 list but that do not exist in the second, and vice versa. It seems that your answer is making a single "master" list. If I have list 1 , 2 , 3 , 4 and list 1, 3, 4, 5. I need a third list that shows 2 and 5. I'll keep tinkering with it. I'm also looking at Ron's which seems to work exceptionally fast. – Jonh Feb 26 '16 at 16:22
  • I'm confused... it should do exactly that. `OutputVal` should only hold these items which are only at one of the lists... Also it was a try to skip the collections. Would need some sample data to optimize it, but having a pretty good solution from Ron there is no real need for that (to my eye)... and also I'm to lazy for that right now :P – Dirk Reichel Feb 27 '16 at 12:47
0

Thanks for all of your help! Here is my answer. It is mostly coming from Ron's answer - I have of course added some tweaks to it.

Sub MatchFSTradesDFTrades2()

Dim V1 As Variant
Dim V2 As Variant
Dim COL As New Collection
Dim I As Long

Worksheets("DatafileFL").Activate

With Worksheets("FundserveFL")
    V1 = .Range("L1", .Cells(.Rows.Count, "L").End(xlUp)).Resize(columnsize:=7)
End With

With Worksheets("DatafileFL")
    V2 = .Range("F1", .Cells(.Rows.Count, "D").End(xlUp)).Resize(columnsize:=12)
End With

For I = 1 To UBound(V1)
    If V1(I, 1) = " " Or Not IsNumeric(V1(I, 1)) Or V1(I, 1) < 10000 Or V1(I, 1) = "***" Or Not IsNumeric(V1(I, 3)) Or (V1(I, 5)) = "Buy-EC" Or (V1(I, 5)) = "Sell-EC" Then
    Else:
        COL.Add (V1(I, 1)) & " " & (V1(I, 7)), CStr(V1(I, 1)) & " " & (V1(I, 7))
    End If
Next I
For I = 1 To COL.Count
    Sheet3.Cells(I + 1, 1).Value = COL.Item(I)
Next
On Error Resume Next
For I = 1 To UBound(V2)
    If V2(I, 1) = "MATCHING" Then
        If IsNumeric(V2(I, 5)) Then
            COL.Add (V2(I, 7)) & " " & V2(I, 5), CStr(V2(I, 7)) & " " & V2(I, 5)
                Select Case Err.Number
                    Case 457  'This is a duplicate, so will remove

                        Err.Clear
                        COL.Remove CStr(V2(I, 7)) & " " & V2(I, 5)
                End Select
        Else: V2(I, 12) = Right(V2(I, 5), Len(V2(I, 5)) - 1)
              V2(I, 12) = Format(V2(I, 12), "General Number")
            COL.Add (V2(I, 7)) & " " & V2(I, 12), CStr(V2(I, 7)) & " " & V2(I, 12)
                Select Case Err.Number
                    Case 457  'This is a duplicate, so will remove
                        Err.Clear
                        COL.Remove CStr(V2(I, 7)) & " " & V2(I, 12)
                End Select
        End If
    ElseIf V2(I, 5) = " " Or Not IsNumeric(V2(I, 5)) Or V2(I, 5) < 10000 Or V2(I, 5) = "***" Or V2(I, 1) = "BULK" Then
    Else:
        If IsNumeric(V2(I, 3)) Then
            COL.Add (V2(I, 5)) & " " & V2(I, 3), CStr(V2(I, 5)) & " " & V2(I, 3)
                Select Case Err.Number
                    Case 457  'This is a duplicate, so will remove
                        Err.Clear
                        COL.Remove CStr(V2(I, 5)) & " " & V2(I, 3)
                End Select
        Else: V2(I, 12) = Right(V2(I, 3), Len(V2(I, 3)) - 1)
              V2(I, 12) = Format(V2(I, 12), "General Number")
            COL.Add (V2(I, 5)) & " " & V2(I, 12), CStr(V2(I, 5)) & " " & V2(I, 12)
                Select Case Err.Number
                    Case 457  'This is a duplicate, so will remove
                        Err.Clear
                        COL.Remove CStr(V2(I, 5)) & " " & V2(I, 12)
                End Select
        End If
    End If
Next

Worksheets("ForInvestigation").Activate
Cells.Clear

For I = 1 To COL.Count
    Sheet3.Cells(I + 1, 1).Value = COL.Item(I)
Next

Range("A:A").Select
Selection.TextToColumns DataType:=xlDelimited, Space:=True, Other:=True
Range("A1") = "Trade ID Number"
Range("A1").Font.Bold = True
Range("B1") = "Net Balanace On Trade"
Range("B1").Font.Bold = True
End Sub
Jonh
  • 111
  • 13