1

I'm working with a rather large dataset (>100,000 rows) and trying to compare two lists to figure out which items in the new list are not already in the master list. In other words I want to find the new unique items.

I have some VBA code that uses vlookup and arrays that works, but bombs out when the arrays get too big (~70,000). So I've turned to collections. However I'm having difficulty searching the collections using vlookup or match.

Sub find_uniqueIDs()

Dim a As Long
Dim n As Long
Dim m As Variant
Dim oldnum As Long
Dim oldIDs As Variant
Dim oldcoll As New Collection
Dim newnum As Long
Dim newIDs As Variant
Dim newcoll As New Collection

oldnum = 75000
oldIDs = Range("A1", Range("A" & oldnum))
newnum = 45000 + 3
newIDs = Range("G3", Range("G" & newnum))

'Using arrays to search, but bombs out when oldnum or newnum are ~70000
For n = 1 To newnum - 3
    m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False)
    If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1)
Next n

'Using collections to search
For n = 1 To oldnum
On Error Resume Next
    oldcoll.Add oldIDs(n, 1)
On Error GoTo 0
Next n

For m = 1 To newnum
On Error Resume Next
    newcoll.Add newIDs(m, 1)
On Error GoTo 0
Next m

'This bit of code doesn't work
For a = 1 To newcoll.Count
If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _
    Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a)
Next a

End Sub

Any ideas how I can determine whether a particular item is in the master list using collections?

pnuts
  • 58,317
  • 11
  • 87
  • 139
lexkel
  • 29
  • 5
  • 1
    Have you considered a Scripting.Dictionary object with its own primary unique index on the **key**? Also, if all you want is to prove existance, why not `Application.Match` instead of VLOOKUP? –  Nov 11 '15 at 13:07
  • A scripting disctionnary is indeed probably the best solution. For formulae, ´COUNTIF()´ is a zillion times faster than ´VLOOKUP()´ – iDevlop Nov 11 '15 at 13:14
  • Use keys when you construct the collection: http://stackoverflow.com/a/21095201/4604845 – Vegard Nov 11 '15 at 13:33
  • 1
    @iDevlop - The [MATCH function](https://support.office.com/en-us/article/match-function-0600e189-9f3c-4e4f-98c1-943a0eb427ca) is even faster (see [Is there a faster CountIF](http://stackoverflow.com/questions/29972016/is-there-a-faster-countif/29983885?s=1|7.5670#29983885)) –  Nov 11 '15 at 13:51
  • You could try to add the A to the B collection, if there is no error its A is not present in B, if there is an error A is present in B. However, i use dictionaries rather than collections. – Nathan_Sav Nov 11 '15 at 14:23
  • 1
    And what about querying your Excel sheets using SQL ? http://stackoverflow.com/a/8766541/78522 – iDevlop Nov 11 '15 at 14:57
  • To expand on some of the above, use a `Key` when you create the collection; trying to add a non-unique item will return a `457` error which you can test for. – Ron Rosenfeld Nov 11 '15 at 15:04

4 Answers4

1

Here is a short sub demonstrating some of the scripting dictionary methods.

Sub list_New_Unique()
    Dim dMASTER As Object, dNEW As Object, k As Variant
    Dim v As Long, vVALs() As Variant, vNEWs() As Variant

    Debug.Print "Start: " & Timer

    Set dMASTER = CreateObject("Scripting.Dictionary")
    Set dNEW = CreateObject("Scripting.Dictionary")
    dMASTER.comparemode = vbTextCompare
    dNEW.comparemode = vbTextCompare

    With Worksheets("Sheet7")
        vVALs = .Range("A2:A100000").Value2
        vNEWs = .Range("C2:C100000").Value2
    End With

    'populate the dMASTER values
    For v = LBound(vVALs, 1) To UBound(vVALs, 1)
        dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1)
    Next v

    'only populate dNEW with items not found in dMASTER
    For v = LBound(vNEWs, 1) To UBound(vNEWs, 1)
        If Not dMASTER.exists(vNEWs(v, 1)) Then
            If Not dNEW.exists(vNEWs(v, 1)) Then _
                dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1)
        End If
    Next v

    Debug.Print dNEW.Count

    For Each k In dNEW.keys
        'Debug.Print k
    Next k

    Debug.Print "End: " & Timer

    dNEW.RemoveAll: Set dNEW = Nothing
    dMASTER.RemoveAll: Set dMASTER = Nothing
End Sub

With 99,999 unique entries in A2:A100000 and 89747 random entries in C2:C89747, this found 70,087 unique new entries not found in A2:A100000 in 9.87 seconds.

0

VLookup is a worksheet function, not a regular VBA function, thus it's for searching in Ranges, not Collections.

Syntax: VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])

[...]

table_array (required): the range of cells in which the VLOOKUP will search for the lookup_value and the return value.

In order to search in other VBA data structures like arrays, collections etc you'll have to figure out some other way and maybe implement it manually.

Community
  • 1
  • 1
André Chalella
  • 13,788
  • 10
  • 54
  • 62
0

I would do it like this:

Sub test()

Dim newRow As Long, oldRow As Long
Dim x As Long, Dim y As Long
Dim checker As Boolean

With ActiveSheet

newRow = .Cells(.Rows.Count,7).End(xlUp).Row
oldRow = .Cells(.Rows.Count,1).End(xlUp).Row
checker = True

for y = 1 To oldRow

    for x = 1 To newRow

    If .Cells(y,1).Value = .Cells(x,7).Value Then

    checker = False

    Exit For

    End If

    Next

If checker Then

Range("E10000").End(xlUp).Offset(1,0).Value = .Cells(y,1).Value

End If

checker = True

Next

End With

End Sub
MGP
  • 2,480
  • 1
  • 18
  • 31
0

While @Jeeped suggestion of a Scripting.Dictionary object might be the best one, you could also try using the Filter() function applied to your array.

Community
  • 1
  • 1
iDevlop
  • 24,841
  • 11
  • 90
  • 149