Dim MyCollection As VBA.Collection
Set MyCollection = New VBA.Collection

' ... MyCollection is populated with the 8000 e-mail entries here ...

Dim i As Long, j As Long

' Compare each element against every earlier one; remove the later duplicate.
For i = MyCollection.Count To 2 Step -1
    For j = (i - 1) To 1 Step -1
        If MyCollection(i) = MyCollection(j) Then
            MyCollection.Remove i
            Exit For
        End If
    Next j
Next i

There are currently 8000 e-mail entries in MyCollection. I would like to remove the redundant elements from this large list, but the code above is too slow.

How can I accelerate this code?

Barok
  • 8000 thousand? So **8 million**? Your question is hard to understand, but the best way to avoid duplicates is to use a dictionary object instead of a collection when initially storing the data (see the sketch after these comments). What kind of data is stored, and how long does your method take for the 8 million records? – ashleedawg Jun 12 '18 at 07:02
  • Possible duplicate of [VBA: Removing duplicates from a Collection](https://stackoverflow.com/questions/39355335/vba-removing-duplicates-from-a-collection) – ashleedawg Jun 12 '18 at 07:10
  • Not 8 million. It was a typo; I have edited my question. The gist of the question is that the list is large, which creates a lag, hence I would like to accelerate the process. Is there better code for it? – Barok Jun 12 '18 at 08:39
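
A minimal sketch of the dictionary idea from ashleedawg's comment, assuming the entries are plain strings; SourceAddresses is a hypothetical stand-in for wherever the 8000 entries come from:

Dim Emails As Object ' late-bound Scripting.Dictionary, no project reference needed
Set Emails = CreateObject("Scripting.Dictionary")
Emails.CompareMode = vbTextCompare ' assumption: addresses compare case-insensitively

Dim Address As Variant
For Each Address In SourceAddresses ' hypothetical source of the raw entries
    If Not Emails.Exists(CStr(Address)) Then
        Emails.Add CStr(Address), True
    End If
Next Address

' Emails.Keys now holds only the unique entries, built in a single pass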

1 Answer


I have a piece of advice; it's quite simple, and in most cases it will be more effective.

Of course, it comes with some caveats, the main ones being:

  • you must have enough memory to hold two copies of the list
  • the number of deletions must be high enough to be worth a full rebuild

The advice: don't delete element by element.

Instead of deleting, build a new collection that simply skips the unwanted records.

After that, drop the old collection as a whole.
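
A minimal sketch of this rebuild approach, assuming the entries are strings; the function name WithoutDuplicates and the Scripting.Dictionary used to track already-seen keys are illustrative additions, not part of the answer:

Function WithoutDuplicates(ByVal Source As VBA.Collection) As VBA.Collection
    ' One pass: copy each entry into a fresh collection unless it was seen before.
    Dim Seen As Object ' late-bound Scripting.Dictionary, no project reference needed
    Set Seen = CreateObject("Scripting.Dictionary")
    Seen.CompareMode = vbTextCompare ' assumption: e-mail addresses compare case-insensitively

    Dim Result As VBA.Collection
    Set Result = New VBA.Collection

    Dim Item As Variant
    For Each Item In Source
        If Not Seen.Exists(CStr(Item)) Then
            Seen.Add CStr(Item), True
            Result.Add Item
        End If
    Next Item

    Set WithoutDuplicates = Result
End Function

Replacing the old list is then a single assignment, and the expensive Remove calls disappear entirely (indexed access and removal on a VBA.Collection are both linear, which is part of what makes the original nested loop so slow):

Set MyCollection = WithoutDuplicates(MyCollection) ' old collection is dropped as a whole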

Alex Yu
  • Everything depends on the ratios: the bigger your delete count, the more effective the recreation of the list will be. Obviously you will have fewer inserts, and instead of deletes you will do **nothing**. On the other hand, if you delete a laughably small number of rows, then deleting is not the problem at all; the problem is filtering. – Alex Yu Jun 12 '18 at 08:54
  • Very kind of you to provide an idea. I am not sure what "creating a new collection without unneeded records" means, though: my code actually tries to do exactly that! How about this: divide the list into many smaller ones and remove the redundancies from each list. Then merge the lists into one big list and remove the redundancies from the merged list. Would this be faster? – Barok Jun 12 '18 at 08:56
  • And about dividing into smaller and smaller datasets: modern RAM subsystems don't like lots of small operations; it is much better to combine many little operations into thick batches. – Alex Yu Jun 12 '18 at 08:59