40

Is there a faster way to do this?

Set data = ws.UsedRange

Set unique = CreateObject("Scripting.Dictionary")

On Error Resume Next
For x = 1 To data.Rows.Count
    unique.Add data(x, some_column_number).Value, 1
Next x
On Error GoTo 0

At this point unique.keys gets what I need, but the loop itself seems to be very slow for files that have tens of thousands of records (whereas this wouldn't be a problem at all in a language like Python or C++ especially).

AJJ
  • 2,004
  • 4
  • 28
  • 42

5 Answers5

42

Use Excel's AdvancedFilter function to do this.

Using Excels inbuilt C++ is the fastest way with smaller datasets, using the dictionary is faster for larger datasets. For example:

Copy values in Column A and insert the unique values in column B:

Range("A1:A6").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("B1"), Unique:=True

It works with multiple columns too:

Range("A1:B4").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1:E1"), Unique:=True

Be careful with multiple columns as it doesn't always work as expected. In those cases I resort to removing duplicates which works by choosing a selection of columns to base uniqueness. Ref: MSDN - Find and remove duplicates

enter image description here

Here I remove duplicate columns based on the third column:

Range("A1:C4").RemoveDuplicates Columns:=3, Header:=xlNo

Here I remove duplicate columns based on the second and third column:

Range("A1:C4").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo
Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
  • 1
    2 issues. #1 - this is pasting data into sheet, as opposed to saving it in a VBA variable. #2 - It is looking at formulas instead of actual values (for me, instead of pasting unique values in the column, it only pasted one common formula into one cell). – ZygD Jun 19 '18 at 17:03
  • @ZygD 1. The Range is a variable, you use it in VBA. 2. You're doing it wrong, use Paste As Value instead of doing it on formula's. – Jeremy Thompson Jun 19 '18 at 22:41
  • 1
    1. Even though technically speaking Range is a variable... but the thing is, you cannot use `AdvancedFilter` method to put data to only a "VBA-visible" variable like array or dictionary (i.e. without "physical" presence in the worksheet). 2. I cannot find how to do it, does this `AdvancedFilter` method really provide the option to paste as values? – ZygD Jun 20 '18 at 08:37
  • Issue #3 - this method removes the original filter in the sheet, if it existed. – ZygD Jun 20 '18 at 09:06
  • #1 you can do the filter and put the results in an Array or Collection, those types don't have inbuilt methods to filter unique, #2 you paste values in the sheet AND then apply the filter #3 that's by design. ps I'm not here to discuss using Excel, maybe ask on SuperUser. – Jeremy Thompson Jun 21 '18 at 07:39
  • 3
    `AdvancedFilter` is not the fastest way. On large data sets, using a dictionary will outperform and by far `AdvancedFilter` (~500ms vs ~60sec for 100k cells). – Florent B. Jul 03 '19 at 15:31
  • i have 20k entries, is this appropriate for the filter? – Timo May 08 '21 at 19:47
  • I want to get unique values of a column and present it to the user in a dropdown. How would I start, `CopyToRange` as a cache for the values and then show them in the dropdown? – Timo May 09 '21 at 12:52
32

Loading the values in an array would be much faster:

Dim data(), dict As Object, r As Long
Set dict = CreateObject("Scripting.Dictionary")

data = ActiveSheet.UsedRange.Columns(1).Value

For r = 1 To UBound(data)
    dict(data(r, some_column_number)) = Empty
Next

data = WorksheetFunction.Transpose(dict.keys())

You should also consider early binding for the Scripting.Dictionary:

Dim dict As New Scripting.Dictionary  ' requires `Microsoft Scripting Runtime` '

Note that using a dictionary is way faster than Range.AdvancedFilter on large data sets.

As a bonus, here's a procedure similare to Range.RemoveDuplicates to remove duplicates from a 2D array:

Public Sub RemoveDuplicates(data, ParamArray columns())
    Dim ret(), indexes(), ids(), r As Long, c As Long
    Dim dict As New Scripting.Dictionary  ' requires `Microsoft Scripting Runtime` '

    If VarType(data) And vbArray Then Else Err.Raise 5, , "Argument data is not an array"

    ReDim ids(LBound(columns) To UBound(columns))

    For r = LBound(data) To UBound(data)         ' each row '
        For c = LBound(columns) To UBound(columns)   ' each column '
            ids(c) = data(r, columns(c))                ' build id for the row
        Next
        dict(Join$(ids, ChrW(-1))) = r  ' associate the row index to the id '
    Next

    indexes = dict.Items()
    ReDim ret(LBound(data) To LBound(data) + dict.Count - 1, LBound(data, 2) To UBound(data, 2))

    For c = LBound(ret, 2) To UBound(ret, 2)  ' each column '
        For r = LBound(ret) To UBound(ret)      ' each row / unique id '
            ret(r, c) = data(indexes(r - 1), c)   ' copy the value at index '
        Next
    Next

    data = ret
End Sub
Florent B.
  • 41,537
  • 7
  • 86
  • 101
  • 1
    You need to add the reference "Microsoft Scripting Runtime" – Florent B. Mar 16 '16 at 19:08
  • I have it added already. It can't seem to find ".Dictionary" of "Scripting" – AJJ Mar 16 '16 at 19:09
  • Doesn't seem to matter though, it runs in a blink of an eye even with late binding. Why is that code so much faster than what I have? – AJJ Mar 16 '16 at 19:10
  • Reading cell by cell is slow with excel. It's faster to load the data in an array and write them back if necessary. – Florent B. Mar 16 '16 at 19:14
  • The only problem you'll encounter with the dictionary approach is that the cell's value will be taken into the dictionary. That's fine for text; any numbers will lose currency or thousand separators and dates will convert to a value (the date/time value right now is 42445.92) – NeepNeepNeep Mar 16 '16 at 22:04
  • @NeepNeepNeep this is another reason why the orthodox method I show is best practice. – Jeremy Thompson Mar 16 '16 at 23:31
  • How can reach values that saved in `unique` variable? – mgae2m Oct 05 '17 at 10:42
  • 1
    @MGae2M, use `.Keys()` on the dictionary to get the unique values in an array. – Florent B. Oct 05 '17 at 10:45
  • Any suggestion how I can convert to the unique values in a string separated by comma. I am new to array and not quite able to figure it out how to achieve this. – Chito Apr 03 '18 at 21:08
  • @Chito, ask/post a question. – Florent B. Apr 03 '18 at 21:12
  • @FlorentB. I have the below code, not getting any error while running but how I can get the unique values in a string separated each value by comma? `Sub test() Dim data() Dim unique As Variant Dim r As Long data = ThisWorkbook.Sheets("LT ID-IP-PR").UsedRange.Value Set unique = CreateObject("Scripting.Dictionary") For r = 1 To UBound(data) unique(data(r, 1)) = Empty Next r End Sub` – Chito Apr 03 '18 at 21:33
  • @FlorentB. Thanks for code! Can you share how you would call RemoveDuplicates() from another sub? This would be a big help to me. If I was calling this sub, would DATA be an empty array, or the entire range of data? Are the columns empty ie, 1, 2 or are they columns with data in them? Thanks! – AmericanCities Oct 30 '19 at 13:57
  • @AmericanCities, the sub removes the duplicated rows in the provided 2D array. columns is the indexes of the columns that contain the duplicate information. It needs to be provided and can't be empty. For example: `data = Range("A1:B10").Value`, `RemoveDuplicates data, 1, 2`. – Florent B. Oct 30 '19 at 14:51
  • @FlorentB. Thanks a ton! Confirms I was calling it correctly. My issue was that I was using Union(rng1,rng2).value for data and not contiguous blocks. When I re-ordered my columns together and created the range, your code worked and performed beautifully! This will speed along my program, so many thanks! – AmericanCities Oct 30 '19 at 16:29
  • @FlorentB. thank you so much for putting this together, it really helped me out! upvote+ – Leo Gurdian Sep 23 '20 at 03:45
  • I think some small updates may be needed to the first block of code at the start of this post to make it work, and make it more robust while I'm at it: ``` lang-vb Dim data(), dict As Object, r As Long Set dict = CreateObject("Scripting.Dictionary") data = ActiveSheet.UsedRange.Columns(1).Value For r = 1 To UBound(data) dict(data(r, some_column_number)) = Empty Next data = WorksheetFunction.Transpose(dict.keys()) ``` – aamailhot Jan 26 '21 at 18:58
  • @FlorentB: Sorry about my previous comment here; please ignore (still learning to format, and didn't see a way to 'preview' my comment). Anyway--I was saying that I think some small updates may be needed to the first block of code at the start of this post to make it work, and make it more robust while I'm at it: **1)** (required?): `ActiveSheet.UsedRange.Columns(**some_column_number**).Value`. **2)** (required?): `dict(data(r, **1**)) = Empty`. **3)** `(optional?) For r = **LBound(data)** To UBound(data)`. (Note '**'s are only for emphasis on what changed; not literal) – aamailhot Jan 26 '21 at 19:10
12

PowerShell is a very powerful and efficient tool. This is cheating a little, but shelling PowerShell via VBA opens up lots of options

The bulk of the code below is simply to save the current sheet as a csv file. The output is another csv file with just the unique values

Sub AnotherWay()
Dim strPath As String
Dim strPath2 As String

Application.DisplayAlerts = False
strPath = "C:\Temp\test.csv"
strPath2 = "C:\Temp\testout.csv"
ActiveWorkbook.SaveAs strPath, xlCSV
x = Shell("powershell.exe $csv = import-csv -Path """ & strPath & """ -Header A | Select-Object -Unique A | Export-Csv """ & strPath2 & """ -NoTypeInformation", 0)
Application.DisplayAlerts = True

End Sub
brettdj
  • 54,857
  • 16
  • 114
  • 177
10

it's funny because i've had to read these instructions over and over again, but it think i worked out a much faster way to do this:

Set data = ws.UsedRange
dim unique as variant
unique = WorksheetFunction.Unique(data)

And then you can do whatever you want with the unique array such as iterating it:

For i = LBound(unique) To UBound(unique)
    Range("Q" & i) = indexes(i, 1)
Next
Maxim Karpyn
  • 109
  • 1
  • 2
1

Try this

Option Explicit

Sub UniqueValues()
Dim ws As Worksheet
Dim uniqueRng As Range
Dim myCol As Long

myCol = 5 '<== set it as per your needs
Set ws = ThisWorkbook.Worksheets("unique") '<== set it as per your needs

Set uniqueRng = GetUniqueValues(ws, myCol)

End Sub


Function GetUniqueValues(ws As Worksheet, col As Long) As Range
Dim firstRow As Long

With ws
    .Columns(col).RemoveDuplicates Columns:=Array(1), header:=xlNo

    firstRow = 1
    If IsEmpty(.Cells(1, col)) Then firstRow = .Cells(1, col).End(xlDown).row

    Set GetUniqueValues = Range(.Cells(firstRow, col), .Cells(.Rows.Count, col).End(xlUp))
End With

End Function

it should be quite fast and without the drawback NeepNeepNeep told about

user3598756
  • 28,893
  • 4
  • 18
  • 28