Optimal way to find number of unique numbers in an array

Question

What is the optimal way to find number of unique numbers in an array. One way is to add them to HashSet and then find the size of hashset. Is there any other way better than this.

I just need the number of unique numbers. Their frequency is not required.

Any help is appreciated.

Thanks, Harish

I meant optimal interms of memory & computation time used by the program. In HashSet approach, for each Add operation may involve hashing and other operations. I want to check if there is any where instead of storing the data, can I used any XOR or OR operator combinations or another way to get the count of unique numbers — Harish, Jun 07 '11 at 18:01

score 3 · Answer 1 · answered Jun 07 '11 at 16:05

3

What's the tradeoff in memory for fewer cpu cycles you're willing to accept? Which is more important for your optimal solution? A variant of counting sort is very inefficient in space, but extremely fast.

For larger datasets you'll be wanting to use hashing, which is what hashset already does. Assuming you're willing to take the overhead of it actually storing the data, just go with your idea. It has the added advantage of being simpler to implement in any language with a decent standard library.

answered Jun 07 '11 at 16:05

Sysyphus

1,061
5
10

What does "optimal" mean for this question? – Brian Stinar Jun 07 '11 at 16:09
I'd assume shortest average or worst case runtime since this is under algorithms, but practicality generally comes in at some point. – Sysyphus Jun 07 '11 at 16:19
I meant optimal interms of memory & computation time used by the program. In HashSet approach, for each Add operation may involve hashing and other operations. I want to check if there is any where instead of storing the data, can I used any XOR or OR operator combinations or another way to get the count of unique numbers – Harish Jun 07 '11 at 18:01
Basically you're implementing a hashset, except instead of storing the value, you just store if it's been used or not. It's slightly more optimal in terms of memory (and maybe computation time) but it's less optimal in ease of coding/design. – Sysyphus Jun 08 '11 at 00:29

score 2 · Accepted Answer · answered Jun 07 '11 at 20:30

2

You don't say what is known about the numbers, but if 1) they are integers and 2) you know the range (max and min) and 3) the range isn't too large, then you can allocate an array of ints equal in length to ceiling(range / 32) (assuming 32-bit integers) all initialized to zero. Then go through the data set and set the bit corresponding to each number to 1. At the end, just count the number of 1 bits.

answered Jun 07 '11 at 20:30

Ted Hopp

232,168
48
399
521

Yes they are integers and range is know and not too big. I also came to this approach only. This is what I am currently using. But was just checking if there is so that I need not store / check the count. However I was using array of size range. Instead I could use range/32 which would decrease my array size.... I am having feeling like using some property of binary operators we can achieve the count directly. But till then I will go through this approach – Harish Jun 08 '11 at 07:40
Some languages have built-in tools. Java has BitSet, which is dynamically expandable and has a `cardinality()` function to return the number of bits set. C++ has class bitset (fixed size) with `bitset::count()`. (C# and Matlab, as far as I know, have bitsets but lack functions to return the cardinality.) If you want to roll your own, there are interesting bit counting algorithms discussed [here](http://gurmeet.net/puzzles/fast-bit-counting-routines/) and [here](http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer) – Ted Hopp Jun 10 '11 at 04:36

murgatroid99 · Answer 3 · 2011-06-07T15:49:04.190

1

One simple algorithm is to loop through the list adding numbers to a hash set as you said, but each time check if it is already in the set, and if not add 1 to a running count. Then when you finish looping through the list you will have the number of distinct elements in the final value of the running count. Here is a python example:

count=0
s=set()
for i in list:
    if i not in s:
        s.add(i)
        count+=1

Edit: I use a running count instead of checking the length of a set because in the background the set may be implemented as a sparse array and an extra loop over that array may be needed to check if each hash has a corresponding value. The running count avoids that potential additional overhead.

edited Jun 07 '11 at 15:49

answered Jun 07 '11 at 15:42

murgatroid99

19,007
10
60
95

No need for the running count. Just ask the hash set how big it is at the end. – Ted Hopp Jun 07 '11 at 15:45
@Ted I added an explanation for that in my answer. – murgatroid99 Jun 07 '11 at 15:49
I agree with Magnus. Just `for i in list: s.add(i)`. Since it's a **set**, adding an element already in the set will not increase its size, and is no more expensive than testing whether it's there already. When you're done, just return `s.size()`. – Ted Hopp Jun 07 '11 at 15:57
you mean that `Count` might be an O(n) operation. I have never sen a set with that kind of implementation. – Magnus Jun 07 '11 at 16:10

score 0 · Answer 4 · answered Jun 07 '11 at 16:02

0

I would suggest to sort the array first and look for unique elements after that.

answered Jun 07 '11 at 16:02

Piotr

833
5
12

Optimal way to find number of unique numbers in an array

4 Answers4