How to count distinct values in a list in linear time?

Question

I can think of sorting them and then going over each element one by one but this is nlogn. Is there a linear method to count distinct elements in a list?

sampson-chen · Accepted Answer · 2012-12-13T18:09:41.917

11

Update: - distinct vs. unique

If you are looking for "unique" values (As in if you see an element "JASON" more than once, than it is no longer unique and should not be counted)

You can do that in linear time by using a HashMap ;)

(The generalized / language-agnostic idea is Hash table)

Each entry of a HashMap / Hash table is <KEY, VALUE> pair where the keys are unique (but no restrictions on their corresponding value in the pair)

Step 1:

Iterate through all elements in the list once: O(n)

For each element seen in the list, check to see if it's in the HashMap already O(1), amortized
- If not, add it to the HashMap with the value of the element in the list as the KEY, and the number of times you've seen this value so far as the VALUE O(1)
- If so, increment the number of times you've seen this KEY so far O(1)

Step2:

Iterate through the HashMap and count the KEYS with VALUE equal to exactly 1 (thus unique) O(n)

Analysis:

Runtime: O(n), amortized
Space: O(U), where U is the number of distinct values.

If, however, you are looking for "distinct" values (As in if you want to count how many different elements there are), use a HashSet instead of a HashMap / Hash table, and then simply query the size of the HashSet.

edited Dec 13 '12 at 18:09

answered Dec 13 '12 at 17:22

sampson-chen

45,805
12
84
81

This is how to find a number of unique values in the list and the question is about finding a number of distinct values. To count distinct values use [HashSet](http://docs.oracle.com/javase/6/docs/api/java/util/HashSet.html). Just add every element of the list to the HashSet and check its cardinality. – user1871166 Dec 13 '12 at 17:58
1

Wouldn't you want to count ALL the values in the HashMap in step 2, rather than just those with value 1? Doing it this way would only count the elements that appear exactly once, eg. the list {PAT, PAT, STEVE, JOEL} would result in a value of 2 (only STEVE and JOEL occur once) rather than the correct value of 3 distinct names. – Jason Dec 13 '12 at 17:59
Note! There is an interpretation discrepancy here: My assumption is that if you see a particular value 2 or more times, it is no longer "distinct" / "unique" - but I'll edit that in to be more clear. – sampson-chen Dec 13 '12 at 18:06
so since this algorithm is amortized, it does not guarantee O(n) right? It could well have a worst case runtime of O(n2)? – polerto Dec 13 '12 at 19:06
@polerto With any reasonable implementation of HashMap / Hashtable (such as the ones from Java libs) you can assume `O(n)` runtime. – sampson-chen Dec 13 '12 at 19:15
You can't do that in linear time with HashMap, because there may exist collisions. With collisions, then even the lookup in the map may have complexity `O(n)`, if every object has the same hash value. – SpaceTrucker Dec 18 '12 at 11:56

score 1 · Answer 2 · edited May 23 '17 at 11:59

1

You can adapt this extremely cool O(n)-time and O(1)-space in-place algorithm for removing duplicates to the task of counting distinct values -- simply count the number of values equal to the sentinel value in a final O(n) pass, and subtract that from the size of the list.

edited May 23 '17 at 11:59

Community

1
1

answered Dec 13 '12 at 17:46

j_random_hacker

50,331
10
105
169

score 0 · Answer 3 · answered Dec 13 '12 at 17:54

0

Add every element of the list to a HashSet and then check the size (cardinality) of the HashSet, which is the number of distinct values in the list.

answered Dec 13 '12 at 17:54

user1871166

258
2
9

How to count distinct values in a list in linear time?

3 Answers3

Linked

Related