
Is there a straightforward combination of standard higher-order functions to count the unique elements in a list?

For example the result for

[1, 1, 4, 0, 4, 4]

would be something like

[(1,2), (4,3), (0,1)]
BenMorel
LennyStackOverflow

5 Answers


Using Data.Map and tuple sections:

 count = Map.fromListWith (+) . map (, 1)

(Add Map.toList if you need a list.)
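A complete, compilable sketch of this answer (the `TupleSections` extension is needed for the `(, 1)` section; note that `Map.toList` returns the pairs in ascending key order):

```haskell
{-# LANGUAGE TupleSections #-}
import qualified Data.Map as Map

-- Build a map from each element to its number of occurrences:
-- every element becomes (x, 1), and fromListWith merges
-- duplicate keys with (+).
count :: Ord a => [a] -> Map.Map a Int
count = Map.fromListWith (+) . map (, 1)

main :: IO ()
main = print (Map.toList (count [1, 1, 4, 0, 4, 4 :: Int]))
-- prints [(0,1),(1,2),(4,3)]
```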

sdcvvc

If order is not important this works:

map (\xs@(x:_) -> (x, length xs)) . group . sort

group . sort will give you a list of lists where all elements that are equal to each other are grouped into the same sublist (without sort, only consecutive equal elements would be grouped together). The map then turns each sublist into a (element, lengthOfSublist)-tuple.

If you want to order the result by first occurrence, you can use zip before the sort to add an index to each element, then, after grouping, sort again by that index and then remove the index.
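Spelled out, that first-occurrence variant could look like this (the name `countByFirstOccurrence` is mine, not from the answer):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

countByFirstOccurrence :: Ord a => [a] -> [(a, Int)]
countByFirstOccurrence =
      map (\g -> (snd (head g), length g))  -- drop the indices, keep (element, count)
    . sortOn (fst . head)                   -- reorder groups by their first index
    . groupBy ((==) `on` snd)               -- group runs of equal elements
    . sortOn snd                            -- bring equal elements together (stable sort)
    . zip [0 ..]                            -- tag each element with its position

main :: IO ()
main = print (countByFirstOccurrence [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(1,2),(4,3),(0,1)], the order asked for in the question
```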

sepp2k
  • The sort could be very expensive on large lists. It might be better to use KennyTM's or sdcvvc's solutions for better performance. – GeneralBecos May 07 '13 at 17:58
  • @GeneralBecos Why would sorting be slower than creating a map? Both are `O(n log n)`. – sepp2k May 07 '13 at 18:01
  • Because, assuming you are doing a frequency distribution, only in the worst case will the number of distinct elements equal the number of elements in the list. In the more common scenario, the number of elements in the distribution will be far smaller. Therefore, on average, the map will outperform the sort. – GeneralBecos May 07 '13 at 18:07
  • @GeneralBecos Yes, the number of elements in the map will be smaller, but I don't see how that makes the creation of the map cheaper. You're still calling `insert` once for every element in the list and each call to `insert` takes `O(log n)` time - whether the element is already in the map or not. – sepp2k May 07 '13 at 18:13
  • It will take `O(log n)` in the worst case. On average it will take `O(log m)` where `m << n` – GeneralBecos May 07 '13 at 18:17
  • @GeneralBecos Ah, I see now. Yes, that's true. – sepp2k May 07 '13 at 18:20
  • Do I still deserve that negative rep? :-( – GeneralBecos May 07 '13 at 18:33
  • @GeneralBecos What negative rep? – sepp2k May 07 '13 at 19:13
  • I think I got flagged for the comment as unconstructive, offensive or spam. – GeneralBecos May 07 '13 at 19:37

The simplest thing would be to sort the items into order, use "group" to put them into sub-lists of equal elements, and then count the items in each sub-list.

map (\xs -> (head xs, length xs)) . group . sort
Paul Johnson
  • By the way you can write `\xs -> (head xs, length xs)` as `head &&& length`, using Control.Arrow module. – sdcvvc Sep 15 '10 at 14:09
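sdcvvc's `head &&& length` suggestion as a complete sketch (the name `countSorted` is my own):

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- (head &&& length) applies both functions to the same sublist,
-- producing the (element, count) pair directly.
countSorted :: Ord a => [a] -> [(a, Int)]
countSorted = map (head &&& length) . group . sort

main :: IO ()
main = print (countSorted [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(0,1),(1,2),(4,3)]
```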

If the list contains only integers, you could also use

 import qualified Data.IntMap as I

 countElems1 :: [Int] -> [(Int, Int)]
 countElems1 = I.toList . foldr (\k -> I.insertWith (+) k 1) I.empty 

(Remember to compile with optimization, though; otherwise this will be about 2× slower than the group . sort method. With -O2 it is about 14% faster.)
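If performance matters, a strict left fold over Data.IntMap.Strict avoids the thunk build-up of the lazy foldr version; this is a sketch of that variation, not code from the answer:

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as I

-- Strict variant: foldl' forces the accumulator at each step, and
-- the Strict API evaluates the updated counts eagerly, so no chain
-- of pending (+1) thunks accumulates in the map.
countElemsStrict :: [Int] -> [(Int, Int)]
countElemsStrict = I.toList . foldl' (\m k -> I.insertWith (+) k 1 m) I.empty
```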

You could also use one of the multiset packages which makes the function as simple as

 import qualified Math.Combinatorics.Multiset as S
 countElems4 = S.toCounts . S.fromList

though it is less efficient.

All of the above solutions ignore the original order.

kennytm

What you're talking about is just run-length encoding on sorted data: the free online book Real World Haskell has a great example of this. You will want to sort the list before you put it through the runLengthEncoder.
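A minimal run-length encoder along those lines (this is my sketch, not the book's code):

```haskell
import Data.List (group, sort)

-- Run-length encode: after sorting, equal elements form consecutive
-- runs, and each run collapses to one (element, runLength) pair.
runLengthEncode :: Ord a => [a] -> [(a, Int)]
runLengthEncode = map (\run -> (head run, length run)) . group . sort

main :: IO ()
main = print (runLengthEncode [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(0,1),(1,2),(4,3)]
```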

Robert Massaioli