
Is there a straightforward combination of standard higher-order functions to count the unique elements in a list?

For example the result for

[1, 1, 4, 0, 4, 4]

would be something like

[(1,2), (4,3), (0,1)]
BenMorel
LennyStackOverflow

5 Answers


Using Data.Map and tuple sections:

 count = Map.fromListWith (+) . map (, 1)

(Add Map.toList if you need a list.)
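A complete, compilable sketch of this answer (the `TupleSections` extension is needed for the `(, 1)` section; note that `Map.toList` returns the pairs in ascending key order):

```haskell
{-# LANGUAGE TupleSections #-}
import qualified Data.Map as Map

-- Build a map from each element to its number of occurrences:
-- every element becomes (x, 1), and fromListWith merges
-- duplicate keys with (+).
count :: Ord a => [a] -> Map.Map a Int
count = Map.fromListWith (+) . map (, 1)

main :: IO ()
main = print (Map.toList (count [1, 1, 4, 0, 4, 4 :: Int]))
-- prints [(0,1),(1,2),(4,3)]
```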

sdcvvc

If order is not important this works:

map (\xs@(x:_) -> (x, length xs)) . group . sort

group . sort will give you a list of lists where all elements that are equal to each other are grouped into the same sublist (without sort, only consecutive equal elements would be grouped together). The map then turns each sublist into a (element, lengthOfSublist)-tuple.

If you want to order the result by first occurrence, you can use zip before the sort to add an index to each element, then, after grouping, sort again by that index and then remove the index.
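Spelled out, that first-occurrence variant could look like this (the name `countByFirstOccurrence` is mine, not from the answer):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

countByFirstOccurrence :: Ord a => [a] -> [(a, Int)]
countByFirstOccurrence =
      map (\g -> (snd (head g), length g))  -- drop the indices, keep (element, count)
    . sortOn (fst . head)                   -- reorder groups by their first index
    . groupBy ((==) `on` snd)               -- group runs of equal elements
    . sortOn snd                            -- bring equal elements together (stable sort)
    . zip [0 ..]                            -- tag each element with its position

main :: IO ()
main = print (countByFirstOccurrence [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(1,2),(4,3),(0,1)], the order asked for in the question
```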

sepp2k
  • The sort could be very expensive on large lists. It might be better to use KennyTM's or sdcvvc's solutions for better performance. – GeneralBecos May 07 '13 at 17:58
  • @GeneralBecos Why would sorting be slower than creating a map? Both are `O(n log n)`. – sepp2k May 07 '13 at 18:01
  • Because, assuming you are doing a frequency distribution, only in the worst case will the number of distinct elements equal the number of elements in the list. In the more common scenario, the number of elements in the distribution will be far smaller. Therefore, on average, the map will outperform the sort. – GeneralBecos May 07 '13 at 18:07
  • @GeneralBecos Yes, the number of elements in the map will be smaller, but I don't see how that makes the creation of the map cheaper. You're still calling `insert` once for every element in the list and each call to `insert` takes `O(log n)` time - whether the element is already in the map or not. – sepp2k May 07 '13 at 18:13
  • It will take `O(log n)` in the worst case. On average it will take `O(log m)` where `m << n` – GeneralBecos May 07 '13 at 18:17
  • @GeneralBecos Ah, I see now. Yes, that's true. – sepp2k May 07 '13 at 18:20
  • Do I still deserve that negative rep? :-( – GeneralBecos May 07 '13 at 18:33
  • @GeneralBecos What negative rep? – sepp2k May 07 '13 at 19:13
  • I think I got flagged for the comment as unconstructive, offensive or spam. – GeneralBecos May 07 '13 at 19:37

The simplest thing would be to sort the items into order, use "group" to put them into sub-lists of equal elements, and then count the items in each sub-list.

map (\xs -> (head xs, length xs)) . group . sort
Paul Johnson
  • By the way you can write `\xs -> (head xs, length xs)` as `head &&& length`, using Control.Arrow module. – sdcvvc Sep 15 '10 at 14:09
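sdcvvc's `head &&& length` suggestion as a complete sketch (the name `countSorted` is my own):

```haskell
import Control.Arrow ((&&&))
import Data.List (group, sort)

-- (head &&& length) applies both functions to the same sublist,
-- producing the (element, count) pair directly.
countSorted :: Ord a => [a] -> [(a, Int)]
countSorted = map (head &&& length) . group . sort

main :: IO ()
main = print (countSorted [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(0,1),(1,2),(4,3)]
```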

If the list contains only integers, you could also use

 import qualified Data.IntMap as I

 countElems1 :: [Int] -> [(Int, Int)]
 countElems1 = I.toList . foldr (\k -> I.insertWith (+) k 1) I.empty 

(Remember to compile with optimization, though; otherwise this will be about 2× slower than the group . sort method. With -O2 it is about 14% faster.)
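If performance matters, a strict left fold over Data.IntMap.Strict avoids the thunk build-up of the lazy foldr version; this is a sketch of that variation, not code from the answer:

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as I

-- Strict variant: foldl' forces the accumulator at each step, and
-- the Strict API evaluates the updated counts eagerly, so no chain
-- of pending (+1) thunks accumulates in the map.
countElemsStrict :: [Int] -> [(Int, Int)]
countElemsStrict = I.toList . foldl' (\m k -> I.insertWith (+) k 1 m) I.empty
```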

You could also use one of the multiset packages which makes the function as simple as

 import qualified Math.Combinatorics.Multiset as S
 countElems4 = S.toCounts . S.fromList

though it is less efficient.

All of the above solutions ignore the original order.

kennytm

What you're talking about is just run-length encoding on sorted data: the free online book Real World Haskell has a great example of this. You will want to sort the list before you put it through the runLengthEncoder.
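A minimal run-length encoder along those lines (this is my sketch, not the book's code):

```haskell
import Data.List (group, sort)

-- Run-length encode: after sorting, equal elements form consecutive
-- runs, and each run collapses to one (element, runLength) pair.
runLengthEncode :: Ord a => [a] -> [(a, Int)]
runLengthEncode = map (\run -> (head run, length run)) . group . sort

main :: IO ()
main = print (runLengthEncode [1, 1, 4, 0, 4, 4 :: Int])
-- prints [(0,1),(1,2),(4,3)]
```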

Robert Massaioli