36

I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs)   | x `elem` xs   = rmdups xs
                | otherwise     = x : rmdups xs

However I'd like to rework this without using elem. What would be the best method for this?

I'd like to do this using my own function and not nub or nubBy.

Chris Stryczynski
BradStevenson
    Link to [Data.List (nub)](http://hackage.haskell.org/package/base-4.6.0.1/docs/Data-List.html#v:nub) for when I'm googling it again... – mb21 Oct 17 '15 at 08:58

13 Answers

66

Both your code and nub have O(N^2) complexity.

You can improve the complexity to O(N log N) and avoid using elem by sorting, grouping, and taking only the first element of each group.

Conceptually,

rmdups :: (Ord a) => [a] -> [a]
rmdups = map head . group . sort

Suppose you start with the list [1, 2, 1, 3, 2, 4]. Sorting it gives [1, 1, 2, 2, 3, 4]; grouping that gives [[1, 1], [2, 2], [3], [4]]; finally, taking the head of each group gives [1, 2, 3, 4].

The full implementation of the above just involves expanding each function.

Note that this requires the stronger Ord constraint on the elements of the list, and also changes their order in the returned list.
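Expanded, a self-contained version might look like the following sketch, reusing Data.List's O(N log N) sort and writing the grouping step by hand (group' is a hypothetical helper name):

```haskell
import Data.List (sort)

rmdups :: Ord a => [a] -> [a]
rmdups = map head . group' . sort
  where
    -- group' collects runs of equal adjacent elements,
    -- e.g. [1,1,2,2,3] -> [[1,1],[2,2],[3]]
    group' []     = []
    group' (x:xs) = (x : run) : group' rest
      where (run, rest) = span (== x) xs
```

Here map head is safe because group' only ever produces non-empty runs.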

scvalex
    Very nice, but note that this places an `Ord` restriction on the list elements, rather than just `Eq`, and the order is not preserved. – Benjamin Hodgson Apr 19 '13 at 16:33
  • Good point. Made a note of that and the other change in semantics. – scvalex Apr 19 '13 at 16:35
    [`ordNub`](https://github.com/nh2/haskell-ordnub) provides a copy-paste-ready, *stable* (order-preserving) variant of @scvalex's suggestion to use `Ord`. It also contains analogues for `\\`, `union` and `intersect`. – nh2 Nov 24 '14 at 14:52
44

Even easier.

import Data.Set 
mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList

Convert the set to a list of elements in O(n) time:

toList :: Set a -> [a]

Create a set from a list of elements in O(n log n) time:

fromList :: Ord a => [a] -> Set a

In Python it would be no different:

def mkUniq(x):
    return list(set(x))
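As a quick sanity check of the Haskell version (note the result comes back in ascending order, because the set is ordered):

```haskell
import Data.Set (fromList, toList)

mkUniq :: Ord a => [a] -> [a]
mkUniq = toList . fromList

-- mkUniq [3,1,2,3,1] yields [1,2,3]: duplicates gone, but input order not preserved
```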
The Internet
27

Like @scvalex's solution, the following has O(n log n) complexity and an Ord dependency. Unlike that one, it preserves order, keeping the first occurrence of each item.

import qualified Data.Set as Set

rmdups :: Ord a => [a] -> [a]
rmdups = rmdups' Set.empty where
  rmdups' _ [] = []
  rmdups' a (b : c) = if Set.member b a
    then rmdups' a c
    else b : rmdups' (Set.insert b a) c
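Repeating the definition as a self-contained check of the order-preserving behaviour (it keeps the first occurrence of each character):

```haskell
import qualified Data.Set as Set

-- same idea as above: thread a Set of seen elements through the recursion
rmdups :: Ord a => [a] -> [a]
rmdups = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | Set.member x seen = go seen xs
      | otherwise         = x : go (Set.insert x seen) xs

-- rmdups "maximum-minimum" == "maxiu-n"
```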

Benchmark results

(benchmark chart not reproduced here)

As you can see, the benchmark results show this solution to be the fastest. You can find the source of this benchmark here.

Nikita Volkov
  • +1. `Set` is definitely a more efficient data structure for large inputs. I'd like to see where the standard library's `nub` fits into this graph - does laziness have an effect on the performance? – Benjamin Hodgson Apr 24 '13 at 13:57
  • Heh. I was going to write up and submit a version using `Set` insertion, but you beat me to it. Good show. – Inaimathi Apr 24 '13 at 13:58
22

I don't think you'll be able to do it without elem (or your own re-implementation of it).

However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.

*Main> rmdups "abacd"
"bacd"

The solution is to thread the 'seen' elements through as a state variable.

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
    where rdHelper seen [] = seen
          rdHelper seen (x:xs)
              | x `elem` seen = rdHelper seen xs
              | otherwise = rdHelper (seen ++ [x]) xs

This is more-or-less how nub is implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicates above is strict (it consumes the entire list before returning).

Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates can be implemented in one line with foldl:

removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []
Benjamin Hodgson
@BradStevenson These solutions are based on very inefficient operations - both the `elem` function and `(++)` have an `O(n)` complexity on list. Although Haskell's laziness protects the algorithm from executing `(++)` on every cycle, these implementations still fall short quite starkly compared to alternative implementations presented in other answers. See [__benchmark results__](http://stackoverflow.com/a/16111081/485115). – Nikita Volkov Apr 24 '13 at 11:06
    I think `removeDuplicates3 = foldr (\x seen -> if x \`elem\` seen then seen else x : seen) []` runs faster than removeDuplicates2, as `(:)` operation is constant. – Andrei-Niculae Petre Apr 13 '14 at 12:20
4

Graham Hutton has a rmdups function on p. 86 of Programming in Haskell. It preserves order. It is as follows.

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : filter (/= x) (rmdups xs)

rmdups "maximum-minimum"

"maxiu-n"

This was bothering me until I saw Hutton's function; then I tried again. There are two versions: the one below keeps the first occurrence of each element, and a small change makes it keep the last instead.

rmdups ls = [d|(z,d)<- zip [0..] ls, notElem d $ take z ls]
rmdups "maximum-minimum"

"maxiu-n"

If you instead want to keep the last occurrence of each element, as the question's implementation does, just change take to drop in the function and change the enumeration zip [0..] to zip [1..].
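Both variants side by side, as a self-contained sketch (the names rmdupsFirst and rmdupsLast are mine):

```haskell
-- keep the first occurrence: d survives if it did not appear earlier
rmdupsFirst :: Eq a => [a] -> [a]
rmdupsFirst ls = [ d | (z, d) <- zip [0..] ls, notElem d (take z ls) ]

-- keep the last occurrence: d survives if it does not appear later
rmdupsLast :: Eq a => [a] -> [a]
rmdupsLast ls = [ d | (z, d) <- zip [1..] ls, notElem d (drop z ls) ]

-- rmdupsFirst "maximum-minimum" == "maxiu-n"
-- rmdupsLast  "maximum-minimum" == "ax-nium"
```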

fp_mora
3

It is too late to answer this question, but I want to share my solution, which is original, avoids elem, and doesn't assume Ord.

rmdups' :: (Eq a) => [a] -> [a]
rmdups' [] = []
rmdups' [x] = [x]
rmdups' (x:xs) = x : [ k | k <- rmdups' xs, k /= x ]

This solution removes duplicates at the end of the input (keeping the first occurrence of each element), while the question's implementation deletes them at the beginning (keeping the last). For example,

rmdups "maximum-minimum"
-- "ax-nium"

rmdups' "maximum-minimum"
-- ""maxiu-n"

Also, this code's complexity is O(N*K), where N is the length of the string and K is the number of unique characters in it. Since N >= K, it is O(N^2) in the worst case, but the worst case means there are no repetitions at all, which is unlikely when you are trying to delete duplicates.

3

I would like to add to @fp_mora's answer that on page 136 of Programming in Haskell there is another, slightly different implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x : xs) = x : rmdups (filter (/= x) xs)

It was easier for me to wrap my head around this one.

Mahmoud
2

Using recursion-schemes:

import Data.Functor.Foldable

dedup :: (Eq a) => [a] -> [a]
dedup = para pseudoalgebra
    where pseudoalgebra Nil                 = []
          pseudoalgebra (Cons x (past, xs)) = if x `elem` past then xs else x:xs

While this is certainly more advanced, I think it is quite elegant and shows off some worthwhile functional programming paradigms.

1

You can use this compress function too. Note that it only removes adjacent duplicates, so sort the list first if you want all duplicates gone.

cmprs :: Eq a => [a] -> [a]
cmprs [] = []
cmprs [a] = [a]
cmprs (a:as)
    | a == head as = cmprs as
    | otherwise    = a : cmprs as
mrkanet
0

Using dropWhile also works, but remember to sort the list before using this

rmdups :: (Eq a) => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : (rmdups $ dropWhile (\y -> y == x) xs)
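For example, with Data.List's sort to bring the duplicates together first (note this needs Ord rather than just Eq once you sort):

```haskell
import Data.List (sort)

-- after sorting, equal elements are adjacent, so dropWhile removes each run
rmdups :: Ord a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) = x : rmdups (dropWhile (== x) xs)

-- rmdups (sort "mississippi") == "imps"
```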
0
remdups xs = foldr (\y ys -> y:filter (/= y) ys) [] xs

This applies the function to the first element and to the rest of the list, which is constructed recursively in the same way: each step prepends the current element and then filters it out of the recursively deduplicated remainder.

So every iteration adds an element (call it x) to the list and filters the rest of the list, removing all elements equal to x.
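A self-contained check of this fold (like Hutton's version, it keeps the first occurrence of each element):

```haskell
-- each step prepends y and filters y out of the already-deduplicated tail
remdups :: Eq a => [a] -> [a]
remdups = foldr (\y ys -> y : filter (/= y) ys) []

-- remdups "maximum-minimum" == "maxiu-n"
```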

-1

...or by using the function union from Data.List applied to itself:

import Data.List

unique x = union x x
Mifeet
Fabio
  • "Duplicates, and elements of the first list, are removed from the the second list, but if the first list contains duplicates, so will the result.". Cfr [documentation](https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-List.html#v:union) – Ludovic Kuty Jan 03 '19 at 09:00
  • unique x = union [] x would probably be better idea. – Blomex Feb 26 '20 at 20:45
    It is slow as hell – outoftime Jun 23 '21 at 04:13
-1
remove_duplicates :: Eq a => [a] -> [a]
remove_duplicates [] = []
remove_duplicates [x] = [x]
remove_duplicates (x:xs)
  | x == head xs = remove_duplicates xs
  | otherwise    = x : remove_duplicates xs

You could try doing this. I've merely replaced 'elem' with a comparison of adjacent elements, so the list needs to be sorted first. It works for me.

Sebastián Palma