
I am new to Haskell and I am trying the below code to remove duplicates from a list. However it does not seem to work.

compress []     = []
compress (x:xs) = x : (compress $ dropWhile (== x) xs)

I have tried some search, all the suggestions use foldr/ map.head. Is there any implementation with basic constructs?

Gaara
  • Hint: you have exactly the correct idea, except `dropWhile` is the wrong function. `dropWhile` removes every element of the list *until* it hits one that `== x`; you need something that removes every element like that instead. – Tikhon Jelvis Jun 30 '15 at 23:49
  • There's a good discussion on this subject [here](http://stackoverflow.com/questions/16108714/haskell-removing-duplicates-from-a-list). – Chris R. Timmons Jul 01 '15 at 00:07
  • Note that there is a built-in named `nub` in the `Data.List` module which does that. – Sibi Jul 01 '15 at 00:39
  • @TikhonJelvis Are you sure? GHCi gives `dropWhile (==1) [1,1,1,2,3,1,4,5,6] ==> [2,3,1,4,5,6]`. – chi Jul 01 '15 at 08:18
  • @TikhonJelvis chi is right. You meant "until it hits one that `/= x`". – jub0bs Jul 01 '15 at 15:31
  • Oh yeah, totally got that backwards, sorry! Either way, `dropWhile` is not the right function. – Tikhon Jelvis Jul 01 '15 at 17:16

3 Answers


I think the issue you are referring to is that your current implementation only gets rid of adjacent duplicates. As was pointed out in a comment, the built-in function nub eliminates every duplicate, even non-adjacent ones, and keeps only the first occurrence of each element. But since you asked how to implement such a function with basic constructs, how about this?

myNub :: (Eq a) => [a] -> [a]
myNub (x:xs) = x : myNub (filter (/= x) xs)
myNub [] = []

The only new function introduced here is filter, which keeps the elements of a list that satisfy a predicate (in this case, the elements of the rest of the list that differ from the current element x).
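A quick sanity check of the definition (the definition is repeated so the snippet stands alone, and the main wrapper is mine):

```haskell
-- myNub keeps the first occurrence of each element and drops all
-- later duplicates, adjacent or not.
myNub :: (Eq a) => [a] -> [a]
myNub (x:xs) = x : myNub (filter (/= x) xs)
myNub []     = []

main :: IO ()
main = print (myNub [1,1,2,2,3,3,1])  -- prints [1,2,3]
```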

I hope this helps.

fgv

First of all, never simply state "does not work" in a question. That leaves the reader to guess whether it is a compile-time error, a run-time error, or a wrong result.

In this case, I am guessing it's a wrong result, like this:

> compress [1,1,2,2,3,3,1]
[1,2,3,1]

The problem with your code is that it only removes successive duplicates. The first pair of 1s gets compressed, but the final lone 1 is not adjacent to them, so it survives.

If you can, sort the list in advance. That brings equal elements next to each other, and then compress does the right job. The output will be in a different order, of course. There are ways to keep the original order too if needed (start with zip [0..] xs, then sort, then ...).
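A sketch of both variants described above; the names sortNub and stableNub, and the details of the index-tagging trick, are my own filling-in, not code from the answer:

```haskell
import Data.List (sort, sortOn)

-- The question's compress: removes adjacent duplicates only.
compress :: Eq a => [a] -> [a]
compress []     = []
compress (x:xs) = x : compress (dropWhile (== x) xs)

-- Sorting first makes equal elements adjacent, so compress then
-- removes all duplicates, at the cost of reordering the output.
sortNub :: Ord a => [a] -> [a]
sortNub = compress . sort

-- One way to keep the original order: tag each element with its
-- index, sort by value, drop duplicate values (now adjacent), then
-- restore the original order by sorting on the index again.
stableNub :: Ord a => [a] -> [a]
stableNub = map snd . sortOn fst . dedupe . sortOn snd . zip [0 :: Int ..]
  where
    dedupe []       = []
    dedupe (p : ps) = p : dedupe (dropWhile ((== snd p) . snd) ps)

main :: IO ()
main = do
  print (sortNub   [3,1,2,1,3])  -- prints [1,2,3]
  print (stableNub [3,1,2,1,3])  -- prints [3,1,2]
```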

If you cannot sort because there is really no practical way to define a comparison, but only an equality, then use nub. Be aware that this is much less efficient than sorting and compressing. This loss of performance is intrinsic: without a comparator, you can only use an inefficient quadratic algorithm.
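For reference, a common middle ground (my sketch, not part of the answer) is an order-preserving nub for types that do have an Ord instance: track the elements seen so far in a Data.Set (from the containers package, which ships with GHC), giving O(n log n) overall while keeping nub's ordering behaviour:

```haskell
import qualified Data.Set as Set

-- Like nub, but O(n log n): keeps the first occurrence of each
-- element in the original order, remembering seen elements in a Set.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _    []     = []
    go seen (x:xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs

main :: IO ()
main = print (ordNub [3,1,2,1,3])  -- prints [3,1,2]
```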

chi
  • The OP's question was about implementing nub, not finding it in the library. The discussion of the problems is interesting but not addressing that or the OP's confusion about folds and map. – Alain O'Dea Jul 01 '15 at 11:49

foldr and map are very basic FP constructs. However, they are very general, and for a long time I found them a bit mind-bending. Tony Morris' talk Explain List Folds to Yourself helped me a lot.

In your case, an existing list function like filter :: (a -> Bool) -> [a] -> [a] with a predicate that excludes your duplicates could be used in lieu of dropWhile.
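To make that concrete, here is one way the foldr and filter ideas combine into a nub (a sketch of mine, not code from the thread):

```haskell
-- At each step, keep x and filter x out of the already-deduplicated
-- tail, so only the first occurrence of each element survives.
nubFoldr :: Eq a => [a] -> [a]
nubFoldr = foldr (\x acc -> x : filter (/= x) acc) []

main :: IO ()
main = print (nubFoldr [1,1,2,2,3,3,1])  -- prints [1,2,3]
```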

Alain O'Dea
  • The **base** library included with most Haskell implementations (certainly GHC) includes efficient Set data structures that de-duplicate as you add to them. Combining **Data.Set.fromList** and **Data.Set.toList** would let you do this, but I don't think library use is your intent here. – Alain O'Dea Jul 01 '15 at 00:02
  • 2
    However, converting the list into a set will order the elements and require an Ord constraint. Nub only requires an Eq constraint and preserves the ordering. Of course, the set option is more efficient at O(n log n) vs O(n^2). The reordering and/or the additional constraint may not be desirable in this case. – fgv Jul 01 '15 at 01:19
  • @fgv what do you think about my answer? I kept the Set suggestion as a comment because it really didn't seem to match the OP's intent. – Alain O'Dea Jul 01 '15 at 11:34
  • I think your answer steers the OP in the right direction. The idea of using a set or map is good too from a performance standpoint, only that it will reorder elements and add that additional constraint, which the OP may not want... – fgv Jul 01 '15 at 15:09
  • I completely agree. I was apprehensive about posting the comment suggesting **Data.Set.fromList** based on library use, but your identification of output ordering and the **Ord** constraint on input are even more important considerations since they change the type of the OP's function. – Alain O'Dea Jul 01 '15 at 15:16