I would like to create a function remove_duplicates
that takes a list
of any type (e.g. can be an int list
or a bool list
or a int list list
or a whatever list
) and returns the same list without duplicates, is this possible in Standard ML?
3 Answers
Is a function that takes a list of any type and returns the list without duplicates possible in Standard ML?
No.
To determine if one element is a duplicate of another, their values must be comparable. "Any type", or 'a
in Standard ML, is not comparable for equality. So while you cannot have a val nub : 'a list -> 'a list
that removes duplicates, here are four alternative options:
What @qouify suggests, the built-in equality type
''a
, so anything you can use=
on:val nub : ''a list -> ''a list
What @kopecs suggests, a function that takes an equality operator as parameter:
val nub : ('a * 'a -> bool) -> 'a list -> 'a list
Which is a generalisation of 1., since here,
nub op= : ''a list -> ''a list
. This solution is kind of neat since it lets you remove not only duplicates, but also redundant representatives of arbitrary equivalence classes, e.g.nub (fn (x, y) => (x mod 3) = (y mod 3))
will only preserve integers that are distinct modulo 3. But its complexity is O(n²). (-_- )ノ⌒┻━┻Because it is O(n²),
nub
is considered harmful.As the article also suggests, the alternative is to use ordering rather than equality to reduce the complexity to O(n log n). While in Haskell this means only changing the type class constraint:
nub :: Eq a => [a] -> [a] nubOrd :: Ord a => [a] -> [a]
and adjusting the algorithm, it gets a little more complicated to express this constraint in SML. While we do have
''a
to representEq a => a
(that we can use=
on our input), we don't have a similar special syntax support for elements that can be compared as less/equal/greater, and we also don't have type classes. We do have the following built-in order type:datatype order = LESS | EQUAL | GREATER
so if you like kopecs' solution, a variation with a better running time is:
val nubOrd : ('a * 'a -> order) -> 'a list -> 'a list
since it can use something like a mathematical set of previously seen elements, implemented using some kind of balanced search tree; n inserts each of complexity O(log n) takes a total of O(n log n) steps.
One of SML's winner features is its composable module system. Instead of using parametric polymorphism and feeding the function
nubOrd
with an order comparison function, you can create a module that takes another module as a parameter (a functor).First, let's define a signature for modules that represent ordering of types:
signature ORD = sig type t val compare : t * t -> order end
(Notice that there isn't a
'
in front oft
.)This means that anyone could make a
struct ... end : ORD
by specifying at
and a correspondingcompare
function fort
s. Many built-in types have pre-definedcompare
functions: int hasInt.compare
and real hasReal.compare
.Then, define a tree-based set data structure; I've used a binary search tree, and I've skipped most functions but the ones strictly necessary to perform this feat. Ideally you might extend the interface and use a better tree type, such as a self-balancing tree. (Unfortunately, since you've tagged this Q&A both as SML/NJ and Moscow ML, I wasn't sure which module to use, since they extend the standard library in different ways when it comes to balanced trees.)
functor TreeSet (X : ORD) = struct type t = X.t datatype 'a tree = Leaf | Branch of 'a tree * 'a * 'a tree val empty = Leaf fun member (x, Leaf) = false | member (x, Branch (left, y, right)) = case X.compare (x, y) of EQUAL => true | LESS => member (x, left) | GREATER => member (x, right) fun insert (x, Leaf) = Branch (Leaf, x, Leaf) | insert (x, Branch (left, y, right)) = case X.compare (x, y) of EQUAL => Branch (left, y, right) | LESS => Branch (insert (x, left), y, right) | GREATER => Branch (left, y, insert (x, right)) end
Lastly, the
ListUtils
functor contains thenubOrd
utility function. The functor takes a structureX : ORD
just like theTreeSet
functor does. It creates anXSet
structure by specialising theTreeSet
functor using the same ordering module. It then uses thisXSet
to efficiently keep a record of the elements it has seen before.functor ListUtils (X : ORD) = struct structure XSet = TreeSet(X) fun nubOrd (xs : X.t list) = let val init = ([], XSet.empty) fun go (x, (ys, seen)) = if XSet.member (x, seen) then (ys, seen) else (x::ys, XSet.insert (x, seen)) in rev (#1 (foldl go init xs)) end end
Using this functor to remove duplicates in an
int list
:structure IntListUtils = ListUtils(struct type t = int val compare = Int.compare end) val example = IntListUtils.nubOrd [1,1,2,1,3,1,2,1,3,3,2,1,4,3,2,1,5,4,3,2,1] (* [1, 2, 3, 4, 5] *)
The purpose of all that mess is a
nubOrd
without a direct extra function parameter.Unfortunately, in order for this to extend to
int list list
, you need to create thecompare
function for that type, since unlikeInt.compare
, there isn't a generic one available in the standard library either. (This is where Haskell is a lot more ergonomic.)So you might go and write a generic, lexicographical list compare function: If you know how to compare two elements of type
'a
, you know how to compare two lists of those, no matter what the element type is:fun listCompare _ ([], []) = EQUAL (* empty lists are equal *) | listCompare _ ([], ys) = LESS (* empty is always smaller than non-empty *) | listCompare _ (xs, []) = GREATER (* empty is always smaller than non-empty *) | listCompare compare (x::xs, y::ys) = case compare (x, y) of EQUAL => listCompare compare (xs, ys) | LESS => LESS | GREATER => GREATER
And now,
structure IntListListUtils = ListUtils(struct type t = int list val compare = listCompare Int.compare end) val example2 = IntListListUtils.nubOrd [[1,2,3],[1,2,3,2],[1,2,3]] (* [[1,2,3],[1,2,3,2]] *)
So even though
[1,2,3]
and[1,2,3,2]
contain duplicates, they are notEQUAL
when youcompare
them. But the third element isEQUAL
to the first one, and so it gets removed as a duplicate.
Some last observations:
You may consider that even though each
compare
is only run O(log n) times, a singlecompare
for some complex data structure, such as a(whatever * int) list list
may still be expensive. So another improvement you can make here is to cache the result of everycompare
output, which is actually what Haskell'snubOrdOn
operator does. ┳━┳ ヽ(ಠل͜ಠ)ノThe functor approach is used extensively in Jane Street's OCaml Base library. The quick solution was to pass around an
'a * 'a -> order
function around every single time younub
something. One moral, though, is that while the module system does add verbosity, if you provide enough of this machinery in a standard library, it will become quite convenient.If you think the improvement from O(n²) to O(n log n) is not enough, consider Fritz Henglein's Generic top-down discrimination for sorting and partitioning in linear time (2012) and Edward Kmett's Haskell discrimination package's
nub
for a O(n)nub
.

- 15,635
- 1
- 41
- 66
-
Is it such a big deal that `nub` has a quadratic runtime? Does all our software need to run at optimal speeds? – Thorkil Værge Aug 06 '20 at 06:54
-
@ThorkilVærge: Yes, and no. – sshine Aug 06 '20 at 07:06
-
Nit: SML does not have a higher-order module system, only a first-order one. – Andreas Rossberg Aug 06 '20 at 11:37
-
@AndreasRossberg: Thanks for pointing that out. :) I've used the term "higher-order module" as an analogue to "higher-order functions". I suppose that you don't ever create `functor`s that take arbitrary modules as parameter, but always concrete modules. – sshine Aug 06 '20 at 13:44
-
@SimonShine, yes, and you can't do more than that in plain SML. Functors cannot take functors as arguments. – Andreas Rossberg Aug 06 '20 at 15:15
-
@AndreasRossberg: Considering your work with 1ML, was that ever explored by you or anyone else? – sshine Aug 07 '20 at 11:05
-
@SimonShine, higher-order functors have always been available in SML/NJ, in Moscow ML since v2, and in a few other SMLs. They just aren't standardised, and all these extensions differ. Back in 2007 or so, I created a proposal for Successor ML. FWIW, OCaml has HO functors, too. – Andreas Rossberg Aug 07 '20 at 12:22
Yes. This is possible in SML through use of parametric polymorphism. You want a function of most general type 'a list -> 'a list
where 'a
is a type variable (i.e., variable that ranges over types) that would be read as alpha.
For some more concrete examples of how you might apply this (the explicit type variable after fun
is optional):
fun 'a id (x : 'a) : 'a = x
Here we have the identity function with type 'a -> 'a
.
We can declare similar functions with some degree of specialisation of the types, for instance
fun map _ [] = []
| map f (x::xs) = f x :: map f xs
Where map
has most general type ('a -> 'b) -> 'a list -> 'b list
, i.e, takes two curried arguments, one with some function type and another with some list type (agrees with function's domain) and returns a new list with type given by the codomain of the function.
For your specific problem you'll probably also want to take an equality function in order to determine what is a "duplicate" or you'll probably restrict yourself to "equality types" (types that can be compared with op=
, represented by type variables with two leading apostrophes, e.g., ''a
).

- 1,545
- 10
- 20
Yes sml provides polymorphism to do such things. In many cases you actually don't care for the type of the item in your lists (or other structures). For instance this function checks (already present in the List
structure) for the existence of an item in a list:
fun exists _ [] = false
| exists x (y :: l) = x = y orelse exists x l
Such function works for any type of list as long as the equal operator is defined for this type (such type is called an equality type). You can do the same for remove_duplicates
. In order to work with list of items of non equality types you will have to give remove_duplicates
an additional function that checks if two items are equal.

- 3,698
- 2
- 15
- 26