A plea for purity
When programming in Prolog, a major attraction is the generality we enjoy from pure relations.
This lets us use our code in multiple directions, and reason declaratively over our programs and answers.
You can enjoy these benefits if you keep your programs pure.
Possible solution
As always when describing lists, also consider using DCG notation. See the dcg tag for more information.
For example, to describe the list of duplicates in a pure way, consider:
list_duplicates([]) --> [].
list_duplicates([L|Ls]) -->
    list_duplicates_(Ls, L),
    list_duplicates(Ls).

list_duplicates_([], _) --> [].
list_duplicates_([L0|Ls], L) -->
    if_(L0=L, [L], []),
    list_duplicates_(Ls, L).
This uses if_//3 to retain generality and determinism (if applicable).
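The code above does not show a definition of if_//3. The following is a minimal sketch of one possible definition, assuming SWI-Prolog (where dif/2 is built in) and assuming that library(reif) is not already loaded. The reified equality =/3 follows the commonly used definition from library(reif); it is an assumption that this matches what the program above relies on.

```prolog
% Reified equality: T is true if X and Y are equal,
% and false if they are different (expressed via dif/2).
=(X, Y, T) :-
    (   X == Y -> T = true
    ;   X \= Y -> T = false
    ;   T = true, X = Y
    ;   T = false, dif(X, Y)
    ).

% if_//3: DCG counterpart of if_/3.
% Describes Then if the reified condition is true, and Else if it is false.
if_(Cond_1, Then, Else) -->
    { call(Cond_1, T) },
    (   { T == true } -> phrase(Then)
    ;   { T == false }, phrase(Else)
    ).
```

If the condition can be decided, exactly one branch is taken and no choice point remains. Otherwise, both branches are explored on backtracking, with the second one constrained by dif/2.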
Examples
Here are a few example queries and answers. We start with simple ground cases:
?- phrase(list_duplicates([a,b,c]), Ds).
Ds = [].
?- phrase(list_duplicates([a,b,a]), Ds).
Ds = [a].
Even the most impure version will be able to handle these situations correctly. So, slightly more interesting:
?- phrase(list_duplicates([a,b,X]), Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).
Pretty nice, isn't it? The last part says: Ds = [] is a solution if X is different from b and a. Note that the pure relation dif/2 automatically appears in these residual goals and retains the relation's generality.
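To see what dif/2 contributes on its own, here are a few small queries. This is only an illustration, assuming SWI-Prolog, where dif/2 is available without loading a library; expected answers are given as comments.

```prolog
% The constraint is remembered and checked as soon as X is instantiated.
?- dif(X, a), X = a.
% false.

?- dif(X, a), X = b.
% X = b.

% The order of goals does not change the meaning.
?- X = a, dif(X, a).
% false.
```

Because the constraint holds regardless of when X becomes instantiated, answers that contain dif/2 goals remain valid under further instantiation.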
Here is an example with two variables:
?- phrase(list_duplicates([X,Y]), Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).
Finally, consider using iterative deepening to fairly enumerate answers for lists of arbitrary length:
?- length(Ls, _), phrase(list_duplicates(Ls), Ds).
Ls = Ds, Ds = [] ;
Ls = [_136],
Ds = [] ;
Ls = [_136, _136],
Ds = [_136] ;
Ls = [_156, _162],
Ds = [],
dif(_162, _156) ;
Ls = Ds, Ds = [_42, _42, _42] ;
Ls = [_174, _174, _186],
Ds = [_174],
dif(_186, _174) .
Multiple occurrences
Here is a version that handles arbitrarily many occurrences of the same element in such a way that exactly a single occurrence is retained if (and only if) the element occurs at least twice:
list_duplicates(Ls, Ds) :-
    phrase(list_duplicates(Ls, []), Ds).

list_duplicates([], _) --> [].
list_duplicates([L|Ls], Ds0) -->
    list_duplicates_(Ls, L, Ds0, Ds),
    list_duplicates(Ls, Ds).

list_duplicates_([], _, Ds, Ds) --> [].
list_duplicates_([L0|Ls], L, Ds0, Ds) -->
    if_(L0=L, new_duplicate(L0, Ds0, Ds1), {Ds0 = Ds1}),
    list_duplicates_(Ls, L, Ds1, Ds).

new_duplicate(E, Ds0, Ds) -->
    new_duplicate_(Ds0, E, Ds0, Ds).

new_duplicate_([], E, Ds0, [E|Ds0]) --> [E].
new_duplicate_([L|Ls], E, Ds0, Ds) -->
    if_(L=E,
        { Ds0 = Ds },
        new_duplicate_(Ls, E, Ds0, Ds)).
The query shown by @fatalize in the comments now yields:
?- list_duplicates([a,a,a], Ls).
Ls = [a].
The other examples yield the same results. For instance:
?- list_duplicates([a,b,c], Ds).
Ds = [].
?- list_duplicates([a,b,a], Ds).
Ds = [a].
?- list_duplicates([a,b,X], Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).
?- list_duplicates([X,Y], Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).
I leave the case ?- list_duplicates(Ls, Ls). as an exercise.
Generality: Multiple directions
Ideally, we want to be able to use a relation in all directions.
For example, our program should be able to answer questions like:
What does a list look like if its duplicates are [a,b]?
With the version shown above, we get:
?- list_duplicates(Ls, [a,b]).
nontermination
Luckily, a very simple change allows us to answer such questions!
One such change is to simply write:
list_duplicates(Ls, Ds) :-
    length(Ls, _),
    phrase(list_duplicates(Ls, []), Ds).
This is obviously declaratively admissible, because Ls must be a list. Operationally, this helps us to enumerate lists in a fair way.
We now get:
?- list_duplicates(Ls, [a,b]).
Ls = [a, a, b, b] ;
Ls = [a, b, a, b] ;
Ls = [a, b, b, a] ;
Ls = [a, a, a, b, b] ;
Ls = [a, a, b, a, b] ;
Ls = [a, a, b, b, a] ;
Ls = [a, a, b, b, b] ;
Ls = [a, a, b, b, _4632],
dif(_4632, b),
dif(_4632, a) ;
etc.
Here is a simpler case, using only a single element:
?- list_duplicates(Ls, [a]).
Ls = [a, a] ;
Ls = [a, a, a] ;
Ls = [a, a, _3818],
dif(_3818, a) ;
Ls = [a, _3870, a],
dif(_3870, a) ;
Ls = [_4058, a, a],
dif(a, _4058),
dif(a, _4058) ;
Ls = [a, a, a, a] ;
etc.
Maybe even more interesting:
What does a list without duplicates look like?
Our program answers:
?- list_duplicates(Ls, []).
Ls = [] ;
Ls = [_3240] ;
Ls = [_3758, _3764],
dif(_3764, _3758) ;
Ls = [_4164, _4170, _4176],
dif(_4176, _4164),
dif(_4176, _4170),
dif(_4170, _4164) .
Thus, lists whose elements are all distinct arise naturally as a special case of the more general relation we have implemented.
We can use this relation to generate answers (as shown above), and also to test whether a list consists of distinct elements:
?- list_duplicates([a,b,c], []).
true.
?- list_duplicates([b,b], []).
false.
Unfortunately, the following even more general query still yields:
?- list_duplicates([b,b|_], []).
nontermination
On the plus side, if the length of the list is fixed, we get in such cases:
?- length(Ls, L), maplist(=(b), Ls),
( true ; list_duplicates(Ls, []) ).
Ls = [],
L = 0 ;
Ls = [],
L = 0 ;
Ls = [b],
L = 1 ;
Ls = [b],
L = 1 ;
Ls = [b, b],
L = 2 ;
Ls = [b, b, b],
L = 3 ;
Ls = [b, b, b, b],
L = 4 .
This is some indication that the program indeed terminates in such cases. Note that the answers are of course now too general.
Efficiency
It is well known in high-performance computing circles that as long as your program is fast enough, its correctness is barely worth considering.
So, the key question is of course: How can we make this faster?
I leave this as a very easy exercise. One way to make this faster in specific cases is to first check whether the given list is sufficiently instantiated. In that case, you can apply an ad hoc solution that fails terribly in more general cases, but has the extreme benefit that it is fast!
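For illustration only, here is a minimal sketch of what such an ad hoc fast path could look like. The predicate name ground_list_duplicates/2 is hypothetical and not part of the program above. The sketch assumes that the list is ground and throws an instantiation error otherwise, so that it cannot silently give wrong answers in more general cases. It is written to report each duplicated element once, in order of first occurrence, which is what the ground examples above show.

```prolog
:- use_module(library(lists)).

% Hypothetical fast path: only for ground lists.
ground_list_duplicates(Ls, Ds) :-
    (   ground(Ls) -> true
    ;   throw(error(instantiation_error, ground_list_duplicates/2))
    ),
    ground_duplicates_(Ls, [], Ds).

% Seen holds the duplicates that were already reported.
ground_duplicates_([], _, []).
ground_duplicates_([L|Ls], Seen0, Ds0) :-
    (   memberchk(L, Ls), \+ memberchk(L, Seen0)
    ->  Ds0 = [L|Ds], Seen = [L|Seen0]   % first report of this duplicate
    ;   Ds0 = Ds, Seen = Seen0           % no later occurrence, or already reported
    ),
    ground_duplicates_(Ls, Seen, Ds).
```

A wrapper could call this predicate when ground(Ls) holds and fall back to the general definition otherwise. Note that this sketch can only be used in one direction, which is exactly the trade-off described above.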