Removing consecutive duplicates from a list in Prolog

Question

I have list like following :

[a,b,b,e,e,f,f,g]

The first and last entries are single, while all others are repeated. How can I remove these extra entries. I should not disturb the order.

I tried following but it gets an empty list:

removeDups([], []).
removeDups([H1,H2|T], Newlist) :- 
    (H1 == H2 -> removeDups([H2|T],Newlist) 
    ; removeDups(T,Newlist)).

?- removeDups([a,b,b,c,c,d,d,e],L).
L = [].

Edit: I have already checked many similar question on stackoverflow. But in my list the duplicates are consecutive and therefore a simpler solution may be possible. I could not find a question on removing consecutive duplicate entries.

One clause your solution lacks is the case where the first argument is a list of one element. Your base case handles the empty list, and your recursive case handles at least two elements. So it will fail on a list of one element. In addition, your predicate always reduces down to an empty list for the second argument rather than collecting the resultant list. — lurker, Jul 01 '16 at 10:03
Adding removeDups([H|[]], [H|[]]). did not help. For second aspect, if I add removeDups([H1,H2|T], L), it becomes endless loop. — rnso, Jul 01 '16 at 10:09
@lurker That's not an exact duplicate, because this question concerns _consecutive_ repetitions, while the other one explicitly does not. — SQB, Jul 01 '16 at 10:10
@SQB the originally stated question didn't indicate that. That condition was added after I had marked it. — lurker, Jul 01 '16 at 10:23
In your solution attempt, you're using *equality* for testing for duplicates. But the solutions posted so far use instead *unification*. This is, btw, a recurring problem in posts dealing with duplicates. Note that equality and unification only give the same results when you're dealing with *ground* terms. — Paulo Moura, Jul 01 '16 at 13:06

score 4 · Answer 1 · edited May 23 '17 at 11:44

Why resort to solutions that can only be used in very specific cases, and yield wrong results in other cases? Imperative programming is so 1980... The 80s were cool, I know, but a bit limited too, no?

Try thinking in terms of relations that can be used in all directions!

Here is a solution that uses if_/3 for generality:

no_consecutive_duplicates([], []).
no_consecutive_duplicates([L|Ls0], Ls) :-
        no_dups(Ls0, L, Ls).

no_dups([], E, [E]).
no_dups([L|Ls0], E, Ls) :-
        if_(L=E, Ls=Ls1, Ls=[E|Ls1]),
        no_dups(Ls0, L, Ls1).

It works also in the most general case:

?- no_consecutive_duplicates(Ls0, Ls).
Ls = Ls0, Ls0 = []
Ls = Ls0, Ls0 = [_G501] ;
Ls = [_G501],
Ls0 = [_G501, _G501] ;
Ls = [_G501],
Ls0 = [_G501, _G501, _G501] .

For fair enumeration, use for example:

?- length(Ls0, _), no_consecutive_duplicates(Ls0, Ls).
Ls = Ls0, Ls0 = [] ;
Ls = Ls0, Ls0 = [_G501] ;
Ls = [_G501],
Ls0 = [_G501, _G501] ;
Ls = [_G775, _G775],
Ls0 = [_G787, _G775],
dif(_G775, _G787) .

Note the use of prolog-dif to declaratively state that two terms are different.

And by the way, "normal" cases work too:

?- no_consecutive_duplicates([a,b,c], Ls). 
Ls = [a,b,c].

?- no_consecutive_duplicates([a,a,b,c,c], Ls).
Ls = [a,b,c].

Note that both queries succeed deterministically.

And isn't it nice that we can generalize this and inspect also slightly more complex cases?

?- no_consecutive_duplicates([a,b,X], Ls).
Ls = [a, b],
X = b ;
Ls = [a, b, X],
dif(X, b).

Stay pure folks!

In your solution, you use unification for testing for duplicates but the original post uses equality. Try e.g. the query `no_consecutive_duplicates([a(_),b(_),b(_),c(_)], Ls)`. — Paulo Moura, Jul 01 '16 at 13:04

lurker · Accepted Answer · 2016-07-01T11:09:40.907

2

In your predicate, the second argument should always represent the result of duplicates being removed from the first argument. That leads to the following clauses when broken down into each case:

remove_dups([], []).   % Empty list is empty list with dups removed
remove_dups([X], [X]). % Single element list is itself with dups removed

% The result of removing duplicates from `[X,X|T]` should be the same
%   as the result of removing duplicates from `[X|T]`
remove_dups([X,X|T], [X|R]) :-
    remove_dups([X|T], [X|R]).

% The result of removing duplicates from `[X,Y|T]` where X and Y are different
%   should be the result [X|R] where R is the result of removing duplicates
%   from [Y|T]
remove_dups([X,Y|T], [X|R]) :-
    X \== Y,
    remove_dups([Y|T], R).

The 3rd and 4th clauses could be replaced with:

remove_dups([X,Y|T], [X|R]) :-
    (   X == Y
    ->  remove_dups([Y|T], [X|R])
    ;   remove_dups([Y|T], R)
    ).

But then it will limit solutions where the first argument is variable.

edited Jul 01 '16 at 11:09

answered Jul 01 '16 at 10:32

lurker

56,987
9
69
103

Yes it works! Why is R1 needed in if-then-else solution? – rnso Jul 01 '16 at 10:41
The result of the recursive call to `remove_dups` in that case needs to be the tail of the result, `R`. Therefore, an auxiliary variable to represent the tail needs to be used, which is `R1`. If you have it broken into separate clauses, note that in my 4th clause the result is written as `[X|R]` rather than just `R`, which avoids the auxiliary variable. – lurker Jul 01 '16 at 10:43
@mso I refactored the solution a little bit which prevents an infinite loop when the second argument is given and the first argument is variable. – lurker Jul 01 '16 at 11:12
Your replacement for the 3rd and 4th clauses changes the semantics of the predicate. Try e.g. the goal `remove_dups([a(_),b(_),b(_),c(_)], Ls)` and you will find that the first version is using unification for testing duplicates while the second version uses equality (as in the original post). – Paulo Moura Jul 01 '16 at 13:02
@paulomoura yes I'm aware the replacement changes things. I believe I indicated that it would yield different results. – lurker Jul 01 '16 at 13:40
You wrote "But then it will limit solutions where the first argument is variable.". But that's just one of the cases where the solutions would differ. To be accurate, solutions may differ if list elements are not ground. – Paulo Moura Jul 01 '16 at 13:52
1

@PauloMoura I agree I did not provide a complete description of the differences. Thanks for pointing them out. – lurker Jul 01 '16 at 13:53

score 1 · Answer 3 · answered Jul 01 '16 at 10:45

An empty list with its duplicates removed is, of course, still an empty list.

If the tail does not start with the head, we should keep the head. The de-duplicated tail is the tail with its duplicates removed.
This includes the case where the tail is empty (and so it is a single-element list).

If the list does start with its head repeated, we keep the head. Then we include the head when removing the other duplicates, since the head may be repeated more than once. If we don't include the head, we can't check the tail against it.

remove_dups([],[]).

remove_dups([H|T], [H|T1]) :-
    T \= [H|_],
    remove_dups(T, T1).

remove_dups([H,H|T], L) :-
    remove_dups([H|T], L).

And here's an example using SWISh.

This is a nice way of consolidating the unique head case with the single element case. — lurker, Jul 01 '16 at 10:52

score 0 · Answer 4 · answered Nov 30 '19 at 20:50

The simplest way in which you can do this is by using takeout function and member function

takeout(X,[X|R],R).
takeout(X,[F|Fs],[F|S]):- takeout(X,Fs,S).
/*takeout function used to remove
 delete given element from the list*/


rem([X],[X]).
rem([H|T],Z):- member(H,T) , takeout(H,[H|T],L) ,!, rem(L,Z).
rem([H|T],[H|Z]):- \+member(H,T) , rem(T,Z).

/* I used cut to stop backtracking of takeout function
 rem([X],[X]). when only one elem present return the same list
 rem([H|T],Z):-member(H,T) , takeout(H,[H|T],L) ,!, rem(L,Z).
 used to takeout the duplicate element from the list.
 rem([H|T],[H|Z]):- \+member(H,T) , rem(T,Z).
 if Head is not present in the tail , do nothing and check for the same in tail.
 */

Removing consecutive duplicates from a list in Prolog

4 Answers4