Remove duplicates from a list while keeping the rightmost occurrences

Question

I am trying to remove duplicates from a list while keeping the rightmost occurrences. E.g.: [1,2,3,1,2] is transformed in [3,1,2] It's one of my first tries in Prolog and I don't understand what am I doing wrong. It always returns false. This is my code:

%nrap(L:list,E:element,S:integer)
%L - the initial list, list of integers
%E - the element, integer
%S - the result, nrap of E in L, S integer
%flow model: (i,i,o),(i,i,i)

nrap([],_,0).
nrap([H|T],E,S):-
    H=E,
    nrap(T,E,S1),
    S is S1+1.
nrap([H|T],E,S):-
    H\=E,
    nrap(T,E,S).

%transform(L:list,L2:list,R:list)
%L - the initial list, list of integers
%L2 - copy of the initial list
%R - the resulted list, without duplicates, list of integers
%flow model: (i,i,o),(i,i,i)

transform([],[],[]).
transform([H|T],L2,[H|R]):-
    nrap(L2,H,S),
    S=1,
    transform(T,L2,R).
transform([H|T],L2,R):-
    nrap(L2,H,S),
    S>1,
    transform(T,L2,R).

Your `transform/3` fails because `L2` (second argument) is never "driven" down to `[]` which your base case assumes. `L2` never changes. Try `transform([], _, []).` as your base case. — lurker, Oct 21 '15 at 18:25
@lurker It does say "in order of last appearance", so at least it is consistent. — , Oct 21 '15 at 18:25
Now that you've changed your base case, you still need to look at `L2` more carefully. You are carrying the *same* `L2` throughout the recursion process. Since you're counting elements in `L2`, the count `S` of elements never changes throughout that process. It will not succeed on the `S = 1` case. — lurker, Oct 21 '15 at 18:58
When writing code to *remove duplicates*, it's a good idea to **always** specify how you test for duplicates, i.e. by using *equality* or by using *unification*, if your list may contain non-ground terms. Can that be the case here? — Paulo Moura, Oct 21 '15 at 22:49

score 3 · Answer 1 · edited May 23 '17 at 11:50

Shall I be pure or impure? Why even consider sacrificing logical-purity if we can save it easily!
Using memberd_t/3 and if_/3, we define list_rset/2 and its left "twin" list_lset/2:

list_rset([], []).                     % keep rightmost occurrences
list_rset([E|Es], Rs0) :-
   if_(memberd_t(E, Es),
       Rs0 = Rs,
       Rs0 = [E|Rs]),
   list_rset(Es, Rs).

list_lset([], []).                     % keep leftmost occurrences
list_lset([E|Es], Ls) :-
   post_pre_lset(Es, [E], Ls).         % uses internal auxilary predicate

post_pre_lset([], _, []).            
post_pre_lset([E|Es], Pre, Ls0) :-     % 2nd arg: look-behind accumulator
   if_(memberd_t(E, Pre),
       Ls0 = Ls,
       Ls0 = [E|Ls]),
   post_pre_lset(Es, [E|Pre], Ls).

Let's run some queries!

?- _Es = [1,2,3,1,2], list_lset(_Es, Ls), list_rset(_Es, Rs).
Ls = [1,2,3], Rs = [3,1,2].           % succeeds deterministically

In above query 1 precedes 2 both at the beginning and at the end of the list [1,2,3,1,2].
What if 1 precedes 2 at the beginning but follows it at the end (e.g., [1,2,3,2,1])?

?-  _Es = [1,2,3,2,1], list_lset(_Es, Ls), list_rset(_Es, Rs).
Ls = [1,2,3], Rs = [3,2,1].          % succeeds deterministically

Next, we look at a more general list_rset/2 goal that uses a list containing variables only.
Thanks to @PauloMoura for his suggestion!

?- Es = [A,B,C,A,B], list_rset(Es,Rs).
   Es = [C,C,C,C,C], Rs = [    C],     A=B ,               B=C
;  Es = [B,B,C,B,B], Rs = [C,  B],     A=B ,           dif(B,C)
;  Es = [C,B,C,C,B], Rs = [  C,B],               A=C , dif(B,C)
;  Es = [A,C,C,A,C], Rs = [  A,C],           dif(A,C),     B=C
;  Es = [A,B,C,A,B], Rs = [C,A,B], dif(A,B), dif(A,C), dif(B,C).

What's up with the residual goals (above)? Without sufficient instantiation, dif/2 is not decidable.
To save logical soundness, the execution of the prolog-dif constraints is delayed.

Last, one more use-case: an "input" list Xs that has both variables and ground terms.

?- Es = [A,B,z], list_rset(Es,Rs).
   Es = [z,z,z], Rs = [    z],     A=B ,               B=z 
;  Es = [B,B,z], Rs = [B,  z],     A=B ,           dif(B,z)
;  Es = [z,B,z], Rs = [  B,z],               A=z , dif(B,z)
;  Es = [A,z,z], Rs = [A,  z],           dif(A,z),     B=z 
;  Es = [A,B,z], Rs = [A,B,z], dif(A,B), dif(A,z), dif(B,z).

Repeat, I would expect nothing less than logical purity from you! (+1) :) — lurker, Oct 21 '15 at 20:34
Given your concern with logical purity, did you tried your solution with a goal such as `list_rset([A,B,C,A,B],Xs).` ? — Paulo Moura, Oct 21 '15 at 22:05
@PauloMoura. Very helpful! I will update the answer tomorrow (well, today, actually) and add the case you gave, and some similar ones that use variables, too. (And I'll try to show how some pure analogues of handy predicates could be based on `list_rset/2` compared to a more direct approach.) Thanks! — repeat, Oct 21 '15 at 23:17

score 1 · Answer 2 · edited May 23 '17 at 11:50

This is a follow-up to this previous answer... In this answer we use dcg!

We build lset//1 upon memberd_t/3 and if_//3—the dcg analogue of if_/3:

lset([]) -->
   [].
lset([X|Xs]) -->
   [X],
   lset_pre(Xs,[X]).

lset_pre([],_) -->
   [].
lset_pre([X|Xs],Pre) -->
   if_(memberd_t(X,Pre), [], [X]),
   lset_pre(Xs,[X|Pre]).

Same for rset//1:

rset([]) -->
   [].
rset([X|Xs]) -->
   if_(memberd_t(X,Xs), [], [X]),
   rset(Xs).

Some sample queries:

?- _Es = [1,2,3,1,2], phrase(lset(_Es),Ls), phrase(rset(_Es),Rs).
Ls = [1,2,3], Rs = [3,1,2].               % succeeds deterministically

?- _Es = [1,2,3,2,1], phrase(lset(_Es),Ls), phrase(rset(_Es),Rs).
Ls = [1,2,3], Rs = [3,2,1].               % succeeds deterministically

score 0 · Answer 3 · 2015-10-22T06:29:21.457

0

This is easier than you are making it. Since the elements in the "set" have to be in the order of last appearance, you don't need to keep a copy of the list at all: just compare to the remainder of the list (the tail).

If you know that the first list is always going to be ground (all elements are integers, for example), you could write:

list_set([], []).
list_set([X|Xs], Ys0) :-
    (   memberchk(X, Xs)
    ->  Ys0 = Ys
    ;   Ys0 = [X|Ys]
    ),
    list_set(Xs, Ys).

memberchk/2 can be used to check if a ground term is in a list of ground terms. It will succeed or fail exactly once.

A more general solution is to pose a constraint that an element should be in the set if it is different from all the elements following it, and be dropped otherwise:

list_set([], []).
list_set([X|Xs], [X|Ys]) :-
    maplist(dif(X), Xs),
    list_set(Xs, Ys).
list_set([X|Xs], Ys) :-
    \+ maplist(dif(X), Xs),
    list_set(Xs, Ys).

Here, maplist(dif(X), Xs) means:

X is different from every element in the list Xs (the tail).

and \+ Goal succeeds then Goal does not succeed.

With this defintion:

?- list_set([1,2,3,1,2], S).
S = [3, 1, 2] ;
false.

?- list_set([1,2,3,3,1,1,2], S).
S = [3, 1, 2] ;
false.

?- list_set([A,B,C,A,B],Xs).
Xs = [C, A, B],
dif(A, B),
dif(C, B),
dif(C, A) ;
false.

edited Oct 22 '15 at 06:29

answered Oct 21 '15 at 19:03

@repeat `member/2` is just as wrong as `memberchk/2`, just not as many times. As it is at the moment is a bit better. – Oct 22 '15 at 03:58
2

Mixing any kind of coroutining with meta-logical predicates is a recipe for losing declarative semantics! Assuming that you use `dif/2` as a declarative alternative to `(\==)/2`, why use `(\+)/1` immediate afterwards and kick declarative semantics out if the window?! Very puzzling, indeed. – repeat Oct 22 '15 at 08:28
@repeat Can you point out a particular case for which `\+ maplist(dif(X), Xs)` does not have declarative semantics? – Oct 22 '15 at 08:34
@repeat Here, my reading of `\+ maplist(dif(X), Xs)` is simply: "It cannot be proven that all elements of `Xs` are different from `X`." – Oct 22 '15 at 08:37
@repeat Yes, that's a good point. However, Prolog conjunctions are not logical conjunctions and no one ever claimed they were. I see your point but I really fail to see the importance. What you are in practice saying is that you don't like the semantics of the Prolog comma. – Oct 22 '15 at 11:21
2

@Boris: Prolog conjunctions **are** logical conjunctions for the pure monotonic subset of Prolog which gave Prolog its name. – false Oct 22 '15 at 11:53
@false Agreed. It is just that constantly worrying about logical pureness in practice feels like trying to program with a straight-jacket on. This is despite the fact that I do think that usually form liberates. – Oct 22 '15 at 12:45
2

@Boris: If purity feels like a straight-jacket to you, what do actual programming errors feel like? – false Oct 22 '15 at 13:34
@false Like all errors: you make them, you learn, you move on, if you can :) – Oct 22 '15 at 13:44
2

I see that when (working with ground terms) your first definition of `list_set/2` makes sense. To safeguard against unsound uses, however, preconditions need to be actively enforced, either by static code analysis or by dynamic term testing like: `groundlist_set(Xs,Ys) :- ( ground(Xs) -> list_to_set_aux(Xs,Ys) ; throw(error(instantiation_error,groundlist_set/2) ).` – repeat Oct 22 '15 at 13:51
@repeat It says quite clearly in my answer that the list needs to be ground to use the first definition. How this is enforced is another question, and does not at all answer OP's question (which, incidentally, claims that the first argument is a "list of integers").... – Oct 22 '15 at 13:53
1

Yes, you did state that. The way I see it, the OP did not ask for implementations but had questions regarding his own code... By *that* measure, both your answer *and* mine are off:) – repeat Oct 22 '15 at 13:58
Have you already noticed the similarity of your answer and mine? It just occurred to me... Look for yourself! `(memberchk(X, Xs) -> Ys0=Ys ; Ys0=[X|Ys])` (taken from your answer) and `if_(memberd_t(X,Xs), Ys0=Ys, Ys0=[X|Ys])` (my answer)... – repeat Oct 22 '15 at 16:21
1

@repeat Yes, I noticed the similarity. However, yours is logically pure while my is not. – Oct 23 '15 at 07:07

Remove duplicates from a list while keeping the rightmost occurrences

3 Answers3