Prolog: splitting a list into two lists (unique items / duplicate items)

Question

I have been trying to split a given list into two different lists: Unique and Duplicate. For example, if we have the list [1, 1, 2, 3, 3, 4, 5] I want the Unique list to be [2, 4, 5] and Duplicate to be [1, 3]. I don't want all the 1's in the list to be in the Duplicate list. I just need one of it. The code I have right now:

compareL([_|[]], Unique, Dup).    
compareL([X3,Y3 | Tail], [X3 | Unique], Dup) :-
    X3 =\= Y3,
    compareL([Y3 | Tail], Unique, Dup). 
compareL([X3,Y3 | Tail], Unique, [X3 | Dup]) :- 
    X3 = Y3,
    skipDups(X3, Tail, Unique, Dup).

skipDups(_, [], Unique, Dup).   
skipDups(X3,[Y3 | Tail], Unique, Dup) :- 
    X3 =\= Y3,
    compareL([Y3 | Tail], Unique, Dup).
skipDups(X3,[Y3 | Tail], Unique, Dup) :-
    X3 = Y3,
    skipDups(X3, Tail, Unique, Dup).

Using the example list given above if I run compareL([1, 1, 2, 3, 3, 4, 5], Unique, Dup). I get:

Unique = [2, 4|_G1954],
Dup = [1, 3|_G1948].

I can't figure out why towards the end of both lists I am getting '_G1954' and '_G1948'. Any help would be appreciated. Thanks.

instead of `compareL([_|[]], Unique, Dup).` try `compareL([_], [], []).` — CapelliC, Apr 06 '13 at 15:57
Thanks. That got rid of the '_G1954' and '_G1948'. But when I have two 5's at the end of the list it comes back again. Any idea why? — saviok, Apr 06 '13 at 16:07
I think your predicates are far too much complicated...I'll post an answer with alternative code. — CapelliC, Apr 06 '13 at 16:09

score 5 · Answer 1 · edited May 23 '17 at 12:23

We can preserve logical-purity by building upon if_/3, (=)/3, and tpartition/4!

list_uniqs_dups([],[],[]).
list_uniqs_dups([X|Xs0],Us0,Ds0) :-
    tpartition(=(X),Xs0,Es,Xs),
    if_(Es=[],
        Us0+Ds0=[X|Us]+Ds,
        Ds0+Us0=[X|Ds]+Us),
    list_uniqs_dups(Xs,Us,Ds).

Here's the query the OP gave:

?- list_uniqs_dups([1,1,2,3,3,4,5],Us,Ds).
Ds = [1,3], Us = [2,4,5].                   % succeeds deterministically

OK! How about the following quite general queries?

?- list_uniqs_dups([],Us,Ds).
Ds = [], Us = [].

?- list_uniqs_dups([A],Us,Ds).
Ds = [], Us = [A].

?- list_uniqs_dups([A,B],Us,Ds).
  Ds = [B], Us = []   ,     A=B
; Ds = [] , Us = [A,B], dif(A,B).

?- list_uniqs_dups([A,B,C],Us,Ds).
  Ds = [C], Us = []     ,     A=B ,               B=C
; Ds = [B], Us = [C]    ,     A=B ,           dif(B,C)
; Ds = [C], Us = [B]    ,               A=C , dif(B,C)
; Ds = [C], Us = [A]    ,           dif(A,C),     B=C
; Ds = [] , Us = [A,B,C], dif(A,B), dif(A,C), dif(B,C).

We need a general indentation rule for `if_/3` going over several lines. — false, Oct 10 '15 at 12:37

score 5 · Answer 2 · edited May 23 '17 at 12:29

5

Answer using (=)/3

list_uniqs_alldups(Es,Us,Ds) :
   tpartition(list_uniqmember_t(Es),Es,Us,Ds).

list_uniqmember_t(Es,X,T) :-
   tfilter(=(X),Es,Xs),
   =(Xs,[X],T).

Sample queries:

?- list_uniqs_alldups([1,1,2,1,1,3,4,3],Us,Ds).
Us = [2, 4],
Ds = [1, 1, 1, 1, 3, 3].

?- list_uniqs_alldups([1,1,2,3,3,1,1,3,4,3],Us,Ds).
Us = [2, 4],
Ds = [1, 1, 3, 3, 1, 1, 3, 3].

?- list_uniqs_alldups([8,1,1,2,3,3,1,1,3,4,3],Us,Ds).
Us = [8, 2, 4],
Ds = [1, 1, 3, 3, 1, 1, 3, 3].

?- list_uniqs_alldups(X,Us,Ds).
X = Us, Us = Ds,   Ds = [] ;
X = Us, Us = [_A], Ds = [] ;
X = Ds, Us = [],   Ds = [_A,_A] ;
X = Ds, Us = [],   Ds = [_A,_A,_A] ;
...

For run length encoding I used splitlistIfAdj/3.

list_rle(List,Rle) :-
  splitlistIfAdj(dif,List,Rle0),
  maplist(rle_length,Rle0,Rle).

rle_length([H|T],RleLen) :-
  length([H|T],L),
  RleLen = L*H.

Query:

?- list_rle([a,a,b,a,a,c,d,c],X).
X = [2*a, 1*b, 2*a, 1*c, 1*d, 1*c].

I am not sure how to change this so that it works in both directions.

You could then also do:

new_list_uniqs_alldups(List,Us,Ds):-
   list_rle(List,Rle),
   tpartition(singlelist_t,Rle,Us,Ds).

singlelist_t(L,T) :-
   L=N*_,
   if_(N=1,T=true,T=false).

Sample Q:

 ?- new_list_uniqs_alldups([1,1,2,1,1,3,4,1,1,7,8],U,D).
 U = [1*2, 1*3, 1*4, 1*7, 1*8],
 D = [2*1, 2*1, 2*1].

 ?- new_list_uniqs_alldups([7,7,7,2,1,1,3,4,1,1,7,8],U,D).
 U = [1*2, 1*3, 1*4, 1*7, 1*8],
 D = [3*7, 2*1, 2*1].

edited May 23 '17 at 12:29

Community

1
1

answered Oct 09 '15 at 19:31

user27815

4,767
14
28

I'm not sure if it would be better to have a separator for each block of duplicates.. so `?- list_uniqs_alldups([1,1,2,1,1,3,4,3],Us,Ds). Us = [2, 4], Ds = [1, 1, |,1, 1, 3, 3] ;` for example. – user27815 Oct 09 '15 at 20:08
May be useful in come cases, but can cause problems if the separator item (say `'|'`) can also occur in the original list. A run-length encoding with pairs `Multiplicity*Term` would avoid all these potential problems, but then why not include unique items as well? Something like `?- list_rle([a,a,b,a,a,c,d,c],[2*a,1*b,2*a,1*c,1*d,1*c]).` – repeat Oct 09 '15 at 20:27
1

Nice shot! However, I would want to see an implementation that makes all three queries succeed deterministically without meta-logical stuff ruining the logical meaning (when used with very general terms). – repeat Oct 09 '15 at 20:34
What are the three queries you want to see? – user27815 Oct 10 '15 at 09:16
1

Those 3 queries are ok! If you look closely at the answer sequence that you get with a [tag:prolog-toplevel] like that of [tag:swi-prolog], you will see they all look like `Us = ..., Ds = ... ; false.`. That trailing `; false.` indicates that Prolog doesn't know if additional answers exist or not. Now, **we** know in these cases there are **none**. I postulate that the three sample goals are sufficiently instantiated, so Prolog can know that too if we use a suitable predicate definition. The answers should look like `Us = [...], Ds = [...].`! OTOH it should still be logically pure. Clearer? – repeat Oct 10 '15 at 10:12
1

Consider the query `?- list_uniqmember_t([1,2],X,T).`. Your previous version correctly answered: `X = 1,T = true ; X = 2,T = true ; T = false,dif(X, 2),dif(X, 1)`. The answer by the "new" version is incomplete `X = 1,T = true`. – repeat Oct 10 '15 at 11:00
2

The **cut**-like Prolog constructs `(!)/0`, `(->)/2`, and `(*->)/2` **easily break code**! I hate to have to tell you, but you just broke your code... – repeat Oct 10 '15 at 11:03
1

Check out `list_RLE/2` in this answer! http://stackoverflow.com/a/33180378/4609915 – repeat Oct 16 '15 at 22:20

score 3 · Accepted Answer · answered Apr 06 '13 at 16:12

3

here is a solution, the key is take/4 that consumes all matching leading items, thus enabling easy testing of the list ( [_|_] matches any list of at least 1 element )

compareL([], [], []).
compareL([X|Xs], U, D) :-
    (   take(X, Xs, [_|_], Ys)
    ->  compareL(Ys, U, B), D = [X|B]
    ;   compareL(Xs, A, D), U = [X|A]
    ).

take(X, [X|Xs], [X|R], Ys) :-
    !, take(X, Xs, R, Ys).
take(_, Ys, [], Ys).

answered Apr 06 '13 at 16:12

CapelliC

59,646
5
47
90

Is it possible without using -> ? – saviok Apr 06 '13 at 16:57
it's possible, but clumsy: `compareL([X|Xs], U, D) :- take(X, Xs, [_|_], Ys), !, compareL(Ys, U, B), D = [X|B] ; compareL(Xs, A, D), U = [X|A].` (untested) – CapelliC Apr 06 '13 at 19:14

score 3 · Answer 4 · answered Apr 06 '13 at 16:34

You can write that :

split_seq([], [], []).

split_seq([H | T], L1_out, L2_out) :-
    split_seq(T, L1, L2),
    (   select(H, L1, L1_out)
    ->  (   member(H, L2)
        ->  L2_out = L2
        ;   L2_out = [H | L2])
    ;   L1_out = [H | L1],
        L2_out = L2).

Prolog: splitting a list into two lists (unique items / duplicate items)

4 Answers4

Linked