Include non-unique elements only

Question

This question was asked but there are no answers: here. I read the comments and tried to implement in both ways, but there are more problems that I don't understand.

I first tried the easy way that doesn't keep original order:

list_repeated(L, Ds) :-
    msort(L, S),
    sorted_repeated(S, Ds).

sorted_repeated([], []).
sorted_repeated([X|Xs], Ds) :-
    first(Xs, X, Ds).

first([], _, []).
first([X|Xs], X, [X|Ds]) :-
    more(Xs, X, Ds).
first([X|Xs], Y, Ds) :-
    dif(X, Y),
    first(Xs, X, Ds).

more([], _, []).
more([X|Xs], X, Ds) :-
    more(Xs, X, Ds).
more([X|Xs], Y, Ds) :-
    dif(X, Y),
    first(Xs, X, Ds).

Once the list is sorted without removing duplicates, using first and more I add the element to the second argument if it occurs at least twice and skip all consecutive copies of the element.

This is not working properly because if I have:

?- list_duplicates([b,a,a,a,b,b], Ds).

I get answer [a,b] instead of [b,a] and also I get ; false after the answer.

I also tried another way, but this doesn't work because the accumulator is immutable?

list_duplicates(L, Ds) :-
    ld_acc(L, [], Ds).

ld_acc([], _, []).
ld_acc([X|Xs], Acc, Ds) :-
    (   memberchk(X, Acc)
    ->  Ds = [X|Ds0],
        ld_acc(Xs, Acc, Ds0)
    ;   Acc1 = [X|Acc],
        ld_acc(Xs, Acc1, Ds)
    ).

This cannot work because when I check that an element is member of accumulator I remove only one occurrence of each element: if I have three times the same element in the first argument, I am left with two. If I could change the element in the accumulator then I could maybe put a counter on it? In the first version I used different states, first and more, but here I have to attach state to the elements of the accumulator, is that possible?

Wait... you get `[a,b]` instead of `[a,b]`? Not sure I follow. The `; false` means Prolog first succeeded with an answer `[a,b]`, then you typed `;` (or space) asking Prolog to find additional solutions. It found no additional solutions, so it said `false`. — lurker, Feb 01 '17 at 20:32
Can you explain what `first(..)` and `more(..)`, etc. are doing. Furthermore all variables in Prolog are basically *immutable*. The only thing that can happen is that they are further *grounded*. — Willem Van Onsem, Feb 01 '17 at 21:13
Hello @lurker, I get `[a,b]` instead of `[b,a]`. Someone already corrected my question. I think it is my error that Prolog is asking for additional solutions when I think that there are no additional solutions. — , Feb 02 '17 at 10:48
@WillemVanOnsem The question title is probably not good. It says "include duplicates" but it means "second argument is a list of unique elements that occur more than once in the first argument". I think some of the answers are also confused but I am not sure. — , Feb 02 '17 at 10:50
@User9213: not that I want to insult you or anything. But perhaps in the (near) future you better reread your question before posting. If you include (complex) code fragments describe briefly what the functions are supposed to do. *please* :) — Willem Van Onsem, Feb 02 '17 at 11:05
@WillemVanOnsem I reread the question and there were some mistakes left after that. I thought it was clear from the question what the code has to do, but I was wrong about this, too. It is not easy to explain with words what code does because I am not very good at writing English. — , Feb 02 '17 at 11:07
@User9213: No, I understand :) don't worry. But well formatted questions tend to be answered faster and with answers that are more to the point so you will benefit from it as well :) — Willem Van Onsem, Feb 02 '17 at 11:12
@User9213: Such changes as you now did, are better added separately. You are invalidating the existing answers. — false, Feb 02 '17 at 11:20
@false I see but the original question contained code that solved the problem as it was meant, not as it was understood, correctly already. So to not invalidate the answers I would have to remove the whole first code example completely. — , Feb 02 '17 at 11:22
I don't think that the question as originally worded was too bad, FWIW. My answer meets the new requirements just as well as the old, even if it is not a textbook on DCGs :P — Jim Ashworth, Feb 02 '17 at 13:14

score 2 · Answer 1 · edited May 23 '17 at 12:25

A plea for purity

When programming in Prolog, a major attraction is the generality we enjoy from pure relations.

This lets us use our code in multiple directions, and reason declaratively over our programs and answers.

You can enjoy these benefits if you keep your programs pure.

Possible solution

As always when describing lists, also consider using DCG notation. See dcg for more information.

For example, to describe the list of duplicates in a pure way, consider:

list_duplicates([]) --> [].
list_duplicates([L|Ls]) -->
        list_duplicates_(Ls, L),
        list_duplicates(Ls).

list_duplicates_([], _) --> [].
list_duplicates_([L0|Ls], L) -->
        if_(L0=L, [L], []),
        list_duplicates_(Ls, L).

This uses if_//3 to retain generality and determinism (if applicable).

Examples

Here are a few example queries and answers. We start with simple ground cases:

?- phrase(list_duplicates([a,b,c]), Ds).
Ds = [].

?- phrase(list_duplicates([a,b,a]), Ds).
Ds = [a].

Even the most impure version will be able to handle these situations correctly. So, slightly more interesting:

?- phrase(list_duplicates([a,b,X]), Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).

Pretty nice, isn't it? The last part says: Ds = [] is a solution if X is different from b and a. Note the pure relation dif/2 automatically appears in these residual goals and retains the relation's generality.

Here is an example with two variables:

?- phrase(list_duplicates([X,Y]), Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).

Finally, consider using iterative deepening to fairly enumerate answers for lists of arbitrary length:

?- length(Ls, _), phrase(list_duplicates(Ls), Ds).
Ls = Ds, Ds = [] ;
Ls = [_136],
Ds = [] ;
Ls = [_136, _136],
Ds = [_136] ;
Ls = [_156, _162],
Ds = [],
dif(_162, _156) ;
Ls = Ds, Ds = [_42, _42, _42] ;
Ls = [_174, _174, _186],
Ds = [_174],
dif(_186, _174) .

Multiple occurrences

Here is a version that handles arbitrary many occurrences of the same element in such a way that exactly a single occurrence is retained if (and only if) the element occurs at least twice:

list_duplicates(Ls, Ds) :-
        phrase(list_duplicates(Ls, []), Ds).

list_duplicates([], _) --> [].
list_duplicates([L|Ls], Ds0) -->
        list_duplicates_(Ls, L, Ds0, Ds),
        list_duplicates(Ls, Ds).

list_duplicates_([], _, Ds, Ds) --> [].
list_duplicates_([L0|Ls], L, Ds0, Ds) -->
        if_(L0=L, new_duplicate(L0, Ds0, Ds1), {Ds0 = Ds1}),
        list_duplicates_(Ls, L, Ds1, Ds).

new_duplicate(E, Ds0, Ds) -->
        new_duplicate_(Ds0, E, Ds0, Ds).

new_duplicate_([], E, Ds0, [E|Ds0]) --> [E].
new_duplicate_([L|Ls], E, Ds0, Ds)  -->
        if_(L=E,
            { Ds0 = Ds },
            new_duplicate_(Ls, E, Ds0, Ds)).

The query shown by @fatalize in the comments now yields:

?- list_duplicates([a,a,a], Ls).
Ls = [a].

The other examples yield the same results. For instance:

?- list_duplicates([a,b,c], Ds).
Ds = [].

?- list_duplicates([a,b,a], Ds).
Ds = [a].

?- list_duplicates([a,b,X], Ds).
X = a,
Ds = [a] ;
X = b,
Ds = [b] ;
Ds = [],
dif(X, b),
dif(X, a).

?- list_duplicates([X,Y], Ds).
X = Y,
Ds = [Y] ;
Ds = [],
dif(Y, X).

I leave the case ?- list_duplicates(Ls, Ls). as an exercise.

Generality: Multiple directions

Ideally, we want to be able to use a relation in all directions.

For example, our program should be able to answer questions like:

What does a list look like if its duplicates are [a,b]?

With the version shown above, we get:

?- list_duplicates(Ls, [a,b]).
nontermination

Luckily, a very simple change allows as to answer such questions!

One such change is to simply write:

list_duplicates(Ls, Ds) :-
        length(Ls, _),
        phrase(list_duplicates(Ls, []), Ds).

This is obviously declaratively admissible, because Ls must be a list. Operationally, this helps us to enumerate lists in a fair way.

We now get:

?- list_duplicates(Ls, [a,b]).
Ls = [a, a, b, b] ;
Ls = [a, b, a, b] ;
Ls = [a, b, b, a] ;
Ls = [a, a, a, b, b] ;
Ls = [a, a, b, a, b] ;
Ls = [a, a, b, b, a] ;
Ls = [a, a, b, b, b] ;
Ls = [a, a, b, b, _4632],
dif(_4632, b),
dif(_4632, a) ;
etc.

Here is a simpler case, using only a single element:

?- list_duplicates(Ls, [a]).
Ls = [a, a] ;
Ls = [a, a, a] ;
Ls = [a, a, _3818],
dif(_3818, a) ;
Ls = [a, _3870, a],
dif(_3870, a) ;
Ls = [_4058, a, a],
dif(a, _4058),
dif(a, _4058) ;
Ls = [a, a, a, a] ;
etc.

Maybe even more interesting:

What does a list without duplicates look like?

Our program answers:

?- list_duplicates(Ls, []).
Ls = [] ;
Ls = [_3240] ;
Ls = [_3758, _3764],
dif(_3764, _3758) ;
Ls = [_4164, _4170, _4176],
dif(_4176, _4164),
dif(_4176, _4170),
dif(_4170, _4164) .

Thus, the special case of a list where all elements are distinct naturally exists as a special case of the more general relation we have implemented.

We can use this relation to generate answers (as shown above), and also to test whether a list consists of distinct elements:

?- list_duplicates([a,b,c], []).
true.

?- list_duplicates([b,b], []).
false.

Unfortunately, the following even more general query still yields:

?- list_duplicates([b,b|_], []).
nontermination

On the plus side, if the length of the list is fixed, we get in such cases:

?- length(Ls, L), maplist(=(b), Ls),
   ( true ; list_duplicates(Ls, []) ).
Ls = [],
L = 0 ;
Ls = [],
L = 0 ;
Ls = [b],
L = 1 ;
Ls = [b],
L = 1 ;
Ls = [b, b],
L = 2 ;
Ls = [b, b, b],
L = 3 ;
Ls = [b, b, b, b],
L = 4 .

This is some indication that the program indeed terminates in such cases. Note that the answers are of course now too general.

Efficiency

It is well known in high-performance computing circles that as long as your program is fast enough, its correctness is barely worth considering.

So, the key question is of course: How can we make this faster?

I leave this is a very easy exercise. One way to make this faster in specific cases is to first check whether the given list is sufficiently instantiated. In that case, you can apply an ad hoc solution that fails terribly in more general cases, but has the extreme benefit that it is fast!

@fatalize: Thank you, I have added a version that only retains a single occurrence of the element in such cases. — mat, Feb 02 '17 at 08:55
If you get blank + `.` at this point, it means *you* have potentially cut away solutions (by pressing `RET` instead of asking for more solutions with `SPACE`). The blank space before the `.` shows that more solutions *may* be available. The program itself *does not terminate* in this case. There is a crucial difference between nontermination and incorrectly omitting solutions. You can always test for universal termination by appending `false` to a query. For example, try: `?- list_duplicates(Ls, [a]), Ls = [_,_,_], false.`. If you add `Ls=[_,_,_]` in front of the query, it terminates. — mat, Feb 02 '17 at 09:15
True, I wasn't careful there. It still remains that a simple query like `list_duplicates(L,[a,b]).` goes into an infinite loop, and that `list_duplicates(L,[a]).` does not enumerate choice points in an ideal fashion (both problems are most likely linked). — Fatalize, Feb 02 '17 at 09:22
That is rather easy to fix: Simply add `length(Ls, _)` to `list_duplicates/2`, at an appropriate place. — mat, Feb 02 '17 at 09:24
@Fatalize: Both your queries contain infinitely many answers as solutions and thus **must not terminate**. Now, insisting that a non-terminating query has to produce answers in a fair manner is desirable, but it is a bit of an unnecessary luxury - since it does not improve termination. — false, Feb 02 '17 at 10:10
@false I wouldn't say it's a luxury since e.g. `list_duplicates(L,[a]), L = [a,a,b].` goes into an infinite loop while `L = [a,a,b], list_duplicates(L,[a]).` works properly. One would at least expect the first one to give an answer once. — Fatalize, Feb 02 '17 at 10:35
@Fatalize: Your requirements are overarching. You cannot guarantee fair enumeration for any goal - that's what your requirement would be in ultimo fine. — false, Feb 02 '17 at 10:42
@Fatalize: I have added more information on how to obtain a version that indeed succeeds once in this case, please have a look. — mat, Feb 02 '17 at 10:55
@Fatalize: worse, you cannot even realistically expect that termination is maintained! Think of `list_duplicates(_,[X,X|_])` or `list_duplicates(L,[_|L])` — false, Feb 02 '17 at 10:58
Hello @mat, thank you for answering. I probably did not explain my problem with the right words. The second argument has to include exactly one occurrence of each element in the first argument that appears more than once. — , Feb 02 '17 at 11:02
@mat Or in other words, each element that occurs more than once in the first argument appears exactly once in the second argument, in the same order as the first occurrences of each item in the first argument. I tried your code but it is correct for the wrongly written question, only elements that occur exactly twice. — , Feb 02 '17 at 11:14
Please try the other versions I posted. They work like you describe and you should be able to use them in your case with at most small adaptations. — mat, Feb 02 '17 at 11:18
@false Then I don't see why you often preach about correctness of predicates on SO, when the behavior for `list_duplicates(_,[X,X|_])` or `list_duplicates(L,[_|L])` are obviously incorrect but don't seem to bother you. — Fatalize, Feb 02 '17 at 13:02
@Fatalize: We are using words like "correctness" in quite a different meaning. To me, a definition is correct if, failing goals actually are not true, and (unconditional) succeeding goals are true. But you seem to expect the opposite direction holds as well. That is, practically, quite too demanding. That is: You will rarely ever find such a definition at all. In particular if that definition is itself a pure one. (You always could resort to some constraint-like definition, which however, is much more complex to define - and much more prone to errors.) — false, Feb 02 '17 at 13:11

score 0 · Answer 2 · answered Feb 01 '17 at 22:15

So as far as I can tell, you were on the right track with the accumulator, but this implementation definitely works as you want (assuming you want the duplicates in the order they first appear in the list).

list_duplicates(Input,Output) is just used to wrap and initialise the accumulator.

list_duplicates(Accumulator,[],Accumulator) unifies the accumulator with the output when we have finished processing the input list.

list_duplicates(Accumulator,[H|T],Output) says "if the head (H) of the input list is in the rest of the list (T), and is not in the Accumulator already, put it at the end of the Accumulator (using append), then recurse on the tail of the list".

list_duplicates(Accumulator,[_|T],Output) (which we only get to if either the head is not a duplicate, or is already in the Accumulator) just recurses on the tail of the list.

list_duplicates(Input,Output) :-
    once(list_duplicates([],Input,Output)).

list_duplicates(Accumulator,[],Accumulator).
list_duplicates(Accumulator,[H|T],Output) :-
    member(H,T),
    \+member(H,Accumulator),
    append(Accumulator,[H],NewAccumulator),
    list_duplicates(NewAccumulator,T,Output).
list_duplicates(Accumulator,[_|T],Output) :-
    list_duplicates(Accumulator,T,Output).

You could also recurse in list_duplicates(Accumulator,[H|T],Output) with list_duplicates([H|Accumulator],T,Output) and reverse in the wrapper, looking like this:

list_duplicates(Input,Output) :-
    once(list_duplicates([],Input,ReverseOutput)),
    reverse(ReverseOutput,Output).

list_duplicates(Accumulator,[],Accumulator).
list_duplicates(Accumulator,[H|T],Output) :-
    member(H,T),
    \+member(H,Accumulator),
    list_duplicates([H|Accumulator],T,Output).
list_duplicates(Accumulator,[_|T],Output) :-
    list_duplicates(Accumulator,T,Output).

The once call in the wrapper prevents the false output (or in this case, partial duplicate lists due to a lack of guards on the second rule).