Collecting Items and Counting the Number of Occurrences

Question

I am trying to write a recursive rule collCount/2 which groups identical items in a list with their respective numbers of occurrences into tuples.

For example, collCount([a,b,a,b,c,b],F) binds F with [(a,2),(b,3),(c,1)]. When running this query, Prolog simply returns no.

The following is what I have managed to do so far:

collCount([H|T],[(H,N)|L2]) :-
    countDel(H,[H|T],L,N),
    collCount(L,L2).

countDel(X,T,Rest,N) :-
    occur(X,T,N),
    delAll(X,T,Rest).

occur(_,[],0).
occur(X,[X|T],N) :-
    occur(X,T,NN),
    N is NN + 1.
occur(X,[H|T],N) :-
    occur(X,T,N),
    X \= H.

delAll(_,[],[]).
delAll(X,[X|T],Ans) :-
    delAll(X,T,Ans).
delAll(X,[H|T],[H|Rest]) :-
    delAll(X,T,Rest),
    X \= H.

The predicate countDel/4 counts and deletes all occurrences of a specific item in a list. For instance, countDel(2,[1,2,3,2,2],L,N) binds L with [1,3] and N with 3.

The predicate occur/3 counts all occurrences of a specific item in a list. For instance, occur(3,[1,2,3,4,3],Num) binds Num with 2.

The predicate delAll/3 deletes all occurrences of a specific item in a list. For instance, delAll(3,[1,2,3,4,3],L) binds L with [1,2,4].

Any help is greatly appreciated.

Have you considered sorting the list first, using `msort/2`? It will make the job easier. Outside of that, if you could fill in a brief description (one sentence or two) for each predicate you've defined, explaining their meaning, it could help clarify the issue. — lurker, Jun 27 '14 at 18:32
Yes, I have. However, I am trying to avoid using built-in predicates. I wrote a recursive rule which sorts lists, but it only sorts lists containing numbers. Also, I have added the descriptions for the predicates above. — , Jun 27 '14 at 20:17

false · Answer 1 · 2014-07-16T18:14:38.483

I would like to draw your attention to a little part of your and @CapelliC's solution. Innocuous as:

occur(_,[],0) :- false.
occur(X,[X|T],N) :-
    occur(X,T,NN), false,
    N is NN + 1.
occur(X,[H|T],N) :-
    occur(X,T,N), false,
    X \= H.

So what I did here was to insert some goals false into your program. In this manner this program will now take less inferences than without. Still, there is something remarkable. Consider:

?- length(As,M), M>9, maplist(=(a),As), \+ time(occur(a,As,_)).
% 3,072 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 1989931 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 10 ;
% 6,144 inferences, 0.003 CPU in 0.003 seconds (100% CPU, 2050613 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 11 ;
% 12,288 inferences, 0.006 CPU in 0.006 seconds (100% CPU, 2128433 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 12

Have you seen enough? The number of inferences doubles when adding one further element. Brief, there is exponential overhead! You need to put the goal for inequality first. Even better, use dif(X, H).

Note that this property is independent of being tail-recursive or not. There is still some place for optimizations, but far less than by this one.

See failure-slice for more examples to use this technique.

score 3 · Answer 2 · edited May 23 '17 at 10:28

For a logically pure implementation look at my answer to the related question "How to count number of element occurrences in a list in Prolog".

In that answer I present an implementation of list_counts/2 which preserves logical-purity.

Let's put list_counts/2 to use!

?- list_counts([a,b,a,b,c,b],F).
F = [a-2, b-3, c-1].

Note that list_counts/2 represents pairs of K and V as K-V. In general, this is preferable to representations based on comma (K,V) or lists [K,V] for a number of reasons (readability, interoperability with other standard library predicates, efficiency).

If you really need to use the comma-based notation, you can define collCount/2 as follows:

:- use_module(library(lambda)).

collCount(Xs,Fss) :-
    list_counts(Xs,Css),
    maplist(\ (K-V)^(K,V)^true,Css,Fss).

So let's put collCount/2 to use:

?- collCount([a,b,a,b,c,b],F).
F = [(a,2), (b,3), (c,1)].           % succeeds deterministically

Edit 2015-05-13

Out of curiosity, let's consider the performance aspect that @false referred to in his answer.

The following query roughly corresponds to the one that @false used in his answer. In both, we are interested in the effort required for universal termination:

?- length(As,M), 
   M>9, 
   maplist(=(a),As), 
   time((list_item_subtracted_count0_count(As,a,_,1,_),false ; true)).
% 73 inferences, 0.000 CPU in 0.000 seconds (95% CPU, 1528316 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 10 ;
% 80 inferences, 0.000 CPU in 0.000 seconds (96% CPU, 1261133 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 11 ;
% 87 inferences, 0.000 CPU in 0.000 seconds (96% CPU, 1315034 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
M = 12 ...

score 0 · Accepted Answer · answered Jun 27 '14 at 20:21

0

your code is mostly correct. I've placed comments where I modified it.

collCount([],[]).  % miss base case
collCount([H|T],[(H,N)|L2]) :-
    countDel(H,[H|T],L,N),
    collCount(L,L2).

countDel(X,T,Rest,N) :-
    occur(X,T,N),
    delAll(X,T,Rest).

occur(_,[],0).
occur(X,[X|T],N) :-
    occur(X,T,NN),
    N is NN + 1.
occur(X,[H|T],N) :-
    occur(X,T,N),
    X \= H.

delAll(_,[],[]).
delAll(X,[X|T],Ans) :-
    delAll(X,T,Ans).
delAll(X,[H|T],[H|Rest]) :-
    X \= H,  % moved before recursive call
    delAll(X,T,Rest).

this yields

?- collCount([a,b,a,b,c,b],F).
F = [ (a, 2), (b, 3), (c, 1)] ;
false.

answered Jun 27 '14 at 20:21

CapelliC

59,646
5
47
90

Thanks for the reply. May I ask you why you moved that goal before the recursive call? – Jun 27 '14 at 20:39
I've done it 'automatically', for efficiency. It's useless to recurse the full list and then discard the result... But the program should work anyway. – CapelliC Jun 27 '14 at 20:43

score 0 · Answer 4 · answered Jun 27 '14 at 23:33

One way:

frequencies_of( []     , []     ) .  % empty list? success!
frequencies_of( [X|Xs] , [F|Fs] ) :- % non-empty list?
  count( Xs , X:1 , F , X1 ) ,       % count the occurrences of the head, returning the source list with all X removed
  frequencies_of( X1 , Fs )          % continue
  .                                  %

count( [] , F , F , [] ) .           % empty list? success: we've got a final count.
count( [H|T] , X:N , F , Fs ) :-     % otherwise...
  H = X ,                            % - if the head is what we're counting, 
  N1 is N+1 ,                        % - increment the count
  count( T , X:N1 , F , Fs )         % - recurse down
  .                                  %
count( [H|T] , X:N , F , [H|Fs] ) :- % otherwise...
  H \= X ,                           % - if the head is NOT what we're counting
  count( T , X:N , F , Fs )          % - recurse down, placing the head in the remainder list
  .                                  %

Another way of looking at it:

frequencies_of( Xs , Fs ) :-     % compile a frequency table
  frequencies_of( Xs , [] , Fs ) % by invoking a worker predicate with its accumulator seeded with the empty list
  .

frequencies_of( []     , Fs , Fs ) .  % the worker predicate ends when the source list is exhausted
frequencies_of( [X|Xs] , Ts , Fs ) :- % otherwise...
  count( X , Ts , T1 ) ,              % - count X in the accumulator
  frequencies_of( Xs , T1 , Fs )      % - and recursively continue
  .                                   %

count( X , []       , [X:1]     ) .   % if we get to the end, we have a new X: count it
count( X , [X:N|Ts] , [X:N1|Ts] ) :-  % otherwise, if we found X,
  N1 is N+1                           % - increment the count
  .                                   % - and end.
count( X , [T:N|Ts] , [T:N|Fs]  ) :-  % otherwise
  X \= T ,                            % - assuming we didn't find X
  increment( X , Ts , Fs )            % - just continue looking
  .                                   % Easy!

A third approach would be to sort the list first, without removing duplicates. Once the list is ordered, a simple 1-pass, run-length encoding of the ordered list gives you the frequency table, something like this:

frequencies_of( Xs , Fs ) :- % to compute the frequencies of list elements
  msort( Xs , Ts ) ,         % - sort the list (without removing duplicates)
  rle( Ts , Fs )             % - then run-length encode the sorted list
  .                          % Easy!

rle( []    , [] ) .    % the run length encoding of an empty list is the empty list.
rle( [H|T] , Rs ) :-   % the run length encoding is of a non-empty list is found by
  rle( T , H:1 , Rs )  % invoking the worker on the tail with the accumulator seeded with the head
  .                    %

rle( []    , X:N , [X:N] ) .     % the end of the source list ends the current run (and the encoding).
rle( [H|T] , X:N , Rs    ) :-    % otherwise...
  H = X     ,                    % - if we have a continuation of the run,
  N1 is N+1 ,                    % - increment the count
  rle( T , X:N1 , Rs )           % - and recursively continue
  .                              %
rle( [H|T] , X:N , [X:N|Rs] ) :- % otherwise...
  H \= X ,                       % - if the run is at an end,
  rle( T , H:1 , Rs)             % - recursively continue, starting a new run and placing the current encoding in the result list.
  .                              %

Collecting Items and Counting the Number of Occurrences

4 Answers4

Edit 2015-05-13