Counting how many time a list is sublist of another one

Question

I've got a Prolog exercise that requires me to count how many time a list L2 appears as a sublist in L1.

What I need to do is to write the Prolog code starting from the code written in another language (some kind of didactic puropose language that we use in my university).

So I wrote the algorithm is this language and everything works smooth, but the Prolog version fails and I don't undertand why.

The idea is quite simple:

Get the length of L2 and call it SizeL2
Starting from the head of L1, get each sublist of size SizeL2.
Check if subList(L1, SizeL2)==L2 and add 1 if true
When the end of L1 is reached, we return count

This means something like:

L1=[1,2,3,4,5,2,3,2,3,4], L2=[2,3,4], SizeL2=3, count=0

Step 1 -> [1,2,3]

Step 2 -> [2,3,4] - true, count+1

Step 3 -> [3,4,5]

Step 4 -> [4,5,2]

Step 5 -> [5,2,3]

Step 6 -> [2,3,2]

Step 7 -> [3,2,3]

Step 8 -> [2,3,4] - true, count+1

Step 9 -> L1 is finished, return 2

Now, this is how I translated it to Prolog.

Please note that I've got to avoid any kind of system predicate!

%% we get the size of a list
size([], 0):-!.
size([H|T], R):- size(T,R1), R is R1+1.

%% we get the sublist of the given size
subList(_, 0, []):-!. %% if size is 0, we return an empty list
subList([], _, []):-!. %% if the list is empty, we're done
subList([H|T], Size, [H|Tr]):- Size1 is Size-1, 
                               subList(T, Size1, Tr).

%% we count how many times L2 is sublist of L1  
countSublist([], _, 0):-!. %% if L1 is empty we return 0
countSublist(L1, L2, 0):- size(L1, S1),
                         size(L2, S2),
                         S1 < S2. %% if L1 is shorter than L2, we return 0

countSublist([H|T], L2, R):- size(L2, SizeL2), %% we need L2's size
                             subList([H|T], SizeL2, SubL1), %% we get L1's sublist
                             countSublist(T, L2, R),
                             SubL1 = L2,
                             R1 is R+1. %% we do R+1 only if L2=Sublist of L1

The translation from our language is quite straight-forward and I'm sure that I did it correctly, but I still don't understand why the Prolog versione doesn't work.

I think that the problem should be in the last predicate (countSublist([H|T], L2, R)) because size and subList are working fine.

Any idea?

How many times is `[2,2,2]` a sublist of `[2,2,2,2,2,2]`? Would it be 4 or 2? — lurker, Jan 28 '14 at 11:59
Actually I don't know which one's right, but I guess that's 4. Letting return 2 is quite simple though, but it's not what I need now. Let's say that 4 is correct, why the `Prolog` code fails? — StepTNT, Jan 28 '14 at 12:13

CapelliC · Answer 1 · 2014-01-28T20:12:59.807

Your solution would be simpler, if you could avoid computing size/2 etc.

BTW note that you are not avoiding system predicates in your solution, since is/2 it's a 'extralogical' system predicate, as well as (<)/2.

If you must avoid such system predicates, you must use Peano arithmetic (that is, represent - for instance - 2 as s(s(0))), and rewrite in simpler terms:

countSublist([], Sought, 0).
countSublist([H|List], Sought, s(C)) :-
  isprefix(Sought, [H|List]),
  % !, can you use cuts ?
  countSublist(List, Sought, C).
countSublist([H|List], Sought, C) :-
  % \+ isprefix(Sought, [H|List]), can you use not ?
  countSublist(List, Sought, C).

% isprefix(Sought, List) is simple enough

BTW, it's much simpler using some high level library, like aggregate:

Here is a DCG version

countSublist(List, SubL, Count) :-
    aggregate(count, R^phrase((..., SubL), List, R), Count).

... --> [] ; [_], ... .

and here an append/3 one

countSublist(List, SubL, Count) :-
    aggregate(count, U1^U2^U3^(append(U1,U2,List),append(SubL,U3,U2)), Count).

and, my preferred, using append/2

countSublist(List, SubL, Count) :-
    aggregate(count, L^R^append([L,SubL,R], List), Count).

+1 for persistence in the face of adversity (the OPs requirement to avoid built-ins). That last append version shouldn't have got me as excited as it did. — magus, Jan 28 '14 at 20:52

lurker · Accepted Answer · 2014-01-28T13:32:11.197

I guess everyone has their own simplified version. Mine's only slightly different with a tail-recursion:

% headsub(S, L) is true if S is a sublist at the head of L
headsub([Hs|Ts], [Hs|Tl]) :-
    headsub(Ts, Tl).
headsub([], _).

subcount(L, S, C) :-
    subcount(L, S, 0, C).
subcount([Hl|Tl], S, A, C) :-
    (   headsub(S, [Hl|Tl])
    ->  A1 is A + 1     % if S is a sublist in front, count it
    ;   A1 = A          % otherwise, don't count it
    ),
    subcount(Tl, S, A1, C).   % Count the rest, adding it into the accumulator
subcount([], _, C, C).  % We're done here. Accumulator becomes the answer.

In your original solution, one philosophical issue is your approach to the problem. The more complex algorithm you have comes from thinking of the problem procedurally as in a traditional programming language rather than declaratively, which is what Prolog is designed for. The simpler code comes from thinking of the problem as defining a predicate that indicates when a sublist is at the front of a list. Then defining a predicate that recursively checks whether the sublist is at the front of subsequent tails of that list and counting how many times it's true.

As far as specific bugs in the original solution, there are two issues right off, both with the following predicate:

countSublist([H|T], L2, R):- size(L2, SizeL2), %% we need L2's size
                             subList([H|T], SizeL2, SubL1), %% we get L1's sublist
                             countSublist(T, L2, R),
                             SubL1 = L2,
                             R1 is R+1. %% we do R+1 only if L2=Sublist of L1

Here, you want your final count result to be R. However, you are using R as your intermediate result and then using R1 as your final result. These are reverse of what they should be.

Secondly, SubL1 = L2 fails, then the above clause fails and backtracks to re-evaluate countSublist(T, L2, R). This is probably not what you want since a success of that query should be counted in your result. So that logic needs to be rethought. A quick fix shows the basic need, but needs a little clean-up:

countSublist([H|T], L2, R):- size(L2, SizeL2), %% we need L2's size
                             subList([H|T], SizeL2, SubL1), %% we get L1's sublist
                             countSublist(T, L2, R1),
                             (   SubL1 = L2
                             ->  R is R1+1  %% we do R+1 only if L2=Sublist of L1
                             ;   R = R1
                             ).

This has a success leg if SubL1 = L2 fails which simply doesn't count the mismatch. So now you get the right answer, but multiple times:

| ?- countSublist([1,2,3,4,5,2,3,2,3,4], [2,3,4], R).

R = 2 ? a

R = 2

R = 2

I'll leave it as an exercise for the reader to tidy it up. :)

I guess that I totally forgot to deal with the case in which `SubL1 = L2` evaluates to `No`. I've switched between `R1` and `R` and added a new predicate which takes care of `not(SubL1 = L2)` and now everything's fine. I've not used your `( expr -> instruction1 ; instruction2 )` structure because I've not studied it yet, but now it works! — StepTNT, Jan 28 '14 at 13:17
@StepTNT that's great. The `expr -> expr1 ; expr2` construct prolog acts as an if-then-else and is quite handy. It often simplifies the logic and avoids redundantly checking a condition (*e.g.*, avoiding `foo(...) :- expr, expr1. foo(...) :- \+ expr, expr2.`). Sometimes the alternatives aren't nearly as practical. — lurker, Jan 28 '14 at 13:21

score 2 · Answer 3 · edited May 23 '17 at 11:59

This answer is a follow-up to this earlier answer and tries to better it in different ways:

:- use_module(library(clpfd)).    % for declarative integer arithmetic

We minimize clpfd overhead by using an accumulator—like this fd_length/2.
Also, we handle some corner-case(s) in a more consistent manner.

Let's get it started! Building upon prefix_of_t/3 we define:

list_contains_count(List, Sub, N) :-
   list_contains_count_acc(List, Sub, N, 0).

list_contains_count_acc(List, Sub, N, N0) :-
   prefix_of_t(Sub, List, T),
   t_01(T, T01),
   N1 #=  N0 + T01,
   N1 #=< N,
   aux_list_contains_(List, Sub, N, N1).

aux_list_contains_([], _, N, N).
aux_list_contains_([_|Es], Sub, N, N0) :-
   list_contains_count_acc(Es, Sub, N, N0).

t_01( true, 1).
t_01(false, 0).

Some sample queries:

?- list_contains_count([1,2,3,4,5,2,3,2,3,4], [2,3,4], N).
N = 2.

?- list_contains_count([2,2,2,2,2,2], [2,2,2], N).
N = 4.

?- list_contains_count([1,2,3,1,2,3], [1,2], N).
N = 2.

OK! Same as before... How about some corner cases?

?- length(Sub, N), list_contains_count([], Sub, C).
   N = 0, C = 1, Sub = []
;  N = 1, C = 0, Sub = [_A]
;  N = 2, C = 0, Sub = [_A,_B]
;  N = 3, C = 0, Sub = [_A,_B,_C]
;  N = 4, C = 0, Sub = [_A,_B,_C,_D]
...

?- length(List, N), list_contains_count(List, [], C).
;  N = 0, C = 1, List = []
;  N = 1, C = 2, List = [_A]
;  N = 2, C = 3, List = [_A,_B]
;  N = 3, C = 4, List = [_A,_B,_C]
;  N = 4, C = 5, List = [_A,_B,_C,_D]
...

Alright... In fact, even better than before¹!

?- (  Version = old, list_contains_countX([], [], C)
   ;  Version = new, list_contains_count( [], [], C)
   ).
   Version = old, C = 0
;  Version = new, C = 1.

Which overhead does clpfd get us? Let's measure and compare some runtimes^2,3!

?- set_prolog_flag(toplevel_print_anon, false).
true.

?- length(_Xs, 100_000),
   maplist(=(x), _Xs),
   between(0, 6, _Exp),
   L #= 2 ^ _Exp,
   length(_Sub, L),
   maplist(=(x), _Sub),
   (  Version = old, call_time(list_contains_countX(_Xs,_Sub,_N), T_ms)
   ;  Version = new, call_time(list_contains_count( _Xs,_Sub,_N), T_ms)
   ).
   L =  1, Version = old, N = 100000, T_ms =  101
;  L =  1, Version = new, N = 100000, T_ms = 1422   /*  1500% slower */
;
   L =  2, Version = old, N =  99999, T_ms =  153
;  L =  2, Version = new, N =  99999, T_ms = 1479
;
   L =  4, Version = old, N =  99997, T_ms =  240
;  L =  4, Version = new, N =  99997, T_ms = 1572   /*   650% slower */
;
   L =  8, Version = old, N =  99993, T_ms =  417
;  L =  8, Version = new, N =  99993, T_ms = 1760
;
   L = 16, Version = old, N =  99985, T_ms =  778
;  L = 16, Version = new, N =  99985, T_ms = 2122   /*   280% slower */
;
   L = 32, Version = old, N =  99969, T_ms = 1497
;  L = 32, Version = new, N =  99969, T_ms = 2859
;
   L = 64, Version = old, N =  99937, T_ms = 2948
;  L = 64, Version = new, N =  99937, T_ms = 4330.  /*   150% slower */

Above clpfd code has a worst-case slowdown of 1500% compared to its non-clpfd counterpart.

So... are there any cases in which the clpfd-variant outperforms its non-clpfd counterpart? Yes!

?- length(_Xs, 100_000),
   maplist(=(x), _Xs),
   member(Ub, [0,10,100,1000,10000,100000]),
   _N #< Ub,
   (  Version = old, call_time(ignore(list_contains_countX(_Xs,[x],_N)), T_ms)
   ;  Version = new, call_time(ignore(list_contains_count( _Xs,[x],_N)), T_ms)
   ).
   Ub =      0, Version = old, T_ms =  100  
;  Ub =      0, Version = new, T_ms =    0          /* 10000% faster */
;
   Ub =     10, Version = old, T_ms =  107
;  Ub =     10, Version = new, T_ms =    1          /*  5000% faster */
;
   Ub =    100, Version = old, T_ms =  104
;  Ub =    100, Version = new, T_ms =    3          /*  3000% faster */
;
   Ub =   1000, Version = old, T_ms =  104
;  Ub =   1000, Version = new, T_ms =   17          /*   611% faster */
;
   Ub =  10000, Version = old, T_ms =  106
;  Ub =  10000, Version = new, T_ms =  130          /*   125% slower */
;
   Ub = 100000, Version = old, T_ms =  102
;  Ub = 100000, Version = new, T_ms = 1238.         /*  1300% slower */

Bottom line? If N is constrained beforehand, the new code can be a lot faster! YMMV!

^{Footnote 1: prefix([], []) succeeds;
list_contains_count([], [], 0) would not quite fit that.

Footnote 2: Compared to
the earlier definition, referred to as list_contains_countX/3 in this answer.

Footnote 3: All runtime measurements were performed using SWI-Prolog 7.3.13 (AMD64).}

score 1 · Answer 4 · edited May 23 '17 at 12:15

Straight-forward using if_/3 and prefix_of_t/3:

list_contains_count(Ls, Es, N) :-
   list_contains_acc_count(Ls, Es, 0, N).

list_contains_acc_count([], _, N, N).
list_contains_acc_count([X|Xs], Sub, N0, N) :-
   if_(prefix_of_t(Sub, [X|Xs]),
       N1 is N0+1,
       N1 = N0),
   list_contains_acc_count(Xs, Sub, N1, N).

Sample queries:

?- list_contains_count([1,2,3,4,5,2,3,2,3,4], [2,3,4], N).
N = 2.

?- list_contains_count([2,2,2,2,2,2], [2,2,2], N).
N = 4.

?- list_contains_count([1,2,3,1,2,3], [1,2], N).
N = 2.

How about something more general?

?- dif(N, 0), Sub = [_|_], list_contains_count([1,2,3,1,2,3], Sub, N).
  N = 2, Sub = [1]
; N = 2, Sub = [1,2]
; N = 2, Sub = [1,2,3]
; N = 1, Sub = [1,2,3,1]
; N = 1, Sub = [1,2,3,1,2]
; N = 1, Sub = [1,2,3,1,2,3]
; N = 2, Sub = [2]
; N = 2, Sub = [2,3]
; N = 1, Sub = [2,3,1]
; N = 1, Sub = [2,3,1,2]
; N = 1, Sub = [2,3,1,2,3]
; N = 2, Sub = [3]
; N = 1, Sub = [3,1]
; N = 1, Sub = [3,1,2]
; N = 1, Sub = [3,1,2,3]
; false.

@lurker. This is the `if_/3`-version of your predicate `subcount/3`. — repeat, Dec 03 '15 at 21:33

score 0 · Answer 5 · answered Jan 28 '14 at 12:14

0

To simplify the solution, you don't need to count the size of any lists, or check if one is shorter than another.. one sublist is a member of the main list or it isn't. So the only counter needed is the count of sublist matches.

% H1 is the head of both lists. When done recursively, checks whole list.
isl(_, []).
isl([H1|T1], [H1|T2]) :-
    isl(T1, T2).

% Base of recursion, finished first parameter
subby([], _, 0).

% ToFind is a sublist of first parameter
subby([H|T], ToFind, N) :-
    isl([H|T], ToFind),

    % recurse around, increment count
    subby(T, ToFind, Inc),
    N is Inc+1.

% ToFind is not sublist, decapitate and recurse
subby([_|T], ToFind, N) :-
    subby(T, ToFind, N).

This solution doesn't use tail recursion, which is inefficient for longer lists, but for small lists it's fine. ie. in the 2nd subby predicate, the recursion to the tail is done first, then the counts are added as the recursion bubbles up on the way back.

answered Jan 28 '14 at 12:14

magus

1,347
7
13

While I like your solution, because it's quite simpler than mine, it's important to understand *why* my code is not working, to avoid doing mistakes during the final test. – StepTNT Jan 28 '14 at 12:33
1

Suggest using the 'trace.' predicate (type that before running the main predicate), which will step prolog through every single line it is running, then you can see where it is failing. This is more useful (in learning for yourself) than someone here telling you, for each problem you have, where it is failing. – magus Jan 28 '14 at 12:39
I'm already using it, if I'm posting is becauseeven the trace doesn't help :\ – StepTNT Jan 28 '14 at 13:11
1

@StepTNT in this case, just using `trace` by itself has a lot of detail due to the complexity of the implementation of the solution. However, you could do `trace(countSublist).` to only look at what happens with that predicate. – lurker Jan 28 '14 at 14:16

score 0 · Answer 6 · answered Jan 29 '14 at 04:50

Doesn't seem like it should be any more complicated than

sublist_count(L,S,N) :-
  sublist_count(L,S,0,N)
  .

sublist_count([],_,N,N) :-
  !.
sublist_count([X|Xs],S,T,N) :-
  prefix_of([X|Xs],S) ,
  T1 is T+1 ,
  sublist_count(Xs,S,T1,N)
  !.

prefix_of(_,[]) :-
  !.
prefix_of([X|Xs],[X|Ys]) :-
  prefix_of(Xs,Ys)
  .

Ankur · Answer 7 · 2014-01-29T07:12:28.763

0

Below is my attempt. This is more of a predicate/relation approach based solution then procedural approach.

pe([],[]).
pe(X,X).
pe([_|T],X) :- pe(T,X).

sbls(_,[]).
sbls([H|T],[H|Y]) :- sbls(T,Y).

sblist(L,S) :- pe(L,X), sbls(X,S).


| ?- findall(_,sblist([1,2,3,1,2,3],[1,2]),L), length(L,Count).

Count = 2

| ?- findall(_,sblist([2,2,2,2,2,2],[2,2,2]),L), length(L,Count).

Count = 4

edited Jan 29 '14 at 07:12

answered Jan 29 '14 at 07:05

Ankur

33,367
2
46
72

Counting how many time a list is sublist of another one

7 Answers7