Repeated String Match association with Boyer Moore Algorithm

Question

Posting this question after looking around different blogs/SO questions and still not finding an answer

I'm trying to wrap my head around a solution/algorithm used from a leetcode contest.

Here is the question:

Given two strings A and B, find the minimum number of times A has to be repeated such that B is a substring of it. If no such solution, return -1.

For example, with A = "abcd" and B = "cdabcdab".

Return 3, because by repeating A three times (“abcdabcdabcd”), B is a substring of it; and B is not a substring of A repeated two times ("abcdabcd").

I know that rolling hash approach is the preferred way to go but I decided to start with the Boyer Moore approach.

After doing some research I learned that the following code used Boyer Moore Algorithm behind the scenes. Here is the code:

def repeatedStringMatch(self, A, B):
    """
    :type A: str
    :type B: str
    :rtype: int
    """
    m = math.ceil(len(B)/len(A))
    if B in A * m:
        return m
    elif B in A * (m + 1):
        return m + 1
    else:
        return -1

Based on my understanding of the algorithm I'm not convinced how the above solution might be using BM approach.

I'm specifically not clear what this line

  m = math.ceil(len(B)/len(A))

does and why we have to find m in this fashion. Could anyone help me here?

Thanks in advance

The purpose of `m` looks to be a multiplier on the string`A` so that `A` is greater than or equal to the length of `B`. I'm not so familiar with Boyer Moore, but doing `B in A` is definitely not BM since that preforms a naive search. — Miket25, Jul 09 '18 at 04:05
@Miket25, the algorithm behind `B in A` appears to be [boyer-moore in cpython](https://github.com/python/cpython/blob/master/Objects/stringlib/fastsearch.h) — Haleemur Ali, Jul 09 '18 at 04:08
@HaleemurAli Ah so it seems to be a mix of Boyer-Moore and Horspool from your link and this [question](https://stackoverflow.com/questions/18139660/python-string-in-operator-implementation-algorithm-and-time-complexity) — Miket25, Jul 09 '18 at 04:19

Haleemur Ali · Accepted Answer · 2018-07-09T04:18:21.653

The smaller string must be repeated at least m times for the larger string to be contained within.

The minimum value that m can assume is the smallest integer greater than the ratio of the lengths of two strings (larger / smaller), because, to contain a substring of length l, a string must have at least length l.

Using the example you shared.

A = "abcd"
B = "cdabcdab"
m0 = len(B) / len(A)
# m0 == 2.0
m = math.ceil(m0)
# m = 2

However, "cdabcdab" is not contained in "abcdabcd". But, if we repeat "abcd" 1 more time, we find that "cdabcdab" is then a substring. Repeating "abcd" further doesn't change whether it may be found as a substring or not. So, it is only necessary to check containment for m & (m+1) repetitions.

The python code you shared does not implement any searching algorithm, it just uses whatever searching algorithm is implemented for use with in. The specific algorithm might be implementation dependent, however there is a high likelihood for it to be boyer-moore or a variation of, as it is popular & efficient.

edit:

the algorithm behind B in A appears to be boyer-moore in cpython

ah that makes perfect sense - thank you for the explanation :) — InquisitiveGirl, Jul 09 '18 at 04:18

Repeated String Match association with Boyer Moore Algorithm

1 Answers1