-1

I am dealing with some string search tasks just to improve an efficient way of searching. I am trying to implement a way of counting how many substrings there are in a given set of strings by using backward search. For example given the following strings:

original = 'panamabananas$' 
s = smnpbnnaaaaa$a 
s1 = $aaaaaabmnnnps #sorted version of s

I am trying to find how many times the substring 'ban' it occurs. For doing so I was thinking in iterate through both strings with zip function. In the backward search, I should first look for the last character of ban (n) in s1 and see where it matches with the next character a in s. It matches in indexes 9,10 and 11, which actually are the third, fourth and fifth a in s. The next character to look for is b but only for the matches that occurred before (This means, where n in s1 matched with a in s). So we took those a (third, fourth and fifth) from s and see if any of those third, fourth or fifth a in s1 match with any b in s. This way we would have found an occurrence of 'ban'.

It seems complex to me to iterate and save cuasi-occurences so what I was trying is something like this:

n = 0 #counter of occurences
for i, j in zip(s1, s):
    if i == 'n' and j == 'a': # this should save the match 
        if i[3:6] == 'a' and any(j[3:6] == 'b'): 
            n += 1

I think nested if statements may be needed but I am still a beginner. Because I am getting 0 occurrences when there are one ban occurrences in the original.

Olivier Melançon
  • 21,584
  • 4
  • 41
  • 73
Joe Smith
  • 83
  • 1
  • 6
  • 1
    Possible duplicate of [Count number of occurrences of a given substring in a string](https://stackoverflow.com/questions/8899905/count-number-of-occurrences-of-a-given-substring-in-a-string) – Olivier Melançon Mar 26 '18 at 13:27
  • Ok, I'm sure I'm missing something here, but why don't you just reverse the string before searching? – ChatterOne Mar 26 '18 at 14:06
  • Because s is the Burrows Wheeler transformed of the original while s1 is the sorted s just to make things easier (at least in theory). The point is to use get occurrences of the original string by usisng the the Burrows Wheeler transformed. – Joe Smith Mar 26 '18 at 14:21
  • Sorry, I still don't understand. You have a starting string, which is `panamabananas$` and you want to know how many times `ban` appears in it, that's it? How is the Burrows Wheeler transform supposed to help? – ChatterOne Mar 26 '18 at 14:29
  • The original string is only for check if the count is correct. In practice you wont need it since Burrows transformed is strong enough to count the occurrences of a given query. Dealing with the original string would be a time-consuming task. – Joe Smith Mar 26 '18 at 14:32
  • I think that dealing with a rotation algorithm combined with search of the substring will give you enough overhead to make it faster if you deal with the original string. This is especially true if the substring that you're looking for is long enough (because you could go with Knuth-Morris-Pratt). I think a simple Python `in` is implemented with Boyer-Moore. – ChatterOne Mar 26 '18 at 15:01

1 Answers1

0

You can run a loop with find to count the number of occurence of substring.

s = 'panamabananasbananasba'
ss = 'ban'
count = 0
idx = s.find(ss, 0)
while (idx != -1):
    count += 1
    idx += len(ss)
    idx = s.find(ss, idx)
print count

If you really want backward search, then reverse the string and substring and do the same mechanism.

s = 'panamabananasbananasban'
s = s[::-1]
ss = 'ban'
ss = ss[::-1]
rashok
  • 12,790
  • 16
  • 88
  • 100