-1

I have tried to frame a program in Python which would print out the number of times the provided substring appears as a pattern in a provided string.

Unfortunately, my code isn't the best solution as it might not handle some corner cases. I need a solution two-liner or three-liner code maybe) or a modified version to my code.

I also need some suggestions on code optimization if the provided input string is of length 1000 or more.

Input 1:

"XYZDFXYZXY"

"XYZ"

Output 1:

2

Input 2:

"ABCDCDC"

"CDC"

Output 2:

2

My code:

def pattern(string, sub_str ):
    counter = 0
    len_sub = len(sub_str )
    string, sub_str = string.upper(),sub_str.upper()
    for i in range(len(string)):
        sub_loop = ""
        for j in range(i,i+len_sub):
            try:
                sub_loop+=string[j] 
                if sub_loop == sub_str:
                    counter+=1
            except:
                pass
    return counter

if __name__ == '__main__':
    string = input().strip()
    sub_str = input().strip()
    count= pattern(string, sub_str )
    print(count)
Ricky Rick
  • 61
  • 1
  • 6
  • `print("XYZDFXYZXY".count("XYZ"))` ? – Patrick Artner Aug 13 '20 at 11:53
  • Hi @PatrickArtner. I have mentioned "pattern" in my question. Let's say in a Test case, given input String is "ABCDCDC" and the given input Substring is "CDC". According to your solution which is basically searching for a String not pattern, the output will be 1 but actually the ground value count of "CDC" is 2 – Ricky Rick Aug 13 '20 at 12:03
  • Does this answer your question? [How to find all occurrences of a substring?](https://stackoverflow.com/questions/4664850/how-to-find-all-occurrences-of-a-substring) – shananton Aug 13 '20 at 12:31
  • 1
    @Ricky_Rick if you want to do this with somewhat short code, check out [this answer](https://stackoverflow.com/a/34445090/6508598). If you want it to be fast (O(n) instead of O(n^2)), check out [KMP-algorithm](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm). It's relatively easy to understand and there are probably implementations already written in python. – shananton Aug 13 '20 at 12:34
  • @shananton Thank you. This satisfies my demand more than enough. I am also able to find the occurrence of a sub-string. – Ricky Rick Aug 13 '20 at 12:52

3 Answers3

3

For overlapping patterns:

For overlapping cases we will use regex's Positive Lookahead (?=):

sub_str = "CDC"
string = "ABCDCDC"
len(re.findall(f"(?=({sub_str}))", string))

more info can be found here: https://www.regular-expressions.info/lookaround.html

Without overlapping:

Python's str has a builtin function count that let you count the number of substring occurrences in the original string.

From the function documentation:

S.count(sub[, start[, end]]) -> int

Return the number of non-overlapping occurrences of substring sub in
string S[start:end].  Optional arguments start and end are
interpreted as in slice notation.

So eventually, all you have to do is:

"XYZDFXYZXY".count("XYZ")

so in total:

if __name__ == '__main__':
    string = input().strip()
    sub_str = input().strip()
    count = string.count(sub_str)
    print(count)
Or Y
  • 2,088
  • 3
  • 16
  • 1
    While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. –  Aug 13 '20 at 11:54
  • 1
    @OrY `"ABABA".count("ABA")` gives 1, while the OP's desired output seems to be 2 – shananton Aug 13 '20 at 12:20
  • 2
    @shananton I've edited my answer to support both overlapping and non overlapping cases. – Or Y Aug 13 '20 at 12:31
  • @OrY Thank you for the update on solving the problem with RegExp. I didn't really want to touch RegExp at all while solving this problem. – Ricky Rick Aug 13 '20 at 12:55
0

An easy solution is to use the split() and then get the length of the resulting array minus 1. In your code above:


def pattern(string, sub_str ):

    string, sub_str = string.upper(),sub_str.upper()
    Outarray= string.split(sub_str)
    counter = len(outarry) - 1
    Return counter
Jeff Sinason
  • 83
  • 1
  • 8
0

Using List Comprehension reduced the whole code to one line.

def pattern(string, sub_str):
    counter = sum([1 for i in range(len(string)) if string[i:].startswith(sub_str)])
    return counter
Ricky Rick
  • 61
  • 1
  • 6