I'm trying to count how many times a given string pattern occurs in a sequence of characters; for the bioinformatics lovers, that means counting how many times a certain motif occurs along a genome. For this purpose I found the following Python function:

def search(pat, txt):
    M = len(pat)
    N = len(txt)
    
    for i in range(N - M + 1):
        j = 0 

        while(j < M):
            if (txt[i + j] != pat[j]):
                break
            j += 1
 
        if (j == M):
            print(f"{pat} found at index ", i)

This gives me output like the following:

GAATC found at index  1734
GAATC found at index  2229
GAATC found at index  2363
GAATC found at index  2388
GAATC found at index  2399
GAATC found at index  2684
GAATC found at index  5634
GAATC found at index  7021
GAGTC found at index  1671
GAGTC found at index  4043

And so on for each pattern (motif). As you can see, the motif "GAATC" occurs 8 times, and a line is printed for every position where it was found. I'd like the terminal output to look like this instead:

GAATC found 8 times
GAGTC found 2 times

And so on. In the title I wrote "Creating a dictionary" because I assumed that would be the best approach, but I'm open to any suggestions.

Can you help me? Thank you!

  • Do you still need to print / record the various positions the pattern was found at? Or ONLY the number of times found? – MatBailie Jan 07 '23 at 00:52
  • Only the total number of times it has been found. – Iacopo Passeri Jan 07 '23 at 00:53
  • I think operating a dictionary should be well within your means. I suggest giving it an honest attempt on your own, and if you get stuck you can [edit] into the question why you had difficulties with the implementation. StackOverflow is not a code writing service. – Kraigolas Jan 07 '23 at 00:54
  • Then you don't even need a dictionary. You're checking one pattern at a time, so just put `found = 0` at the start and `found += 1` in your loop, and print the count at the end. – MatBailie Jan 07 '23 at 00:56

1 Answer

def search(pat, txt):
    M = len(pat)
    N = len(txt)
    
    found = 0

    for i in range(N - M + 1):
        if (txt[i:i+M] == pat):
            found += 1

    print(f"{pat} found {found} times.")

Or use regular expressions...

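import re

def search(pat, txt):
    # The zero-width lookahead (?=...) matches at every start position,
    # so overlapping occurrences are counted too. re.escape keeps any
    # regex metacharacters in pat literal.
    found = len(re.findall(f'(?={re.escape(pat)})', txt))

    print(f"{pat} found {found} times.")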
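
Since the question title mentions a dictionary, here is a minimal sketch of how the counts for several motifs could be collected into one, assuming the motifs are available as a list of plain strings. The names `count_motifs`, `motifs` and the sample `genome` string are made up for illustration and are not from the question.

```python
import re

def count_motifs(motifs, txt):
    """Return {motif: number of (possibly overlapping) occurrences in txt}."""
    counts = {}
    for pat in motifs:
        # A zero-width lookahead matches at every start position, so
        # overlapping occurrences are each counted once.
        counts[pat] = len(re.findall(f"(?={re.escape(pat)})", txt))
    return counts

genome = "GAATCGAGTCGAATCAAA"  # made-up sequence, for illustration only
for pat, n in count_motifs(["GAATC", "GAGTC", "AA"], genome).items():
    print(f"{pat} found {n} times")
```

Returning the dictionary instead of printing inside the function keeps the counts available for later use, for example sorting motifs by frequency.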
MatBailie
  • Other faster options can be found in the comments on this question: https://stackoverflow.com/questions/8899905/count-number-of-occurrences-of-a-substring-in-a-string#comment50169124_8900059 – MatBailie Jan 07 '23 at 01:10