I am trying to find all occurrences of sub-strings in a main string (of all lengths). My function takes one string and then returns a dictionary of every sub-string (which occurs more than once, of course) and how many times it occurs (format of the dictionary: {substring: # of occurrences, ...}
). I am using collections.Counter(s)
to help me with it.
Here is my function:
from collections import Counter
def patternFind(s):
patterns = {}
for index in range(1, len(s)+1)[::-1]:
d = nChunks(s, step=index)
parts = dict(Counter(d))
patterns.update({elem: parts[elem] for elem in parts.keys() if parts[elem] > 1})
return patterns
def nChunks(iterable, start=0, step=1):
return [iterable[i:i+step] for i in range(start, len(iterable), step)]
I have a string, data
with about 2500 random letters (in a random order). However, there are 2 strings inserted into it (random points). Say this string is 'TEST'. data.count('TEST')
returns 2. However, patternFind(data)['TEST']
gives me a KeyError
. Therefore, my program does not detect the two strings in it.
What have I done wrong? Thanks!
Edit: My method of creating testing-instances:
def createNewTest():
n = randint(500, 2500)
x, y = randint(500, n), randint(500, n)
s = ''
for i in range(n):
s += choice(uppercase)
if i == x or i == y: s += "TEST"
return s