Python: how to count overlapping occurrences of a substring

Question

I wanted to count the number of times that a string like 'aa' appears in 'aaa' (or 'aaaa').

The most obvious code gives the wrong (or at least, not the intuitive) answer:

'aaa'.count('aa')
1 # should be 2
'aaaa'.count('aa')
2 # should be 3

Does anyone have a simple way to fix this?

Or: http://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurances — Nykakin, Oct 10 '13 at 17:43

score 10 · Answer 1 · answered Oct 10 '13 at 17:39

10

From str.count() documentation:

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

So, no. You are getting the expected result.

If you want to count number of overlapping matches, use regex:

>>> import re
>>> 
>>> len(re.findall(r'(a)(?=\1)', 'aaa'))
2

This finds all the occurrence of a, which is followed by a. The 2nd a wouldn't be captured, as we've used look-ahead, which is zero-width assertion.

answered Oct 10 '13 at 17:39

Rohit Jain

209,639
45
409
525

Well, yes. But the question is how to calculate it instead? – Simeon Visser Oct 10 '13 at 17:40
1

I mean, you're 100% correct. But that's not helpful :/ – nivk Oct 10 '13 at 17:40
1

You can instead change your OP to 'how to count overlapping occurrences of a substring', to emphasize that you want to know HOW, rather than why it is giving an answer that is not intuitive to you. – hexparrot Oct 10 '13 at 17:42

score 6 · Answer 2 · answered Oct 10 '13 at 17:42

6

haystack = "aaaa"
needle   = "aa"

matches  = sum(haystack[i:i+len(needle)] == needle 
               for i in xrange(len(haystack)-len(needle)+1))

# for Python 3 use range instead of xrange

answered Oct 10 '13 at 17:42

kindall

178,883
35
278
309

Thank you, I think I'll use this since I'm not familiar with regex – nivk Oct 10 '13 at 17:44

Lucas Ribeiro · Answer 3 · 2013-10-10T18:31:02.640

1

The solution is not taking overlap into consideration.

Try this:

big_string = "aaaa"
substring = "aaa"
count = 0 

for char in range(len(big_string)):
    count += big_string[char: char + len(subtring)] == substring

print count

edited Oct 10 '13 at 18:31

answered Oct 10 '13 at 18:25

Lucas Ribeiro

6,132
2
25
28

score 0 · Answer 4 · answered Oct 10 '13 at 17:44

You have to be careful, because you seem to looking for non-overlapping substrings. To fix this I'd do:

len([s.start() for s in re.finditer('(?=aa)', 'aaa')])

And if you don't care about the position where the substring starts you can do:

len([_ for s in re.finditer('(?=aa)', 'aaa')])

Although someone smarter than myself might be able to show that there are performances differences :)

Python: how to count overlapping occurrences of a substring

4 Answers4

Related