How to find all occurrences of a substring?

Question

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))  # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

(For counting the occurrences, see "Count number of occurrences of a substring in a string".)

it should return '0'. Of course, in a perfect world there would also have to be `'ttt'.rfind_all('tt')`, which should return '1'. – nukl, Jan 12 '11 at 02:47

score 768 · Accepted Answer · edited May 16 '18 at 15:30

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you're only iterating through the results once.
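Putting these pieces together, here is a minimal sketch of a reusable helper. The function name `find_all` and its `overlapping` flag are illustrative, not part of the `re` module. It passes the search string through `re.escape` so characters like `+` or `.` match literally, optionally wraps it in the lookahead shown above for overlapping matches, and returns a generator expression so the search runs lazily:

```python
import re
from typing import Iterator


def find_all(text: str, sub: str, overlapping: bool = False) -> Iterator[int]:
    """Yield the start index of every occurrence of the literal string sub in text.

    Hypothetical helper: sub is escaped so regex metacharacters are matched
    literally; overlapping=True wraps it in a zero-width lookahead so the
    regex engine advances one character at a time instead of past each match.
    """
    pattern = re.escape(sub)
    if overlapping:
        pattern = f"(?={pattern})"
    # Generator expression: nothing is searched until the caller iterates.
    return (m.start() for m in re.finditer(pattern, text))


print(list(find_all("test test test test", "test")))  # [0, 5, 10, 15]
print(list(find_all("ttt", "tt", overlapping=True)))  # [0, 1]
print(list(find_all("1+1=2, 1+1", "1+1")))            # [0, 7]
```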

edited May 16 '18 at 15:30

David Leon

answered Jan 12 '11 at 02:43

moinudin

hi, concerning this `[m.start() for m in re.finditer('test', 'test test test test')]`, how can we look for `test` or `text`? Does it become much more complicated? – xpanta Mar 01 '13 at 10:48

You want to look into regular expressions in general: https://docs.python.org/2/howto/regex.html. The solution to your question will be: [m.start() for m in re.finditer('te[sx]t', 'text test text test')] – Yotam Vaknin May 06 '14 at 10:21

What will be the time complexity of using this method? – Pranjal Mittal Jul 13 '17 at 23:50

@PranjalMittal. Upper or lower bound? Best, worst or average case? – Mad Physicist Nov 06 '17 at 15:36

@marcog what if the substring contains parentheses or other special characters? – Bananach Nov 10 '18 at 11:18

This method doesn't work with overlapping strings, e.g. when searching for "ACA" in string "ACACA", it will return only index 0. In case someone wants a solution, here it is: https://stackoverflow.com/a/3873422/6430403. Use the find method with index += 1. – Rishabh Gupta Dec 07 '19 at 17:26

I would recommend escaping the search strings as well, like this: `[m.start() for m in re.finditer(re.escape(search_str), input_str)]` – srctaha Jan 08 '20 at 10:18
Applied method to search for substrings in a text file, got: "error: nothing to repeat at position 0" – Aleksejs Fomins Oct 27 '20 at 13:42
I want overlapping matches. If the substring or string contains leading and trailing spaces, I get an error: expected string or bytes-like object. I need the spaces, because I don't want to match "really mean" to "really meaningful" – yangliu2 Mar 09 '21 at 23:31
This doesn't work with multi-word subwords. For example, `THIS SUB-WORD` in this sentence `Find THIS SUB-WORD in this sentence with THIS SUB-WORD`. – Abu Shoeb Jul 23 '23 at 14:21

score 171 · Answer 2 · edited Jul 23 '14 at 04:43

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
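One edge case, raised in the comments: with an empty `sub`, `start += len(sub)` adds 0 and the generator never terminates. A minimal sketch of the `len(sub) or 1` guard suggested there; with it, the empty string is reported at every index from 0 to `len(a_str)` inclusive:

```python
from typing import Iterator


def find_all(a_str: str, sub: str) -> Iterator[int]:
    """Yield every non-overlapping start index of sub in a_str."""
    step = len(sub) or 1  # advance by 1 for an empty sub so the loop still terminates
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += step


print(list(find_all("spam spam spam spam", "spam")))  # [0, 5, 10, 15]
print(list(find_all("abc", "")))                      # [0, 1, 2, 3]
```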

edited Jul 23 '14 at 04:43

Pratik Deoghare

answered Jan 12 '11 at 03:13

Karl Knechtel

To get overlapping matches, it should suffice to replace `start += len(sub)` with `start += 1`. – Karl Knechtel Jan 12 '11 at 03:13

I believe your previous comment should be a postscript in your answer. – tzot Feb 06 '11 at 19:27

Your code does not work for finding substr: "ATAT" in "GATATATGCATATACTT" – Ashish Negi Oct 05 '13 at 07:08

See the comment I made in addition. That is an example of an overlapping match. – Karl Knechtel Oct 14 '13 at 00:13

To match the behaviour of `re.findall`, I'd recommend adding `len(sub) or 1` instead of `len(sub)`, otherwise this generator will never terminate on empty substring. – WGH Nov 27 '15 at 00:15
Personally I think that `a_str.find` should be replaced with `a_str.index` so that `return` isn't needed. – Feb 03 '17 at 20:21

score 84 · Answer 3 · edited Jul 23 '23 at 14:26

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

This solution also works for multi-word subwords.

s = "Find THIS SUB-WORD in this sentence with THIS SUB-WORD"
sub = "THIS SUB-WORD"
[i for i in range(len(s)) if s.startswith(sub, i)]
# [5, 41]
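Because `str.startswith(prefix, i)` is tested at every index, this approach also reports overlapping occurrences. A quick check with the overlapping inputs mentioned in comments on other answers; `find_all_overlapping` is just an illustrative name for the one-liner above:

```python
from typing import List


def find_all_overlapping(s: str, sub: str) -> List[int]:
    """Return every index i at which s[i:] starts with sub (overlaps included)."""
    return [i for i in range(len(s)) if s.startswith(sub, i)]


print(find_all_overlapping("ababa", "aba"))               # [0, 2]
print(find_all_overlapping("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```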

edited Jul 23 '23 at 14:26

Abu Shoeb

answered Jan 12 '11 at 02:48

thkala

If we want to check many characters using one for loop, how can it be done? With this code, I'll need many for loops and the time complexity is too high. – Prof.Plague Dec 05 '20 at 15:09

@thkala Very smart way of performing the operation without use of re module. Thanks for the answer! – Cute Panda Apr 18 '21 at 14:29
I think I like this answer more since it doesn't need the re module. – Shen Jul 04 '22 at 20:07
Thanks, it worked for me. The accepted answer doesn't work with multi-word subwords. For example, `THIS SUB-WORD` in this sentence `Find THIS SUB-WORD in this sentence with THIS SUB-WORD`. – Abu Shoeb Jul 23 '23 at 14:23

Idos · Answer 4 · 2016-02-03T19:23:54.480

Use re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(word, sentence):
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this" this will yield the output:

(0, 4)
(19, 23)
(24, 28)
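As one of the comments points out, `word` is interpreted as a regular expression here, so input containing metacharacters can silently misbehave. A small sketch with a made-up sentence: `$` is an end-of-string anchor, so the unescaped pattern `$5` can never match, while `re.escape` makes it literal. It uses `match.span()`, which returns the same `(start, end)` pair as `(match.start(), match.end())`:

```python
import re

sentence = "cost is $5, not $50"
word = "$5"

# Unescaped: "$5" means "end of string, then 5", which can never match.
raw = [m.span() for m in re.finditer(word, sentence)]
# Escaped: "\$5" matches the literal characters "$5".
escaped = [m.span() for m in re.finditer(re.escape(word), sentence)]

print(raw)      # []
print(escaped)  # [(8, 10), (16, 18)]
```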

edited Feb 03 '16 at 19:23

answered Feb 03 '16 at 19:01

Idos

I think it's worth pointing out, that it works only for "non-overlapping matches", therefore won't work for: sentence="ababa" and word="aba" – AnukuL Feb 27 '18 at 02:28
This will fail if the word contains any characters that have a meaning in regex – mousetail Aug 05 '22 at 14:18

score 64 · Answer 5 · answered Dec 23 '15 at 23:09

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
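A comment on this answer suggests generalizing it with an `overlapping` parameter. A minimal sketch of that idea; the `max(len(p), 1)` step is an extra assumption added here so that an empty pattern cannot loop forever when `overlapping=False`:

```python
from typing import Iterator


def findall(p: str, s: str, overlapping: bool = True) -> Iterator[int]:
    """Yield the positions of pattern p in string s.

    overlapping=True advances one character past each hit (the original
    behaviour); overlapping=False skips past the whole match.
    """
    step = 1 if overlapping else max(len(p), 1)
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i + step)


print(list(findall("ana", "banana")))                     # [1, 3]
print(list(findall("ana", "banana", overlapping=False)))  # [1]
```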

answered Dec 23 '15 at 23:09

AkiRoss

this looks beautiful! – fabio.sang Mar 28 '19 at 20:15

tested and it is twice faster than the `re.finditer` solution: `310 ns ± 5.35 ns per loop` for solution with `str.find` *vs* `799 ns ± 5.72 ns per loop` for solution with `re.finditer` (on my machine). Confirms what I've noticed in the past: built-in string methods are generally faster than regex (same for nested `str.replace` vs `re.sub`) – Jean Monet Dec 22 '20 at 22:21

Prettiest solution. Note that one can easily generalize by introducing optional parameter `overlapping=True` and replacing `i+1` by `i + (1 if overlapping else len(p))`. – Hugues Dec 27 '20 at 02:43

score 25 · Answer 6 · edited Feb 27 '18 at 06:40

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> [(a.start(), a.end()) for a in list(re.finditer('is', aString))]
[(2, 4), (5, 7), (38, 40), (42, 44)]

but won't work for:

>>> aString = "ababa"
>>> [(a.start(), a.end()) for a in list(re.finditer('aba', aString))]
[(0, 3)]
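A common workaround for the overlapping case is to put the pattern in a capturing group inside a zero-width lookahead, `(?=(...))`. The lookahead consumes no characters, so the search resumes at the very next index, while group 1 still records where each match ends. A sketch (the `re.escape` call is an addition here, so that the literal text is searched for):

```python
import re

aString = "ababa"
# (?=(...)) matches the empty string at each position where "aba" starts;
# group 1 captures the actual text, so start(1)/end(1) give the full span.
pattern = "(?=(" + re.escape("aba") + "))"
spans = [(m.start(1), m.end(1)) for m in re.finditer(pattern, aString)]
print(spans)  # [(0, 3), (2, 5)]
```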

edited Feb 27 '18 at 06:40

AnukuL

answered Jan 12 '11 at 02:55

Chinmay Kanchi

Why make a list out of an iterator, it just slows the process. – pradyunsg May 13 '13 at 10:57

aString VS astring ;) – NexD. Nov 15 '16 at 14:51

score 22 · Answer 7 · answered Nov 01 '13 at 03:16

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

answered Nov 01 '13 at 03:16

Cody Piersall

This code has several problems. Since it's working on open-ended data, sooner or later you'll bump into `RecursionError` if there are enough occurrences. Another problem is the two throw-away lists it creates on each iteration just for the sake of appending one element, which is very suboptimal for a string-finding function that could be called many times. Although recursive functions sometimes seem elegant and clear, they should be used with caution. – Ivan Nikolaev Nov 15 '16 at 08:54

score 13 · Answer 8 · answered Sep 24 '14 at 21:12

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.
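For comparison, the built-in `str.count` covers both cases directly. Like the `split` trick, it counts non-overlapping occurrences only, which is why `'aa'` is counted once in `'aaa'`:

```python
string = "dooobiedoobiedoobie"
print(string.count('o'))              # 7

string = "test test test test"
print(string.count("test"))           # 4
print(len(string.split("test")) - 1)  # 4, same as count()

# Non-overlapping: "aa" would fit twice in "aaa" only if the matches overlapped.
print("aaa".count("aa"))              # 1
```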

answered Sep 24 '14 at 21:12

jstaab

Great solution. I am impressed with the use of split(). – shantanu pathak Aug 08 '19 at 08:54

score 12 · Answer 9 · answered Apr 01 '15 at 09:23

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

score 9 · Answer 10 · answered Jul 06 '18 at 09:34

This does the trick for me using re.finditer

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

score 6 · Answer 11 · edited May 17 '16 at 12:27

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError when there are no further matches
        position = numberString.index(testString, marker)
    except ValueError:
        print("No more occurrences found")
        break
    print(position)
    marker = position + 1

edited May 17 '16 at 12:27

wingerse

answered Sep 01 '14 at 12:48

Andrew H

score 6 · Answer 12 · answered Feb 27 '18 at 06:44

You can try:

>>> string = "test test test test"
>>> for index in range(len(string)):
...     if string[index:index+len("test")] == "test":
...         print(index)
...
0
5
10
15

answered Feb 27 '18 at 06:44

Harsha Biyani

score 5 · Answer 13 · answered Oct 25 '21 at 10:13

You can try:

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]

answered Oct 25 '21 at 10:13

Mohammad Amin Eskandari

This is no different from the [accepted answer](https://stackoverflow.com/a/4664889/14363557). – Nat Riddle Oct 25 '21 at 16:05

@NatRiddle The presentation Mohammad wrote the answer in is a lot cleaner. This should be the accepted answer. – lolololol ol Dec 08 '22 at 03:32
Regex is much heavier on CPU than the accepted answer – mickzer Jul 21 '23 at 11:18

score 3 · Answer 14 · answered Sep 28 '18 at 17:29

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on large lists of search words.

score 3 · Answer 15 · answered Jan 13 '20 at 12:39

This function does not check every position in the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')

score 3 · Answer 16 · answered May 16 '20 at 17:05

src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)

answered May 16 '20 at 17:05

mascai

While this code may resolve the OP's issue, it is best to include an explanation as to how your code addresses the OP's issue. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform, that differentiates it from forums. You can edit to add additional info &/or to supplement your explanations with source documentation – SherylHohman May 16 '20 at 18:45

score 2 · Answer 17 · answered Feb 15 '18 at 20:02

The solutions provided by others are all based on find() or other built-in methods.

What is the core, basic algorithm for finding all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Return all the indexes of substring in string
    Arguments: the string and the search string
    Return: a list of indexes
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and add this function as a method:

class newstr(str):
    def find_all(self, substring):
        """
        Function: Return all the indexes of substring in this string
        Arguments: the search string
        Return: a list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr('Do you find this answer helpful? then upvote this!').find_all('this')  # [12, 45]

score 2 · Answer 18 · edited Jun 03 '20 at 16:12

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # start indexes of all (possibly overlapping) matches, found via a zero-width lookahead;
    # re.escape makes b match literally even if it contains regex metacharacters
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for start in start_indc:
        print((start, start + len(b) - 1))

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)

edited Jun 03 '20 at 16:12

darkByt3

answered Jan 20 '20 at 22:47

Ruman Khan

WangSung · Answer 19 · 2021-11-05T08:48:14.147

If you want to do this without re (regular expressions):

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]

string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]

edited Nov 05 '21 at 08:48

answered Nov 05 '21 at 08:38

WangSung

score 2 · Answer 20 · answered Apr 08 '22 at 10:06

Here's a solution that I came up with, using an assignment expression (a feature added in Python 3.8):

string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

Output:

[0, 5, 10, 15]
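One caveat: `str.count` counts non-overlapping occurrences, while `find(phrase, start + 1)` can land on overlapping ones, so the two can disagree. For `string = 'aaa'` and `phrase = 'aa'`, the comprehension above runs once and returns `[0]`. A sketch that keeps the assignment expression but lets `find` itself decide when to stop (the function name `find_all` is illustrative):

```python
def find_all(string: str, phrase: str) -> list:
    """Return every start index of phrase in string, overlaps included."""
    result = []
    start = -1
    # Search again one character past the previous hit until find() returns -1.
    while (start := string.find(phrase, start + 1)) != -1:
        result.append(start)
    return result


print(find_all("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all("aaa", "aa"))                    # [0, 1]
```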

score 2 · Answer 21 · edited Dec 15 '22 at 19:57

I think the cleanest solution is one without libraries:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)

find_all_occurrences("test test test test", "test")  # [0, 5, 10, 15]

Note: the find() method returns -1 when it can't find the substring.

score 1 · Answer 22 · edited Apr 10 '18 at 20:11

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s represents the search string
# c represents the character string

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]

edited Apr 10 '18 at 20:11

perror

answered Apr 10 '18 at 19:40

Harvey

1) How does this help a question that was answered 7 years ago? 2) [Using `lambda` this way is not Pythonic and goes against PEP8](https://stackoverflow.com/a/38381663/1040092). 3) This doesn't provide the correct output for the OPs situation – Wondercricket Apr 10 '18 at 19:47

Pythonic does not mean "Use as much features of python as you can think of" – klutt Jun 03 '20 at 16:18

score 1 · Answer 23 · answered Jun 10 '21 at 16:46

If you only want to use NumPy, here is a solution:

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
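The same arithmetic works without NumPy: every piece except the last one produced by `split` is followed by one occurrence of the search string, so a running total of `len(piece) + len(S2)`, minus `len(S2)`, gives each start index. A sketch using `itertools.accumulate` in place of `np.cumsum`; like `split`, it reports non-overlapping matches only and raises `ValueError` for an empty search string:

```python
from itertools import accumulate

S = "test test test test"
S2 = "test"

# Pieces that precede each occurrence; the last piece follows the final match.
pieces = S.split(S2)[:-1]
inds = [total - len(S2) for total in accumulate(len(k) + len(S2) for k in pieces)]
print(inds)  # [0, 5, 10, 15]
```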

answered Jun 10 '21 at 16:46

Phillip Maire

score 0 · Answer 24 · answered Mar 16 '17 at 01:14

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))

answered Mar 16 '17 at 01:14

黄哥Python培训

simple and best answer. – hamid mehmood Aug 16 '22 at 05:18

score 0 · Answer 25 · edited Nov 08 '20 at 14:35

def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated

For example:

find_index("hey doode find d", "d")

returns:

[4, 7, 13, 15]

edited Nov 08 '20 at 14:35

Sabito stands with Ukraine

answered Nov 08 '20 at 13:49

Elli

Have you actually read the question? Try `print(find_index('test test test test', 'test'))` which is the example the op gave. – Timus Nov 08 '20 at 14:04

score 0 · Answer 26 · answered May 19 '21 at 13:43

Not exactly what the OP asked, but you could also use the split function to get a list of the pieces between the occurrences of the substring. The OP didn't specify the end goal of the code, but if your goal is to remove the substrings anyway, this can be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case.

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

I only did a brief skim of the other answers, so apologies if this is already up there.

score 0 · Answer 27 · answered Jun 02 '21 at 03:24

def count_substring(string, sub_string):
    c=0
    for i in range(0, len(string) + 1 - len(sub_string)):
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)

answered Jun 02 '21 at 03:24

CHANDANA SAMINENI

Replace line 3, with: `for i in range(0,len(string)+1-len(sub_string)):` – Dineshkumar Jan 20 '22 at 05:22
Yeah. Thanks for the correction. – CHANDANA SAMINENI Jan 29 '22 at 06:30

Lucas LP · Answer 28 · 2021-06-26T18:58:46.607

I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course, make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note that this is for single characters only, and it changes your variable, so you have to copy the string into a new variable first if you want to keep it. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.

score -1 · Answer 29 · edited Jul 30 '19 at 12:04

By slicing, we generate all the possible substrings, append them to a list, and find the number of times the search string occurs using the count method:

s=input()
n=len(s)
l=[]
f=input()
for i in range(0,n):
    for j in range(1,n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))

edited Jul 30 '19 at 12:04

barbsan

answered Jul 30 '19 at 11:44

BONTHA SREEVIDHYA

When `s="test test test test"` and `f="test"` your code prints `4`, but OP expected `[0,5,10,15]` – barbsan Jul 30 '19 at 12:10
I have written it for a single word; I will update the code. – BONTHA SREEVIDHYA Jul 31 '19 at 13:14

score -1 · Answer 30 · answered Apr 30 '22 at 08:00

To find all the occurrences of each character in a given string and return them as a dictionary, e.g. for "hello" the result is {'h': 1, 'e': 1, 'l': 2, 'o': 1}:

def count(string):
    result = {}
    if string:
        for i in string:
            result[i] = string.count(i)
        return result
    return {}

or else you can do it like this:

from collections import Counter

def count(string):
    return Counter(string)

score -1 · Answer 31 · answered Jun 09 '22 at 13:17

Try this; it worked for me!

x=input('enter the string')
y=input('enter the substring')
z,r=x.find(y),x.rfind(y)
while z!=r:
    print(z,r,end=' ')
    z=z+len(y)
    r=r-len(y)
    z,r=x.find(y,z,r),x.rfind(y,z,r)

answered Jun 09 '22 at 13:17

Shiva Gupta

score -3 · Answer 32 · answered Dec 01 '18 at 19:09

You can easily use:

string.count('test')

https://www.programiz.com/python-programming/methods/string/count

Cheers!

answered Dec 01 '18 at 19:09

RaySaraiva

this should be the answer – Maxwell Chandler Mar 12 '19 at 07:18

The string count() method returns the number of occurrences of a substring in the given string. Not their location. – Astrid Apr 10 '19 at 12:21

this doesn't satisfy all cases: s = 'banana', sub = 'ana'. sub occurs twice in this situation (the occurrences overlap), but s.count('ana') returns 1 – Jul 03 '19 at 16:46
