570

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))  # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]

For counting the occurrences, see Count number of occurrences of a substring in a string.

Karl Knechtel
nukl

32 Answers

768

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which is more efficient if you only iterate through the results once.
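
As a minimal sketch of how the pieces above fit together when the search text is arbitrary, the pattern can be passed through re.escape so that characters such as ( or * are matched literally. The helper name find_all_re and its overlapping parameter are made up for illustration; they are not part of the re module.

```python
import re

def find_all_re(text, sub, overlapping=False):
    """Return start indexes of `sub` in `text` (hypothetical helper)."""
    # re.escape makes regex metacharacters in `sub` match literally.
    pattern = re.escape(sub)
    if overlapping:
        # A zero-width lookahead lets consecutive matches overlap.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_re("test test test test", "test"))  # [0, 5, 10, 15]
print(find_all_re("ttt", "tt", overlapping=True))  # [0, 1]
print(find_all_re("f(x) + f(x)", "f(x)"))          # [0, 7]
```

Without the escaping step, a search string such as "f(x)" would be read as a regex group rather than as literal text.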

David Leon
moinudin
  • hi, concerning this `[m.start() for m in re.finditer('test', 'test test test test')]`, how can we look for `test` or `text`? Does it become much more complicated? – xpanta Mar 01 '13 at 10:48
  • 8
    You want to look into regular expression in general : https://docs.python.org/2/howto/regex.html. The solution to your question will be : [m.start() for m in re.finditer('te[sx]t', 'text test text test')] – Yotam Vaknin May 06 '14 at 10:21
  • 3
    What will be the time complexity of using this method ? – Pranjal Mittal Jul 13 '17 at 23:50
  • 2
    @PranjalMittal. Upper or lower bound? Best, worst or average case? – Mad Physicist Nov 06 '17 at 15:36
  • 1
    @marcog what if the substring contains parentheses or other special characters? – Bananach Nov 10 '18 at 11:18
  • 1
    This method doesn't work with overlapping strings, e.g. when searching for "ACA" in string "ACACA", it will return only index 0. In case, someone wants a solution, here it is: https://stackoverflow.com/a/3873422/6430403. Use the find method with index += 1. – Rishabh Gupta Dec 07 '19 at 17:26
  • 7
    I would recommend escaping the search strings as well, like this: `[m.start() for m in re.finditer(re.escape(search_str), input_str)]` – srctaha Jan 08 '20 at 10:18
  • Applied method to search for substrings in a text file, got: "error: nothing to repeat at position 0" – Aleksejs Fomins Oct 27 '20 at 13:42
  • I want overlapping matches. If the substring or string contains leading or trailing spaces, I get an error: expected string or bytes-like object. I need the spaces because I don't want to match "really mean" to "really meaningful" – yangliu2 Mar 09 '21 at 23:31
  • This doesn't work with multi-word subwords. For example, `THIS SUB-WORD` in this sentence `Find THIS SUB-WORD in this sentence with THIS SUB-WORD` . – Abu Shoeb Jul 23 '23 at 14:21
171
>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
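
A possible variant of the generator above, sketched under two assumptions that are not in the original answer: an overlapping keyword argument (a hypothetical name) selects the step size, and the step is forced to at least 1 so that an empty substring cannot loop forever.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index at which `sub` occurs in `a_str`."""
    # Advance by at least 1 so an empty `sub` still terminates.
    step = 1 if overlapping else (len(sub) or 1)
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1:
            return
        yield start
        start += step

print(list(find_all('spam spam spam spam', 'spam')))                # [0, 5, 10, 15]
print(list(find_all('GATATATGCATATACTT', 'ATAT')))                  # [1, 9]
print(list(find_all('GATATATGCATATACTT', 'ATAT', overlapping=True)))  # [1, 3, 9]
```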

Pratik Deoghare
Karl Knechtel
  • 27
    To get overlapping matches, it should suffice to replace `start += len(sub)` with `start += 1`. – Karl Knechtel Jan 12 '11 at 03:13
  • 5
    I believe your previous comment should be a postscript in your answer. – tzot Feb 06 '11 at 19:27
  • 1
    Your code does not work for finding substr: "ATAT" in "GATATATGCATATACTT" – Ashish Negi Oct 05 '13 at 07:08
  • 2
    See the comment I made in addition. That is an example of an overlapping match. – Karl Knechtel Oct 14 '13 at 00:13
  • 7
    To match the behaviour of `re.findall`, I'd recommend adding `len(sub) or 1` instead of `len(sub)`, otherwise this generator will never terminate on empty substring. – WGH Nov 27 '15 at 00:15
  • Personally I think that `a_str.find` should be replaced with `a_str.index` so that `return` isn't needed. –  Feb 03 '17 at 20:21
84

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

This solution also works for multi-word substrings.

s = "Find THIS SUB-WORD in this sentence with THIS SUB-WORD"
sub = "THIS SUB-WORD"
[i for i in range(len(s)) if s.startswith(sub, i)]
# [5, 41]
Abu Shoeb
thkala
  • If we want to check many characters by using 1 for loop how can it be done? with this code, I'll have many for loop and the order of time is too high. – Prof.Plague Dec 05 '20 at 15:09
  • 2
    @thkala Very smart way of performing the operation without use of re module. Thanks for the answer! – Cute Panda Apr 18 '21 at 14:29
  • I think I like this answer more since it doesn't need the re module. – Shen Jul 04 '22 at 20:07
  • Thanks, it worked for me. The accepted answer doesn't work with multi-word subwords. For example, `THIS SUB-WORD` in this sentence `Find THIS SUB-WORD in this sentence with THIS SUB-WORD`. – Abu Shoeb Jul 23 '23 at 14:23
75

Use re.finditer:

import re
sentence = input("Give me a sentence ")
word = input("What word would you like to find ")
for match in re.finditer(re.escape(word), sentence):  # escape so regex metacharacters in word match literally
    print((match.start(), match.end()))

For word = "this" and sentence = "this is a sentence this this" this will yield the output:

(0, 4)
(19, 23)
(24, 28)
Idos
  • 7
    I think it's worth pointing out, that it works only for "non-overlapping matches", therefore won't work for: sentence="ababa" and word="aba" – AnukuL Feb 27 '18 at 02:28
  • This will fail if the word contains any characters that have a meaning in regex – mousetail Aug 05 '22 at 14:18
64

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]
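
Because findall is a generator, a caller can stop early without scanning the rest of the string. A small sketch using itertools.islice to take only the first two matches (the function is repeated here so the snippet runs on its own):

```python
from itertools import islice

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i + 1)

# islice pulls just two values; the generator never searches further.
first_two = list(islice(findall('na', 'banananassantana'), 2))
print(first_two)  # [2, 4]
```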
AkiRoss
  • 4
    this looks beautiful! – fabio.sang Mar 28 '19 at 20:15
  • 4
    tested and it is twice faster than the `re.finditer` solution: `310 ns ± 5.35 ns per loop` for solution with `str.find` *vs* `799 ns ± 5.72 ns per loop` for solution with `re.finditer` (on my machine). Confirms what I've noticed in the past: built-in string methods are generally faster than regex (same for nested `str.replace` vs `re.sub`) – Jean Monet Dec 22 '20 at 22:21
  • 1
    Prettiest solution. Note that one can easily generalize by introducing optional parameter `overlapping=True` and replacing `i+1` by `i + (1 if overlapping else len(p))`. – Hugues Dec 27 '20 at 02:43
25

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

but it won't find overlapping matches:

>>> aString = "ababa"
>>> print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
[(0, 3)]
AnukuL
Chinmay Kanchi
22

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.
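
Each match in the recursive version adds a stack frame, so a string with very many occurrences can run into Python's recursion limit. A sketch of the same non-overlapping search written as a loop, which avoids that; the name locations_of_substring_iter is hypothetical.

```python
def locations_of_substring_iter(string, substring):
    """Return a list of non-overlapping locations of a substring."""
    locations = []
    # Step by at least 1 so an empty substring still terminates.
    step = len(substring) or 1
    start = string.find(substring)
    while start != -1:
        locations.append(start)
        start = string.find(substring, start + step)
    return locations

print(locations_of_substring_iter('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

# Many occurrences are fine because no recursion is involved.
print(len(locations_of_substring_iter('ab' * 100000, 'ab')))
# prints 100000
```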

Cody Piersall
  • 4
    This code has several problems. Since it's working on open-ended data sooner or later you'll bump into `RecursionError` if there are many enough occurrences. Another one are two throw-away lists it creates on each iteration just for the sake of appending one element, which is very suboptimal for a string finding function, which possibly could be called a lot of times. Although sometimes recursive functions seem elegant and clear, they should be taken with caution. – Ivan Nikolaev Nov 15 '16 at 08:54
13

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant.

jstaab
12

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

Thurines
9

This does the trick for me using re.finditer

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))
Bruno Vermeulen
6

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # look up the next occurrence once and reuse the result
        found = numberString.index(testString, marker)
        print(found)
        marker = found + 1
    except ValueError:
        # index() raises ValueError once no further occurrence exists
        print("String not found")
        marker = len(numberString)
wingerse
Andrew H
6

You can try :

>>> string = "test test test test"
>>> for index in range(len(string)):
...     if string[index:index + len("test")] == "test":
...         print(index)
...
0
5
10
15
Harsha Biyani
5

You can try :

import re
str1 = "This dress looks good; you have good taste in clothes."
substr = "good"
result = [_.start() for _ in re.finditer(substr, str1)]
# result = [17, 32]
3

When looking for a large amount of key words in a document, use flashtext

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.

Uri Goren
3

This function does not check every position inside the string, so it does not waste compute resources. My try:

def findAll(string,word):
    all_positions=[]
    next_pos=-1
    while True:
        next_pos=string.find(word,next_pos+1)
        if(next_pos<0):
            break
        all_positions.append(next_pos)
    return all_positions

to use it call it like this:

result=findAll('this word is a big word man how many words are there?','word')
3
src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)
mascai
  • 2
    While this code may resolve the OP's issue, it is best to include an explanation as to how your code addresses the OP's issue. In this way, future visitors can learn from your post, and apply it to their own code. SO is not a coding service, but a resource for knowledge. Also, high quality, complete answers are more likely to be upvoted. These features, along with the requirement that all posts are self-contained, are some of the strengths of SO as a platform, that differentiates it from forums. You can edit to add additional info &/or to supplement your explanations with source documentation – SherylHohman May 16 '20 at 18:45
2

The solutions provided by others are all based on the built-in find() or other existing methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also subclass str and add this function as a method.

class newstr(str):
    def find_all(self, substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: The search string
        Return:Returning a list
        """
        length = len(substring)
        c=0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c=c+1
        return indexes

Calling the method

newstr('Do you find this answer helpful? then upvote this!').find_all('this')

2

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1,-1))
else:
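    # collect the start index of every (possibly overlapping) match;
    # re.escape keeps regex metacharacters in b literal
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))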

Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
darkByt3
Ruman Khan
2

If you want to do it without re (regex):

find_all = lambda _str,_w : [ i for i in range(len(_str)) if _str.startswith(_w,i) ]

string = "test test test test"
print( find_all(string, 'test') ) # >>> [0, 5, 10, 15]
WangSung
2

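Here's a solution that I came up with, using an assignment expression (new feature since Python 3.8). Note that str.count only counts non-overlapping occurrences, so some overlapping matches may be missed: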

string = "test test test test"
phrase = "test"
start = -1
result = [(start := string.find(phrase, start + 1)) for _ in range(string.count(phrase))]

Output:

[0, 5, 10, 15]
Mike
2

I think the cleanest solution is one without libraries or yield:

def find_all_occurrences(string, sub):
    index_of_occurrences = []
    current_index = 0
    while True:
        current_index = string.find(sub, current_index)
        if current_index == -1:
            return index_of_occurrences
        else:
            index_of_occurrences.append(current_index)
            current_index += len(sub)

find_all_occurrences(string, substr)

Note: find() method returns -1 when it can't find anything

STerliakov
ulas.kesik
1

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# c is the string to search in
# s is the single character to search for

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]
perror
Harvey
    1) How does this help a question that was answered 7 years ago? 2) [Using `lambda` this way is not Pythonic and goes against PEP8](https://stackoverflow.com/a/38381663/1040092). 3) This doesn't provide the correct output for the OPs situation – Wondercricket Apr 10 '18 at 19:47
  • 2
    Pythonic does not mean "Use as much features of python as you can think of" – klutt Jun 03 '20 at 16:18
1

If you only want to use numpy, here is a solution:

import numpy as np

S= "test test test test"
S2 = 'test'
inds = np.cumsum([len(k)+len(S2) for k in S.split(S2)[:-1]])- len(S2)
print(inds)
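
The same running-total idea can be sketched with only the standard library, using itertools.accumulate in place of np.cumsum. The function name split_indexes is hypothetical, and like the original it only finds non-overlapping matches and assumes a non-empty substring (str.split raises ValueError on an empty separator).

```python
from itertools import accumulate

def split_indexes(s, sub):
    """Start indexes of non-overlapping `sub` in `s`, derived from str.split."""
    pieces = s.split(sub)[:-1]  # the text that precedes each occurrence
    # Running total of (piece + match) lengths gives the end of each match.
    ends = accumulate(len(p) + len(sub) for p in pieces)
    return [end - len(sub) for end in ends]

print(split_indexes("test test test test", "test"))  # [0, 5, 10, 15]
```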

Phillip Maire
0

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
0
def find_index(string, let):
    enumerated = [place  for place, letter in enumerate(string) if letter == let]
    return enumerated

for example :

find_index("hey doode find d", "d") 

returns:

[4, 7, 13, 15]
Elli
    Have you actually read the question? Try `print(find_index('test test test test', 'test'))` which is the example the op gave. – Timus Nov 08 '20 at 14:04
0

Not exactly what OP asked but you could also use the split function to get a list of where all the substrings don't occur. OP didn't specify the end goal of the code but if your goal is to remove the substrings anyways then this could be a simple one-liner. There are probably more efficient ways to do this with larger strings; regular expressions would be preferable in that case

# Extract all non-substrings
s = "an-example-string"
s_no_dash = s.split('-')
# >>> s_no_dash
# ['an', 'example', 'string']

# Or extract and join them into a sentence
s_no_dash2 = ' '.join(s.split('-'))
# >>> s_no_dash2
# 'an example string'

Did a brief skim of other answers so apologies if this is already up there.

als0052
0
def count_substring(string, sub_string):
    c=0
    for i in range(len(string) - len(sub_string) + 1):  # every start where sub_string still fits
        if string[i:i+len(sub_string)] == sub_string:
            c+=1
    return c

if __name__ == '__main__':
    string = input().strip()
    sub_string = input().strip()
    
    count = count_substring(string, sub_string)
    print(count)
0

I ran into the same problem and did this:

hw = 'Hello oh World!'
list_hw = list(hw)
o_in_hw = []

while True:
    o = hw.find('o')
    if o != -1:
        o_in_hw.append(o)
        list_hw[o] = ' '
        hw = ''.join(list_hw)
    else:
        print(o_in_hw)
        break

I'm pretty new at coding, so you can probably simplify it (and if you plan to use it repeatedly, of course make it a function).

All in all, it works as intended for what I was doing.

Edit: Please note this is for single characters only, and it will change your variable, so you have to copy the string into a new variable to preserve it. I didn't put that in the code because it's easy and the code is only meant to show how I made it work.

Lucas LP
-1

By slicing, we build every possible substring, append each one to a list, and find the number of times the target occurs using the count function.

s=input()
n=len(s)
l=[]
f=input()
for i in range(0,n):
    for j in range(i+1,n+1):  # start at i+1 to skip empty slices
        l.append(s[i:j])
if f in l:
    print(l.count(f))
barbsan
-1

To count all the occurrences of each character in a given string and return them as a dictionary, e.g. hello gives {'h':1, 'e':1, 'l':2, 'o':1}:

def count(string):
   result = {}
   if(string):
     for i in string:
       result[i] = string.count(i)
     return result
   return {}

or else you can do it like this

from collections import Counter

def count(string):
    return Counter(string)
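
The question asks for positions rather than counts. As a sketch of how the same per-character idea could report indexes instead, collections.defaultdict can collect a list of positions for each character. The function name char_positions is hypothetical.

```python
from collections import defaultdict

def char_positions(string):
    """Map each character to the list of indexes where it occurs."""
    positions = defaultdict(list)
    for index, char in enumerate(string):
        positions[char].append(index)
    return dict(positions)

print(char_positions("hello"))
# {'h': [0], 'e': [1], 'l': [2, 3], 'o': [4]}
```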
-1

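Try this, it worked for me!

x=input('enter the string')
y=input('enter the substring')
z,r=x.find(y),x.rfind(y)
while z!=r:
        print(z,r,end=' ')
        # search only between the end of the left match and the start of the right match
        z,r=x.find(y,z+len(y),r),x.rfind(y,z+len(y),r)
if z!=-1:
        print(z)  # a single match is left in the middle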
-3

You can easily use:

string.count('test')

https://www.programiz.com/python-programming/methods/string/count

Cheers!

RaySaraiva