Is there a way to get the substring between two words in a string in Python?

Question

My question is similar to "Is there a way to substring a string in Python?" but more specific: how can I get the part of a string that lies between two known words in that string?

Example:

mySrting = "this is the initial string"
Substring = "initial"

Here "the" and "string" are the two known words in the string that can be used to locate the substring.

Thank you!

So you want the string between two known words? Why are the spaces not part of the `Substring`? — Willem Van Onsem, Jul 14 '17 at 10:31
Furthermore what should happen if `'the'` and `'string'` occur multiple times in `mySrting`? — Willem Van Onsem, Jul 14 '17 at 10:32
@WillemVanOnsem then it should maybe show a list of strings. — Anas Bouayed, Jul 14 '17 at 10:40
@WillemVanOnsem and the spaces can be included in the two other words 'the ' and ' string' — Anas Bouayed, Jul 14 '17 at 10:41

poke · Accepted Answer · 2017-07-14T10:52:52.223

You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:

>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20

Looking at the slice [8:20], we already get close to what we want:

>>> myString[8:20]
'the initial '

Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:

>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'

Combined, you would do this:

startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
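The combined snippet hard-codes the length of 'the' and raises a ValueError when either word is missing. A minimal sketch of the same idea as a reusable function could look like the following. The name `between` and the choice to return None on a miss are assumptions made for illustration, and str.find is swapped in for str.index because it returns -1 instead of raising:

```python
def between(text, start_word, end_word):
    """Return the stripped text between the first start_word and the next end_word.

    Hypothetical helper: returns None when either word is not found.
    """
    start = text.find(start_word)
    if start == -1:
        return None
    start += len(start_word)  # skip past the start word itself
    end = text.find(end_word, start)  # only look after the start word
    if end == -1:
        return None
    return text[start:end].strip()


print(between("this is the initial string", "the", "string"))  # initial
```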

If you want to find every match, repeat the same steps while looking only at the rest of the string. Since str.index only ever returns the first match at or after the given start position, you can use it to scan the string in a single pass:

searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []

index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)

        results.append(searchString[startIndex + len(startWord):endIndex].strip())

        # move the index to the end
        index = endIndex + len(endWord)

    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break

print(results)
# ['initial', 'relevant', 'search']
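For reuse, the loop can be wrapped in a function. This is only a sketch under a few assumptions: the name `find_all_between` is made up here, str.find replaces str.index so no exception handling is needed, and the end word is searched for after the end of the start word rather than from its beginning:

```python
def find_all_between(text, start_word, end_word):
    """Collect every stripped substring found between start_word and end_word."""
    results = []
    pos = 0
    while True:
        start = text.find(start_word, pos)
        if start == -1:
            break  # no further start word
        start += len(start_word)
        end = text.find(end_word, start)
        if end == -1:
            break  # start word without a matching end word
        results.append(text[start:end].strip())
        pos = end + len(end_word)  # continue after the end word
    return results


search_string = ('this is the initial string but I added the relevant string '
                 'pair a few more times into the search string.')
print(find_all_between(search_string, 'the', 'string'))
# ['initial', 'relevant', 'search']
```

Like the loop it is based on, this matches plain substrings, so 'the' would also be found inside a word such as 'another'.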

score 1 · Answer 2 · answered Jul 14 '17 at 10:59

You can also try something like this:

mystring = "this is the initial string"
words = mystring.strip().split(" ")
# look at every word that has a neighbour on both sides
for i in range(1, len(words) - 1):
    if words[i - 1] == "the" and words[i + 1] == "string":
        print(words[i])

Khanh Nguyen · Answer 3 · 2017-07-14T11:50:04.893

I suggest combining split, list.index and join. This helps if the substring you are looking for contains more than one word.

Turn the string into a list of words:

words = string.split()

Get the indices of your opening and closing markers, then return the substring:

start = words.index('the')
end = words.index('string')
substring = ' '.join(words[start + 1:end])

You may want to check that both markers are actually present before proceeding, since list.index raises a ValueError when a value is missing.
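One way such a check could look, as a rough sketch: the function name `words_between` and the None return value are assumptions, and the closing marker is only searched for after the opening one.

```python
def words_between(text, start_word, end_word):
    """Return the words between two marker words, joined by single spaces.

    Hypothetical helper: returns None if either marker is missing.
    """
    words = text.split()
    try:
        start = words.index(start_word)
        end = words.index(end_word, start + 1)  # closing marker must come later
    except ValueError:
        return None
    return ' '.join(words[start + 1:end])


print(words_between("this is the very first initial string", "the", "string"))
# very first initial
```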

If your problem gets more complex, e.g. multiple occurrences of the marker pair, I suggest using a regular expression:

import re
substring = ''.join(re.findall(r'the (.+?) string', string))

re.findall returns the matches as a list, so drop the join if you want to keep the substrings separate.

The spaces in the pattern keep the surrounding whitespace out of the match; you can adapt the pattern to your needs.
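If the marker words are not fixed, the pattern can be built at run time. The sketch below is one possible variant rather than the answer's own code: the function name `find_between_words` is hypothetical, re.escape guards against markers that contain regex metacharacters, and the `\b` word boundaries are an added assumption so that 'the' does not match inside a word like 'another'.

```python
import re


def find_between_words(text, start_word, end_word):
    """Return a list of the texts found between whole-word markers."""
    pattern = r'\b{}\b\s*(.+?)\s*\b{}\b'.format(
        re.escape(start_word), re.escape(end_word))
    return re.findall(pattern, text)


print(find_between_words("this is the initial string", "the", "string"))
# ['initial']
print(find_between_words("another string, the last few string", "the", "string"))
# ['last few']
```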
