0

My question is similar to "Is there a way to substring a string in Python?", but more specific: how can I get the part of a string that is located between two known words in the initial string?

Example:

myString = "this is the initial string"
Substring = "initial"

Here, "the" and "string" are the two known words in the string that can be used to get the substring.

Thank you!

Anas Bouayed

3 Answers

2

You can start with simple string manipulation here. str.index is your best friend: it tells you the position of a substring within a string, and it can also start searching from a later position in the string:

>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20

Looking at the slice [8:20], we already get close to what we want:

>>> myString[8:20]
'the initial '

Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:

>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'

Combined, you would do this:

startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
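The combined steps can also be wrapped in a small helper. This is only a sketch: the name substring_between is hypothetical, and it uses str.find (which returns -1 instead of raising) so that a missing marker gives None. It also starts looking for the end word only after the start word has ended.

```python
def substring_between(text, start_word, end_word):
    """Return the text between the first start_word and the next end_word.

    Hypothetical helper (not part of the standard library); returns None
    when either marker is missing.
    """
    start = text.find(start_word)
    if start == -1:
        return None
    start += len(start_word)  # skip past the start marker itself
    end = text.find(end_word, start)
    if end == -1:
        return None
    return text[start:end].strip()


print(substring_between("this is the initial string", "the", "string"))
# initial
```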

If you want to find every match, you just need to repeat this while looking only at the rest of the string. Since str.index only ever finds the first match at or after the given start position, you can use it to scan the string very efficiently:

searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []

index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)

        results.append(searchString[startIndex + len(startWord):endIndex].strip())

        # move the index to the end
        index = endIndex + len(endWord)

    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break

print(results)
# ['initial', 'relevant', 'search']
poke
1

You can also try something like this:

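mystring = "this is the initial string"
# split into words; a separate name keeps the original string intact
words = mystring.strip().split(" ")
for i in range(1, len(words) - 1):
    # print any word that sits directly between "the" and "string"
    if words[i - 1] == "the" and words[i + 1] == "string":
        print(words[i])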
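Because this approach compares whole words, "the" will not match inside a longer word such as "other". A minimal sketch of the same loop as a reusable function, assuming a single word between the markers (the name words_between is hypothetical):

```python
def words_between(text, before, after):
    """Collect every single word that sits directly between before and after.

    Hypothetical helper; it compares whole words, so "the" does not
    match inside "other".
    """
    words = text.split()
    return [
        words[i]
        for i in range(1, len(words) - 1)
        if words[i - 1] == before and words[i + 1] == after
    ]


print(words_between("this is the initial string", "the", "string"))
# ['initial']
```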
Palash Jain
0

I suggest using a combination of list, split and join methods. This should help if the substring you are looking for contains more than one word.

  1. Turn the string into a list of words:

    words = string.split()

  2. Get the indices of your opening and closing markers, then return the substring:

    start = words.index('the')
    end = words.index('string')
    substring = ' '.join(words[start + 1:end])

You may want to check that both markers are present before proceeding, since list.index raises a ValueError otherwise.


If your problem gets more complex, e.g. multiple occurrences of the marker pair, I suggest using a regular expression.

import re
substrings = re.findall(r'the (.+?) string', string)

re.findall returns the matches as a list, so each substring is stored separately.

The spaces around the group in the pattern keep the surrounding whitespace out of the match; you can modify the pattern to suit your needs as well.
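If the marker words are not fixed, the pattern can be built from variables. The following is a sketch under assumptions: find_between is a hypothetical name, and both markers are assumed to start and end with word characters so that the \b anchors make sense.

```python
import re


def find_between(text, start_word, end_word):
    # Hypothetical helper. re.escape guards against regex metacharacters
    # in the markers; \b restricts the markers to whole words; (.+?) is
    # non-greedy so each match stops at the nearest end marker.
    pattern = r"\b{}\b\s+(.+?)\s+\b{}\b".format(
        re.escape(start_word), re.escape(end_word)
    )
    return re.findall(pattern, text)


text = ("this is the initial string but I added the relevant string "
        "pair a few more times into the search string.")
print(find_between(text, "the", "string"))
# ['initial', 'relevant', 'search']
```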

Khanh Nguyen