Calculate the index of the nth word in a string

Question

Given the index of a word in a string starting at zero ("index" is position two in this sentence), and a word being defined as that which is separated by whitespace, I need to find the index of the first char of that word.

My whitespace regex pattern is "( +|\t+)+", just to cover all my bases (except newline characters, which are excluded). I used split() to separate the string into words, and then summed the lengths of those words. However, I need to account for the possibility that more than one whitespace character is used between words, so I can't simply add the number of words minus one to that figure and still be accurate every time.

Example:

>>> example = "This is an example sentence"
>>> get_word_index(example, 2)
8
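To make the problem concrete, here is a minimal sketch of the split-and-sum approach described above. The function name `naive_word_index` and the second test string are hypothetical, added only for illustration, and the pattern is a non-capturing version of the one in the question so that `re.split` returns only the words. It assumes exactly one whitespace character per separator, which is where it goes wrong.

```python
import re

def naive_word_index(s, idx):
    # Non-capturing variant of the question's pattern; with a capturing
    # group, re.split would also return the separators themselves.
    words = re.split(r"(?: +|\t+)+", s)
    # Sum the lengths of the preceding words, then assume each of the
    # idx separators before the target word is exactly one character.
    return sum(len(w) for w in words[:idx]) + idx

single = "This is an example sentence"
double = "This  is an example sentence"  # two spaces after "This"

print(naive_word_index(single, 2))  # 8, which is correct
print(naive_word_index(double, 2))  # still 8, but "an" now starts at 9
```

The second call is off by one because the extra space between "This" and "is" is never counted.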

Could you please add a simple test and example to show the problem you described? — vbarboza, Mar 05 '19 at 01:14
The whitespace regex pattern to "cover all your bases" would be `'\s+'`. — TigerhawkT3, Mar 05 '19 at 01:18

score 2 · Accepted Answer · answered Mar 05 '19 at 02:07

Change your regular expression to include the whitespace around each word to prevent it from being lost. The expression \s*\S+\s* will first consume leading whitespace, then the actual word, then trailing spaces, so only the first word in the resulting list might have leading spaces (if the string itself started with whitespace). The rest consist of the word itself potentially followed by whitespace. After you have that list, simply find the total length of all the words before the one you want, and account for any leading spaces the string may have.

import re

def get_word_index(s, idx):
    # Each match is one word plus its surrounding whitespace, so no characters are lost.
    words = re.findall(r'\s*\S+\s*', s)
    return sum(map(len, words[:idx])) + len(words[idx]) - len(words[idx].lstrip())

Testing:

>>> example = "This is an example sentence"
>>> get_word_index(example, 2)
8
>>> example2 = ' ' + example
>>> get_word_index(example2, 2)
9
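As a possible alternative (not part of the original answer), the same result can be read directly from match positions instead of summing lengths. This is a sketch under the assumption that a word is any run of non-whitespace characters; the name `get_word_index_finditer` is hypothetical. Note that `\S+` treats newlines as separators too, which differs from the pattern in the question.

```python
import re
from itertools import islice

def get_word_index_finditer(s, idx):
    # Each match of \S+ is one word; match.start() is its offset in s.
    matches = re.finditer(r"\S+", s)
    # Skip the first idx words and take the next one, if there is one.
    match = next(islice(matches, idx, None), None)
    if match is None:
        raise IndexError("word index out of range")
    return match.start()

example = "This is an example sentence"
print(get_word_index_finditer(example, 2))        # 8
print(get_word_index_finditer(" " + example, 2))  # 9
```

Because the offset comes from the match itself, leading whitespace and separators of any width need no special handling.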

While I awaited your response, I came up with my own solution that didn't pass my unit tests. However, your solution failed my unit tests the exact same way. I'm therefore going to assume that you're right and that my unit tests are not. Thank you! — Steele Farnsworth, Mar 05 '19 at 02:28

score 0 · Answer 2 · answered Mar 05 '19 at 01:24

Maybe you could try with:

your_string.index(your_word)

documentation

bojan

Thank you for your answer. I need to be able to account for the same word appearing more than once. – Steele Farnsworth Mar 05 '19 at 01:26
maybe this is what you wanted: https://stackoverflow.com/questions/4664850/find-all-occurrences-of-a-substring-in-python – bojan Mar 05 '19 at 01:36
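A short sketch of why `str.index` alone does not fit the question: it searches for a substring rather than a whole word, and it always reports the first occurrence. The second string below is a hypothetical example, not from the thread.

```python
example = "This is an example sentence"
# "is" is matched inside "This", so the result is 2 rather than 5.
print(example.index("is"))  # 2

repeated = "the cat saw the dog"
# Only the first "the" is ever reported; the second one starts at 12.
print(repeated.index("the"))  # 0
```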
