Given the index of a word in a string, counting from zero ("index" is at position two in this sentence), with a word defined as anything separated by whitespace, I need to find the index of the first character of that word.
My whitespace regex pattern is "( +|\t+)+", just to cover all my bases (except newline characters, which are excluded). I used split() to separate the string into words, and then summed the lengths of those words. However, I need to account for the possibility that more than one whitespace character is used between words, so I can't simply add the number of words minus one to that figure and still be accurate every time.
Example:
>>> example = "This is an example sentence"
>>> get_word_index(example, 2)
8
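
For illustration, here is a minimal sketch of one way the behavior above could be obtained. It does not split and sum lengths; it matches the words themselves and reads each match's start offset, so runs of several spaces or tabs need no special handling. The name WORD_PATTERN and the IndexError for an out-of-range word index are assumptions of this sketch, not part of the original attempt.

```python
import re

# Assumption: a word is a maximal run of characters other than space or tab,
# the complement of the separator pattern "( +|\t+)+". Newlines are not
# treated as separators, matching the description above.
WORD_PATTERN = re.compile(r"[^ \t]+")


def get_word_index(text, word_index):
    """Return the index of the first character of the word at word_index."""
    for position, match in enumerate(WORD_PATTERN.finditer(text)):
        if position == word_index:
            # match.start() is the offset of the word in the original string,
            # so the widths of the separators before it are already included.
            return match.start()
    raise IndexError("word index out of range")


print(get_word_index("This is an example sentence", 2))  # 8
print(get_word_index("This  is\t\tan   example", 2))     # 10
```

Because the offsets come straight from the match objects, leading whitespace and mixed separators are handled the same way as single spaces.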