
Suppose I have a long string longString and a much shorter string substring. I want to find the index of the first character of the nth occurrence of substring in longString. For instance, if substring = "stackoverflow", I want the index of the letter s that begins the nth occurrence of "stackoverflow" in longString.

Example:

longString = "stackoverflow_is_stackoverflow_not_stackoverflow_even_though_stackoverflow"
substring = "stackoverflow"
n = 2

Thus, in the above example, the index of the s in the 2nd occurrence of "stackoverflow" is 17.

I would like to find an efficient and fast way of doing so.

  • Note that the search string could overlap with itself, e.g. `abcxyzabc`, so you need to decide how you want to count a search in `abcxyzabcxyzabcxyzabc`, i.e. do you ignore the overlapping portion, or do you count it? – Tom Karzes May 07 '21 at 13:51
  • @Tom Karzes In my current situation, I would like to ignore. – SomeRandomCoder May 07 '21 at 13:52

1 Answer


Here's a pretty short way:

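def index_of_nth_occurrence(longstring, substring, n):
    # Split on the substring, rejoin the first n pieces, and measure the result:
    # that length is the index where the nth non-overlapping occurrence starts.
    return len(substring.join(longstring.split(substring)[:n]))


longstring = "stackoverflow_is_stackoverflow_not_stackoverflow_even_though_stackoverflow"
substring = "stackoverflow"
n = 2

print(index_of_nth_occurrence(longstring, substring, n))
# 17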

The trick here is to use str.split() to break the string at each non-overlapping occurrence of the substring, join the first n pieces back together with the substring between them, and measure the length of the result. The rejoined text is everything that precedes the nth occurrence, so its length is the index of the first character of that occurrence.
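To make the split-and-join step concrete, here is a small walk-through of the intermediate values for the example from the question. The names pieces and prefix are only illustrative and are not part of the function above.

```python
longstring = "stackoverflow_is_stackoverflow_not_stackoverflow_even_though_stackoverflow"
substring = "stackoverflow"

# Splitting removes every non-overlapping occurrence and keeps the text around them.
pieces = longstring.split(substring)
print(pieces)
# ['', '_is_', '_not_', '_even_though_', '']

# Rejoining the first 2 pieces restores only the 1st occurrence,
# giving the text that comes before the 2nd occurrence.
prefix = substring.join(pieces[:2])
print(prefix)
# stackoverflow_is_

print(len(prefix))
# 17
```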


This splits the entire string even when the nth occurrence is near the start, so it may be less efficient than an iterative/manual approach, and it ignores overlapping matches, but it's quick and easy.
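For comparison, here is a minimal sketch of what an iterative approach could look like, using repeated calls to str.find(). The function name find_nth and its handling of edge cases (returning -1 for a missing occurrence, an empty substring, or n < 1) are assumptions made for this example, not something from the question. It stops scanning as soon as the nth occurrence is found, and it skips past each match so overlapping occurrences are ignored, as requested in the comments.

```python
def find_nth(haystack, needle, n):
    """Return the start index of the nth non-overlapping occurrence, or -1."""
    if n < 1 or not needle:
        return -1
    index = -1
    start = 0
    for _ in range(n):
        index = haystack.find(needle, start)
        if index == -1:
            # Fewer than n occurrences exist.
            return -1
        # Resume the search after this match so overlaps are not counted.
        start = index + len(needle)
    return index


longstring = "stackoverflow_is_stackoverflow_not_stackoverflow_even_though_stackoverflow"
print(find_nth(longstring, "stackoverflow", 2))
# 17
```

One difference worth noting: when there are fewer than n occurrences, this sketch returns -1, whereas the split/join one-liner returns len(longstring), which can look like a valid index.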

Green Cloak Guy