I've searched far and wide and can't seem to find any prebuilt library that can do this:

Given two strings in Python, where one is the original string and the other has certain words replaced with placeholders, I would like to find the indices, in the original string, of the words that have been replaced.

Example:

original = "This is the original string"

processed = "This is [placeholder] string"

indices = [8, 20]

The first index is the start of the substring that has been replaced, and the second index is the end of this substring.
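
To make the convention concrete, here is a small check, assuming the two numbers are meant to behave like a Python slice (start inclusive, end exclusive). The variable names `start`, `end`, and `replaced` are only illustrative:

```python
original = "This is the original string"

# Assumed convention: indices work like a slice, so the end is exclusive
start, end = 8, 20
replaced = original[start:end]
print(replaced)  # the original
```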

Any help would be much appreciated.

Boris

1 Answer

This might not be the prettiest Python code I've written, but it works:

# Index of the first character where the two strings differ
first = next(i for i, (a, b) in enumerate(zip(original, processed)) if a != b)

# Same scan from the end, converted back to an index into original
second = len(original) - next(
    i for i, (a, b) in enumerate(zip(original[::-1], processed[::-1])) if a != b
)

difference = [first, second]

It finds the first index at which the two strings differ, then does the same thing scanning backwards from the end.

Out[76]: [8, 20]

In more detail:

  1. Comparing the strings character by character, it takes the first index where original and processed differ.
  2. Comparing them character by character starting from the end, it takes len(original) minus the offset of the first difference.
  3. It puts these two indices in a list.
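
The steps above can also be wrapped in a small function. This is only a sketch: the name `replaced_span` is made up here, and it assumes there is exactly one replaced span and that the text around it is unchanged. Unlike the snippet above, it returns `None` when the strings are identical instead of raising.

```python
def replaced_span(original, processed):
    """Return [start, end] of the single replaced span in `original`.

    Hypothetical helper: assumes at most one replaced span.
    Returns None if the two strings are identical.
    """
    if original == processed:
        return None

    limit = min(len(original), len(processed))

    # Step 1: length of the common prefix
    start = 0
    while start < limit and original[start] == processed[start]:
        start += 1

    # Step 2: length of the common suffix, never overlapping the prefix
    suffix = 0
    while suffix < limit - start and original[-1 - suffix] == processed[-1 - suffix]:
        suffix += 1

    # Step 3: convert the suffix length back to an index into `original`
    return [start, len(original) - suffix]


print(replaced_span("This is the original string",
                    "This is [placeholder] string"))  # [8, 20]
```
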
Nicolas Gervais
  • I forgot to mention that there could be more than one placeholder in the text; in that case your solution won't work. – Boris May 29 '20 at 13:52
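
For the case with several placeholders, one possible approach is to turn the processed string into a regular expression and match it against the original. This is a sketch under assumptions, not a tested general solution: the function name `replaced_spans` is invented here, the placeholder is assumed to be the literal token `[placeholder]`, and the text between placeholders is assumed to be unchanged. If the literal text is repetitive, more than one alignment may be valid, and the regex simply returns one of them.

```python
import re


def replaced_spans(original, processed, placeholder="[placeholder]"):
    """Return a list of [start, end] spans in `original`, one per placeholder.

    Hypothetical helper: returns None if `original` cannot be aligned
    with `processed`.
    """
    # Literal text surrounding the placeholders
    literal_parts = processed.split(placeholder)

    # Escape the literal text and put a lazy capture group at each placeholder
    pattern = "(.+?)".join(re.escape(part) for part in literal_parts)

    match = re.fullmatch(pattern, original, flags=re.DOTALL)
    if match is None:
        return None

    # Group i corresponds to the i-th placeholder
    return [list(match.span(i)) for i in range(1, len(literal_parts))]


print(replaced_spans("The quick brown fox jumps",
                     "The [placeholder] brown [placeholder] jumps"))
# [[4, 9], [16, 19]]
```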