
How can I print the substring that occurs between two patterns? A character offset into the string is also given as input, and it determines which substring should be printed.

E.g.

string = '<p class="one">A quick brown fox</p><p class="two">Jumps over</p>'

The substring that needs to be printed is between the patterns "> and </p>.

If the offset is 20, A quick brown fox should be printed; if the offset is 55, Jumps over should be printed. The offset is received as input.

mkrieger1
Lalas M
  • https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – matszwecja Nov 09 '22 at 09:56
  • [https://stackoverflow.com/help/how-to-ask](https://stackoverflow.com/help/how-to-ask) : "Explain how you encountered the problem you're trying to solve, and any difficulties that have prevented you from solving it yourself." – 0x0fba Nov 09 '22 at 10:14

1 Answer


The offset seems to be taken as a starting point for searching the nearest "> to the left and </p> to the right.

<p class="one">A quick brown fox</p><p class="two">Jumps over</p>
                    ^                                  ^
                 offset 20                          offset 55

In other words, find the rightmost "> in the substring from the beginning to the offset, and the leftmost </p> in the substring from the offset to the end.

Python's string methods find and rfind do exactly this. Both accept optional start and end bounds that restrict where they search.

>>> string.rfind('">', 0, 20)
13
>>> string.rfind('">', 0, 55)
49
>>> string.find('</p>', 20)
32
>>> string.find('</p>', 55)
61

Now you can use these indices to slice the string. rfind returns the index where "> begins, but the slice must start after it, so add the pattern's length (2) to get the start index. The closing </p> should be searched for from the offset onwards.

>>> start = string.rfind('">', 0, 20) + 2
>>> end = string.find('</p>', 20)
>>> string[start:end]
'A quick brown fox'
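The steps above can be wrapped into a small reusable function. This is a sketch, not part of the original answer: the name substring_at and its left/right parameters are hypothetical. It searches for </p> from the start of the content (the variant suggested in the comments) and returns None when a delimiter is missing or the offset falls outside the content, because find and rfind report "not found" as -1 instead of raising.

```python
from typing import Optional


def substring_at(text: str, offset: int, left: str = '">', right: str = "</p>") -> Optional[str]:
    """Hypothetical helper: return the text between the nearest `left`
    before `offset` and the first `right` after it, or None if `offset`
    does not fall inside such a span."""
    # Rightmost `left` lying entirely inside text[0:offset].
    left_idx = text.rfind(left, 0, offset)
    if left_idx == -1:
        return None
    start = left_idx + len(left)  # skip past the opening delimiter itself

    # First `right` after the opening delimiter (the variant from the comments).
    end = text.find(right, start)
    if end == -1 or offset > end:
        # No closing delimiter, or the offset lies past it (e.g. inside a tag).
        return None
    return text[start:end]


s = '<p class="one">A quick brown fox</p><p class="two">Jumps over</p>'
print(substring_at(s, 20))  # A quick brown fox
print(substring_at(s, 55))  # Jumps over
```

To take the offset as input, one option is offset = int(input()) followed by print(substring_at(s, offset)).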
mkrieger1
  • Thanks. The offset can be any position between the matching patterns. I'm trying the solution you mentioned in the comment on the original question. – Lalas M Nov 09 '22 at 10:15
  • While I was testing, I encountered an error in the `end` value, so I changed it to `end = string.find('</p>', start)`. Could you please verify that? – Lalas M Nov 09 '22 at 10:38
  • That should work as well. – mkrieger1 Nov 09 '22 at 11:03
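A likely source of an error in the `end` value, offered as an assumption rather than a confirmed diagnosis: str.find returns -1 when the pattern does not occur in the searched range, and -1 is still a valid slice bound. A wrong search bound therefore produces a wrong substring instead of an exception. The sketch below reproduces this with the example string.

```python
s = '<p class="one">A quick brown fox</p><p class="two">Jumps over</p>'

# With an end bound of 20, '</p>' (which starts at index 32) lies outside
# the searched range s[0:20], so find() reports "not found" as -1.
end = s.find('</p>', 0, 20)
print(end)  # -1

# -1 means "one before the end" in a slice, so no exception is raised;
# the slice simply runs to the second-to-last character.
print(s[15:end])  # A quick brown fox</p><p class="two">Jumps over</p

# Searching from the start of the content finds the closing tag.
end = s.find('</p>', 15)
print(s[15:end])  # A quick brown fox
```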