I have text like this:
This is a test text. <span> with bold </span> and with <span> italic </span> and so on and so forth.
I am using this regex to identify all HTML tags: <[^>]*>
I then replace every tag with an empty string, so the result looks like this:
This is a test text. with bold and with italic and so on and so forth.
In the above text, I want to identify text, say, "italic" and insert special tags around it and then reconstruct the original text. So, the result would be
This is a test text. <span> with bold </span> and with <span> <span class='special'>italic</span> </span> and so on and so forth.
I am writing code that uses matcher.start() and matcher.end() to build a list of all the HTML tags and their positions, and I am thinking about reconstructing the text from this list. Is there a better way of doing it? How would you solve it?
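To make the idea concrete, here is a rough sketch of the offset-mapping approach, not a finished solution. The class name TagOffsetSketch and the method wrap are hypothetical names made up for illustration. It assumes the same <[^>]*> regex is good enough to find tags, and it only handles the first occurrence of the search text. While stripping the tags it remembers the original index of every remaining character, so a match in the stripped text can be mapped back. If the match spans tag boundaries, each run of text between tags is wrapped separately so that the inserted tags do not end up mis-nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class TagOffsetSketch {
    // Same naive tag regex as in the question; not a real HTML parser.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String wrap(String html, String needle, String open, String close) {
        if (needle.isEmpty()) {
            return html;
        }
        // Build the tag-free text and record, for each of its characters,
        // the index that character had in the original string.
        StringBuilder plain = new StringBuilder();
        List<Integer> origIndex = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            for (int i = pos; i < m.start(); i++) {
                plain.append(html.charAt(i));
                origIndex.add(i);
            }
            pos = m.end();
        }
        for (int i = pos; i < html.length(); i++) {
            plain.append(html.charAt(i));
            origIndex.add(i);
        }

        // Search in the tag-free text (first occurrence only).
        int hit = plain.indexOf(needle);
        if (hit < 0) {
            return html;
        }
        // Map the match back to a [from, to) range in the original string.
        int from = origIndex.get(hit);
        int to = origIndex.get(hit + needle.length() - 1) + 1;

        StringBuilder out = new StringBuilder(html.substring(0, from));
        // Inside the range, copy tags unchanged and wrap every text run.
        Matcher inner = TAG.matcher(html).region(from, to);
        int cursor = from;
        while (inner.find()) {
            if (inner.start() > cursor) {
                out.append(open).append(html, cursor, inner.start()).append(close);
            }
            out.append(inner.group());
            cursor = inner.end();
        }
        if (cursor < to) {
            out.append(open).append(html, cursor, to).append(close);
        }
        out.append(html.substring(to));
        return out.toString();
    }
}
```

Calling TagOffsetSketch.wrap(text, "italic", "<span class='special'>", "</span>") on the first example gives the result shown above. When the word is split across several tags, the sketch produces several small wrapper spans rather than one.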
EDIT
The reason for searching the text after removing the HTML is that the HTML interferes with the text I am looking for. For instance, it could look like this:
This is a test text. <span> with bold </span> and with <span> it</span>al<span>ic </span> and so on and so forth.
EDIT2
This is not a duplicate question, as is being suggested. Imagine a scenario where you need to highlight part of the HTML you see on screen by doing nothing more than adding a simple span with a yellow background color around the text of your choice. Now imagine that this text is the word italic, but it appears as <span>ita</span>l<span>ic</span>. My question is: how would you find that word and then add a span around it?
EDIT3
Final edit to simplify the problem statement. I hope this makes it clear. This is the input:
This is a test text with <span>it<span>al<span>ic</span> and etc.
This is the expected output:
This is a test text with <span class='highlight'><span>it<span>al<span>ic</span></span> and etc.