I have these two HTML strings:
a="<div> foo: <span>bar</span> </div>"
b="<div> foo: bar <br> </div>"
I want to find "foo: bar" in each string.
The way I want to do it is to match from the word 'foo' until I come across a '<' character.
I can do this with the regular expression:
foo([^(<)]+)
This only finds "foo: bar" in string b, but not in string a, because the <span> tag is in the way. So I want to write the regex to look from foo until it finds a < character, ignoring the <span> tag.
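Here is a quick reproduction of the behaviour described above, using Python's `re` module as an assumed test environment. Inside square brackets the parentheses are literal characters, so `[^(<)]` excludes '(', '<' and ')'. For these strings it behaves the same as `[^<]`.

```python
import re

a = "<div> foo: <span>bar</span> </div>"
b = "<div> foo: bar <br> </div>"

# The regex from the question. Inside [...] the parentheses are literal,
# so this class rejects '(', '<' and ')'.
pattern = re.compile(r"foo([^(<)]+)")

match_a = pattern.search(a).group(0)  # stops at the '<' of <span>
match_b = pattern.search(b).group(0)  # stops at the '<' of <br>

print(repr(match_a))
print(repr(match_b))
```

For string a the match ends right before `<span>`, so only "foo: " is captured.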
These are just some of the strings this has to work on, so it has to work on them as they are, i.e. I cannot start removing tags before or after matching.
Basically, all I need to know is how to match all characters in a string until I come across a certain character, unless that character is followed by a specified set of characters. In other words, match until <, but if < is followed by span>, keep going and look for the next <.
Does anyone know how to do this?
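One way to express "stop at <, unless it is followed by span>" is a lookahead. The sketch below uses Python's `re` module and a pattern that matches either any character other than '<' or a '<' that is immediately followed by `span>`. This is an assumption about what counts as skippable. The optional `/` in `/?span>` is an addition of mine: it lets the match also pass over the closing `</span>`. Without it, the match on string a would stop at "foo: <span>bar". No tags are removed. The match simply runs over them.

```python
import re

a = "<div> foo: <span>bar</span> </div>"
b = "<div> foo: bar <br> </div>"

# Repeatedly consume either:
#   [^<]              any character that is not '<', or
#   <(?=/?span>)      a '<' that starts a <span> or </span> tag (positive lookahead)
# Any other '<' (e.g. <br> or </div>) ends the match.
pattern = re.compile(r"foo(?:[^<]|<(?=/?span>))*")

match_a = pattern.search(a).group(0)
match_b = pattern.search(b).group(0)

print(repr(match_a))
print(repr(match_b))
```

For string a the match now continues through `<span>bar</span>` and stops at `</div>`. For string b it stops at `<br>` as before. The two alternatives can never match the same character, so the repetition does not backtrack heavily. To allow other tags, the `span` in the lookahead could be replaced by an alternation such as `(?:span|b)`. The extra tag names there are hypothetical.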