I am working on a project that requires parsing "formatting tags." A tag like <b>text</b> changes how the enclosed text looks (that tag makes the text bold). A single tag can contain up to 4 identifiers: b for bold, i for italics, u for underline, and s for strikeout.
For example:
<bi>some</b> text</i> here
would produce "some" in bold italics, " text" in italics only, and " here" with no formatting.
To parse these tags, I'm attempting to use a RegEx to capture any text before the first opening tag, and then capture any tags and their enclosed text after that. Right now, I have this:
<(?<open>[bius]{1,4})>(?<text>.+?)</(?<close>[bius]{1,4})>
That matches a single tag, its enclosed text, and a single corresponding closing tag.
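To show what that pattern captures on the example above, here is a minimal sketch in Python. The language is an assumption made for illustration only: Python spells named groups as (?P<name>...) rather than (?<name>...), and the helper name first_tag_match is hypothetical.

```python
import re

# Same pattern as above, rewritten with Python's (?P<name>...) group syntax.
TAG_PATTERN = re.compile(
    r"<(?P<open>[bius]{1,4})>(?P<text>.+?)</(?P<close>[bius]{1,4})>"
)

def first_tag_match(s):
    """Return (open, text, close) for the first match in s, or None."""
    m = TAG_PATTERN.search(s)
    if m is None:
        return None
    return m.group("open"), m.group("text"), m.group("close")

if __name__ == "__main__":
    # The match stops at the first closing tag, so " text</i> here" is not captured.
    print(first_tag_match("<bi>some</b> text</i> here"))
```

On the example string, the match ends at </b>, so the italic-only " text" and the plain " here" are left outside the match.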
Right now, I iterate over every character position i and try to match the pattern against the substring from i to the end of the string: the whole string at i = 0, the substring from position 1 to the end at i = 1, and so on.
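The position-by-position loop described above can be sketched roughly as follows. This is a Python approximation, not my actual code; the function name scan_by_position is hypothetical, and the pattern uses Python's (?P<name>...) syntax.

```python
import re

TAG_PATTERN = re.compile(
    r"<(?P<open>[bius]{1,4})>(?P<text>.+?)</(?P<close>[bius]{1,4})>"
)

def scan_by_position(s):
    """Try the pattern at every index i against s[i:]; collect (i, matched text)."""
    hits = []
    for i in range(len(s)):
        # match() only succeeds if the pattern matches at the start of s[i:].
        m = TAG_PATTERN.match(s[i:])
        if m:
            hits.append((i, m.group(0)))
    return hits

if __name__ == "__main__":
    print(scan_by_position("ab<b>x</b>cd"))
```

Each iteration builds a new substring and restarts the regex engine, which is where the cost comes from.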
However, this approach is incredibly inefficient. It seems like it would be better to match the entire string in one RegEx instead of manually iterating through the string.
My actual question: is it possible for a regex to match a stretch of text that does not match a given group, such as a tag? I've Googled this without success, but perhaps I haven't been using the right words.
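To make the question concrete, one possible shape for a single-pass match is an alternation between "a tag" and "a run of text that contains no tag", using a negative lookahead. The sketch below is in Python and is only an assumption about what a solution might look like, not a confirmed answer; the names TOKEN and tokenize are hypothetical.

```python
import re

# Either a tag (opening or closing), or a run of characters in which any '<'
# is not the start of a tag (checked with a negative lookahead).
TOKEN = re.compile(
    r"(?P<tag></?[bius]{1,4}>)"
    r"|(?P<text>(?:[^<]|<(?!/?[bius]{1,4}>))+)"
)

def tokenize(s):
    """Split s into ('tag', ...) and ('text', ...) pieces in one pass."""
    return [(m.lastgroup, m.group(0)) for m in TOKEN.finditer(s)]

if __name__ == "__main__":
    for kind, value in tokenize("pre <bi>some</b> text</i> here"):
        print(kind, repr(value))
```

With tokens like these, the formatting state (which of b, i, u, s are active) could be tracked while walking the list once, instead of re-matching from every position.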