I need to parse a string and escape all HTML tags except <a> links.
For example:
"Hello, this is <b>A BOLD</b> bit and this is <a href="www.google.com">a google</a> link"
When printed out in my JSP, I want to see the tags printed as is (i.e. escaped, so "A BOLD" is not actually in bold on the page), but the <a> link should be an actual link to Google on the page.
I have a little method that splits the incoming string based on a regex to match <a> links in various formats (with white space, single or double quotes, etc). The regex is as follows:
myString.split("<a\\s[^>]*href\\s*=\\s*[\\\"\\|\\\'][^>]*[\\\"\\|\\\']\\s*>[^<\\/a>]*<\\/a>");
Yes, it's horrid and probably hopelessly inefficient, so I'm open to alternative suggestions, but it does work up to a point. Where it falls down is parsing the link text. I want it to accept zero or more occurrences of any characters other than the </a> closing tag, but it is parsing it as zero or more occurrences of any characters other than "<", "/", "a" or ">", i.e. as individual characters rather than the complete </a> word. So it matches with any text that has an "e" in it, for example.
The bit in question is: [^<\\/a>]*
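To make the problem concrete, here is a minimal sketch that reproduces it. The class name LinkTextDemo, the helper methods, and the sample strings are made up for illustration; only the regex itself is taken from above.

```java
import java.util.regex.Pattern;

public class LinkTextDemo {
    // The fragment in question. Square brackets form a character class, so this
    // excludes the single characters '<', '/', 'a' and '>' rather than the
    // sequence "</a>".
    static final Pattern LINK_TEXT = Pattern.compile("[^<\\/a>]*");

    // The full regex from the question, unchanged.
    static final String LINK_REGEX =
        "<a\\s[^>]*href\\s*=\\s*[\\\"\\|\\\'][^>]*[\\\"\\|\\\']\\s*>[^<\\/a>]*<\\/a>";

    // True if the whole link text is accepted by the fragment.
    static boolean linkTextMatches(String text) {
        return LINK_TEXT.matcher(text).matches();
    }

    // Splits the input around every link the full regex recognises.
    static String[] splitOnLinks(String input) {
        return input.split(LINK_REGEX);
    }

    public static void main(String[] args) {
        // No '<', '/', 'a' or '>' in the text, so the fragment accepts it.
        System.out.println(linkTextMatches("home"));      // true
        // Contains the letter 'a', so the fragment rejects it.
        System.out.println(linkTextMatches("a google"));  // false

        // The link is recognised and the string is split into two parts.
        System.out.println(splitOnLinks("x <a href=\"u\">home</a> y").length);      // 2
        // The link is not recognised, so the string comes back in one piece.
        System.out.println(splitOnLinks("x <a href=\"u\">a google</a> y").length);  // 1
    }
}
```

So the split only works when the link text happens to avoid those four characters, which is not what I want.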
How do I change this to match on the entire word, not its constituent characters? I've tried parentheses etc. but nothing works.