I recently started using the following RegEx in a ReReplace() function to strip HTML tags from a string using ColdFusion. Please note: I am not using this as protection from XSS or SQL injection; this is only to remove existing and safe HTML from a string before it's displayed in an HTML title attribute.
REReplaceNoCase(str,"<[^>]*>","","ALL")
In a semi-related question I asked how to modify my RegEx to include spaces and line breaks. I was told that using RegEx for this purpose is not appropriate and this post was referenced as an explanation.
I strongly suspect though that the regular expressions you have posted don't in fact work correctly. I'd advise you not to use regular expressions to parse HTML as HTML is not a regular language. Use an HTML parser instead. (Mark Byers)
If this is true, what is the appropriate tool for removing HTML from a string before it's displayed? (Baring in mind the HTML is already safe; it's sanitized before entry to the DB).
I am aware of HTMLEditFormat() and HTMLCodeFormat(), but those two functions do not provide what I need; the earlier replaces special characters with their HTML-escaped equivalents, while the latter does exactly the same but also wraps the string a <pre>
tag.
What I would like to do is clean a string from HTML and line breaks before I display in an HTML title attribute <a title="My string without HTML goes here">...</a>
There are times when the HTML is not necessary. Say you wanted to display an excerpt from a post without the HTML stored along with it, for instance.