I have the following code:
    public static string StripHtml(string htmlString)
    {
        string cleansedString = htmlString;
        if (!string.IsNullOrEmpty(htmlString))
        {
            // <<TestString>script> will result in <script> with this regex alone. So we also
            string regex = @"(?></?\w+)(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>";
            cleansedString = Regex.Replace(htmlString, regex, string.Empty, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
        }
        return cleansedString;
    }
This method is meant to strip HTML from user input in order to prevent HTML injection on an ASP.NET web page (the same fields are also filled by an Excel upload process).
It works perfectly except in this case:
"<<TestString>script>" results in "<script>"
How can I stop this from happening? I was thinking of running StripHtml in a loop for as long as the string still contains any angle brackets, but that seems like a hack. Is there a better way to write this regex to account for this case?
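For reference, the loop idea could be sketched roughly as follows. This is only a minimal sketch, not a vetted sanitizer: the class name HtmlStripper and the method name StripHtmlRepeatedly are made up for illustration, and instead of checking for leftover brackets it simply re-applies the same regex until a pass changes nothing. The loop always ends, because every pass that changes the string also shortens it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class, named here only for illustration.
public static class HtmlStripper
{
    // Same pattern and options as in StripHtml.
    private static readonly Regex TagRegex = new Regex(
        @"(?></?\w+)(?>(?:[^>'""]+|'[^']*'|""[^""]*"")*)>",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper: strips tags repeatedly until the string is stable.
    public static string StripHtmlRepeatedly(string htmlString)
    {
        if (string.IsNullOrEmpty(htmlString))
        {
            return htmlString;
        }

        string current = htmlString;
        string previous;
        do
        {
            previous = current;
            // One pass removes every tag the regex can see right now;
            // removing a tag may expose a new one, hence the loop.
            current = TagRegex.Replace(previous, string.Empty);
        } while (current != previous);

        return current;
    }
}
```

With the input "<<TestString>script>", the first pass removes "<TestString>" and leaves "<script>", the second pass removes "<script>", and the third pass changes nothing, so the method returns an empty string.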