
I am writing an application that consumes a service. The results it returns contain some HTML elements, which must be rendered as they are.

Sample result:
{values = "Lorem ipsum lorem ipsum <span class=\"boldc\">bold value </span> lorem ipsum ....."}

This has to be displayed in an ASP.NET page. If I HTML-encode the value, the span is encoded too, so I cannot make the items bold as desired.

@Html.Raw(Message) works, but it writes the string out unencoded, which exposes the page to script injection (XSS) and is dangerous.

What is the best way to handle this scenario? Is there any way to render these HTML elements and still be safe?

  • This sounds like some of the problems that [arose in the original writing of SO](http://blog.stackoverflow.com/2008/06/podcast-11/)... – Matthew Haugen Jul 25 '14 at 01:46

2 Answers


Unfortunately, no. With the default behaviour there is no way to display decoded HTML without exposing the page to these vulnerabilities: you either encode the HTML string or you leave it decoded. You could, however, do your own parsing of the decoded HTML to disable, or more likely remove, any dangerous markup such as script, iframe and form tags.
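One way to approximate the "do your own parsing" idea is sketched below. It is not necessarily what this answer had in mind: instead of stripping dangerous tags from decoded HTML, it encodes everything first and then re-enables a small allow-list. The class name `AllowListHtml`, the method `Sanitize` and the tag list are hypothetical choices for illustration, and `System.Net.WebUtility.HtmlEncode` is used so the sample does not depend on `System.Web`.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class AllowListHtml
{
    // Hypothetical allow-list: bare formatting tags only, no attributes.
    private static readonly Regex EncodedSimpleTag = new Regex(
        @"&lt;(/?)(b|strong|em|i|u|sub|sup|br|p|ul|ol|li)&gt;",
        RegexOptions.IgnoreCase);

    // <span class="..."> where the class name is limited to letters, digits, '_' and '-'.
    private static readonly Regex EncodedSpanOpen = new Regex(
        @"&lt;span class=&quot;([A-Za-z0-9_-]+)&quot;&gt;",
        RegexOptions.IgnoreCase);

    private static readonly Regex EncodedSpanClose = new Regex(
        @"&lt;/span&gt;",
        RegexOptions.IgnoreCase);

    public static string Sanitize(string input)
    {
        // Step 1: encode everything, so nothing is live markup any more.
        string encoded = WebUtility.HtmlEncode(input ?? string.Empty);

        // Step 2: turn only the allow-listed, already-encoded tags back into markup.
        encoded = EncodedSimpleTag.Replace(encoded, "<$1$2>");
        encoded = EncodedSpanOpen.Replace(encoded, "<span class=\"$1\">");
        encoded = EncodedSpanClose.Replace(encoded, "</span>");
        return encoded;
    }
}
```

With the sample from the question, `AllowListHtml.Sanitize(...)` would leave `<span class="boldc">` and `</span>` intact while a `<script>` tag in the same string stays encoded. The result could then be passed to `@Html.Raw`, on the assumption that the allow-list really covers everything the service is expected to send.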

Leo

I know this question is 7 years old, but I just found a way to do it and wanted to share.

The solution to this problem can be found in this answer, and it uses regex. There's a more detailed version here.

Basically, you use a regex with two alternatives: the first matches what you want to leave untouched (in this case the allowed HTML tags) without capturing it, and the second captures everything else, which is what you want to encode.

I came up with this regex and it worked great:

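// First alternative: an allowed tag, matched but not captured.
// Second alternative: any other single character, captured in group 1.
var regex = new Regex(@"(?:<(?:\/?)(?:strong|b|em|i|u|font|sub|sup|a href|ol|ul|li|div|span|p|br|center)(?:\s(?:\w|\s)*)?>)|(.)");
return regex.Replace(value, delegate (Match m) {
    // Group 1 is empty when an allowed tag matched: emit the tag unchanged.
    if (m.Groups[1].Value == "") return m.Value;
    // Otherwise a single character was captured: HTML-encode it.
    else return HttpUtility.HtmlEncode(m.Value);
});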

It has the downside that you have to keep a list of all supported tags, but it's a regex that is easy to understand and maintain.
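To see what the pattern does in practice, here is a minimal self-contained sketch that wraps the same regex in a method. The class name `SelectiveEncoder` and method name `Encode` are made up for the example, and `WebUtility.HtmlEncode` stands in for `HttpUtility.HtmlEncode` so that no reference to `System.Web` is needed.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class SelectiveEncoder
{
    // Same pattern as in the answer: allowed tags are matched without a capture,
    // every other character lands in group 1.
    private static readonly Regex TagOrChar = new Regex(
        @"(?:<(?:\/?)(?:strong|b|em|i|u|font|sub|sup|a href|ol|ul|li|div|span|p|br|center)(?:\s(?:\w|\s)*)?>)|(.)");

    public static string Encode(string value)
    {
        // Group 1 succeeded: a plain character, so encode it.
        // Group 1 did not participate: an allowed tag, so keep it as is.
        return TagOrChar.Replace(value, m =>
            m.Groups[1].Success ? WebUtility.HtmlEncode(m.Value) : m.Value);
    }
}
```

For example, `SelectiveEncoder.Encode("<b>bold</b> <script>x</script>")` keeps the `<b>` pair and encodes the script tags. One thing worth knowing: the attribute part of the pattern only accepts word characters and whitespace, so a tag with an attribute value, such as `<span class="boldc">` from the question, does not count as an allowed tag and is encoded, while its closing `</span>` passes through. Supporting that case would mean extending the attribute part of the pattern.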

The regex can be simpler if you don't check which HTML tags are used, but then every tag, including script, passes through unencoded:

var regex = new Regex(@"<\/?[A-Za-z]+[^<>]*>|(.)");
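A short sketch of how this simpler pattern behaves, under the same assumptions as before (the hypothetical class name `AnyTagEncoder`, and `WebUtility.HtmlEncode` in place of `HttpUtility.HtmlEncode`):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class AnyTagEncoder
{
    // Anything shaped like a tag is matched without a capture;
    // every other character is captured in group 1.
    private static readonly Regex AnyTagOrChar = new Regex(@"<\/?[A-Za-z]+[^<>]*>|(.)");

    public static string Encode(string value)
    {
        return AnyTagOrChar.Replace(value, m =>
            m.Groups[1].Success ? WebUtility.HtmlEncode(m.Value) : m.Value);
    }
}
```

A stray `<` that does not start a tag, as in `1 < 2`, is still encoded, which is the useful part. Tags themselves are not filtered at all, so `<script>alert(1)</script>` comes out unchanged. That makes this variant suitable only when the markup is already trusted and the goal is merely to escape loose characters.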