I want to store formatted text from a rich-text editor (QuillJS) and render it as HTML when it is displayed again. By default, views HTML-encode output to prevent JavaScript injection, so the stored markup is shown as a plain string.

How can I store and render the data as HTML while filtering out any JavaScript in the string?

I searched for APIs but couldn't find anything helpful. Also, these days it is getting increasingly difficult to find documentation from a class name alone, so fully qualified class names in answers would be highly appreciated.

Praveen Rai
  • Use `@Html.Raw(...)` but you will need to parse out any ` – Stephen Muecke May 24 '18 at 07:42
  • @StephenMuecke Wouldn't it be overkill to load an HTML document just to strip any JS out of it? Do you see a simpler solution? – Praveen Rai May 24 '18 at 09:05
  • @StephenMuecke Also, I am kind of disappointed that this basic function is not part of the official .NET Core package. – Praveen Rai May 24 '18 at 09:06
  • Why do you think it's overkill? – Stephen Muecke May 24 '18 at 09:19
  • @StephenMuecke Because the library is meant to do a lot more. Just to remove scripts, we'd have to load the HTML and then traverse the elements. What are your thoughts on using the string or regex classes to achieve the same thing? – Praveen Rai May 24 '18 at 10:11
  • It's only a few lines of code. Refer [here](http://htmlagilitypack.blogspot.com.au/2014/02/how-to-remove-script-tags-from-html.html) for an example. And using a regex is not recommended; refer to [this Q/A](https://stackoverflow.com/questions/4683046/regular-expression-for-extracting-script-tags). – Stephen Muecke May 24 '18 at 10:16
  • The client-side code is short and simple, but the library would be doing a lot at the back end, right? Anyway, I think there's no other option. Can you please write sample code in an answer along with your suggestion? And please consider including an example for removing external JS as well. Thanks. – Praveen Rai May 24 '18 at 10:27
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171684/discussion-between-stephen-muecke-and-praveen-rai). – Stephen Muecke May 24 '18 at 10:28

2 Answers

Do not use @Html.Raw(...). Users can perform JavaScript injection. There exist many libraries to prevent JS injection. I have used AntiXSS to display HTML.

AntiXSS: https://www.nuget.org/packages/AntiXSS/

Pang
  • I want the data to be displayed as HTML, so I definitely need raw output, but only after stripping any associated JS. That's the question: how do we filter that? AntiXSS is not available for ASP.NET Core. – Praveen Rai May 24 '18 at 08:52
  • AntiXSS shows the HTML. For example, with <b>Hello</b> the text becomes bold. I can find AntiXSS for ASP.NET Core. – Philip Rossen May 24 '18 at 08:54
  • AntiXSS has reached end of life. Please read https://stackoverflow.com/questions/37923431/antixss-in-asp-net-core – Praveen Rai May 24 '18 at 08:56
  • I am using the AntiXSS for my asp.net core website without any problems. Try to follow the guide stackoverflow.com/questions/37923431/antixss-in-asp-net-core. I tried to use many libraries for @Html.Raw without any injections. AntiXSS is very fast and works perfectly. – Philip Rossen May 24 '18 at 08:59
  • I'd prefer an official package by the .NET Core team, or at least a project that is being actively developed. I looked at HTML Agility Pack, suggested by Stephen in a comment on my question, and it looks pretty good. – Praveen Rai May 24 '18 at 09:04
  • It is a heavy package. If you only need to display HTML without injection, I would write the code myself or find a lightweight library for high performance. – Philip Rossen May 24 '18 at 09:07
  • Referring to Stephen's answer, I feel HAP is the way to go and I am sticking with it for now. – Praveen Rai May 28 '18 at 06:21
  • I can inject CSS on your website without problems. http://www.thespanner.co.uk/2007/11/26/ultimate-xss-css-injection/. Recommend that you use HAP or another library. – Philip Rossen May 28 '18 at 07:42

Assuming your model contains a public string MyHtml { get; set; } property, then to display the results in a view, use

@Html.Raw(Model.MyHtml)

To check whether the posted value contains any <script> tags, or to remove them from the value, use an HTML parser such as Html Agility Pack. For example, in your POST method you could add a ModelState error and return the view

public ActionResult Save(MyModel model)
{
    // Reject the submission if the posted HTML contains <script> elements
    if (HasScripts(model.MyHtml))
    {
        ModelState.AddModelError("MyHtml", "The html cannot contain script tags");
    }
    if (!ModelState.IsValid)
    {
        return View(model);
    }
    // save and redirect
}

Where HasScripts() is

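// Requires: using System.Linq; using HtmlAgilityPack;
public bool HasScripts(string html)
{
    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(html);
    HtmlNode root = document.DocumentNode;
    // True if any <script> element exists anywhere in the parsed tree
    return root.Descendants("script").Any();
}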

Alternatively, if you want to just remove them before saving, you could use the following method

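// Requires: using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
public string RemoveScripts(string html)
{
    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(html);
    HtmlNode root = document.DocumentNode;
    // Descendants() is lazy, so copy the matches to a list before modifying the tree
    List<HtmlNode> scripts = root.Descendants("script").ToList();
    foreach (HtmlNode script in scripts)
    {
        script.Remove();
    }
    // OuterHtml serializes the modified tree; document.ToString() would only return the type name
    return scripts.Count > 0 ? root.OuterHtml : html;
}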

and use it as

model.MyHtml = RemoveScripts(model.MyHtml);

Note: If you are tempted to use a regex for this, I recommend reading [Regular Expression for Extracting Script Tags](https://stackoverflow.com/questions/4683046/regular-expression-for-extracting-script-tags).

You might also want to consider checking for other potentially malicious elements such as <embed>, <iframe>, <form>, etc.
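
As a rough illustration of that last point, the removal approach could be extended along the following lines. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced. The class name HtmlCleaner, the method Clean and the list of blocked tags are hypothetical choices for this example and are not part of Html Agility Pack. It also strips inline event-handler attributes (onclick, onerror, ...) and attribute values that start with javascript:, which script-tag removal alone would miss.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of Html Agility Pack.
public static class HtmlCleaner
{
    // Illustrative deny list (an assumption); adjust it to what your editor can emit.
    private static readonly HashSet<string> BlockedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "script", "embed", "iframe", "form", "object"
        };

    public static string Clean(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Snapshot the elements first, because the tree is modified inside the loop.
        List<HtmlNode> elements = document.DocumentNode.Descendants()
            .Where(node => node.NodeType == HtmlNodeType.Element)
            .ToList();

        foreach (HtmlNode element in elements)
        {
            if (BlockedTags.Contains(element.Name))
            {
                element.Remove();
                continue;
            }

            // Copy the attribute collection for the same reason as above.
            foreach (HtmlAttribute attribute in element.Attributes.ToList())
            {
                string value = (attribute.Value ?? string.Empty).TrimStart();
                bool isEventHandler =
                    attribute.Name.StartsWith("on", StringComparison.OrdinalIgnoreCase);
                bool isScriptUrl =
                    value.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase);

                if (isEventHandler || isScriptUrl)
                {
                    attribute.Remove();
                }
            }
        }

        // Serialize the modified tree back to markup.
        return document.DocumentNode.OuterHtml;
    }
}
```

It would be used the same way as RemoveScripts, for example model.MyHtml = HtmlCleaner.Clean(model.MyHtml);. A deny list like this is easy to bypass if something is forgotten (CSS-based injection and entity-encoded URLs are not handled here), so an allow list of the tags and attributes your editor actually produces is usually the stricter design.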