I am struggling to safely encode HTML-like text in JSON. The text is typed into a <textarea>, sent to the server (.NET 4.5 MVC) via AJAX, and stored in a database as a JSON string.

When transferring it to the server, I get the famous "A potentially dangerous Request.Form value was detected" 500 server error. To avoid this message, I use the [AllowHtml] attribute on the model property that is transferred. By doing so I open myself up to an XSS vulnerability if anyone pastes in { "key1": "<script>alert(\"danger!\")</script>" }. As such, I would like to use something like

tableData.Json = AntiXssEncoder.HtmlEncode(json, true);

The problem is that I cannot do this on the full JSON string, as it will produce something like

{&#13;&#10;&quot;key1&quot;: ...}

which of course is not what I want. It should be more like

{ "key1": "&lt;script&gt;alert(&quot;danger!&quot;)&lt;/script&gt;" }

With this result the user can write whatever code they want, but I can prevent it from being rendered as HTML and just display it as ordinary text. Does anyone know how to traverse JSON with C# (Newtonsoft Json.NET) such that string values can be encoded with AntiXssEncoder.HtmlEncode(... , ....)? Or am I on the wrong track here?

Edit:

  1. The data is non-uniform, so deserialization into uniform objects is not an option.
  2. The data will probably be opened to the public, so storing the data encoded would ease my soul.
  • Have a look at [Selectively escape HTML in strings during deserialization](http://stackoverflow.com/q/32562381/10263); it sounds similar to your situation. You might be able to adapt the solution to your needs. – Brian Rogers Sep 21 '16 at 13:24

2 Answers

If you already have the data as a JSON string, you could parse it into proper objects with something like Json.NET using JsonConvert.DeserializeObject() (or anything else, there are actually quite a few options to choose from). Once it's plain objects, you can go through them and apply any encoding you want, then serialize them again into a JSON string. You can also have a look at this question and its answers.

Another approach is to leave the data alone until you actually insert it into the page DOM. You can store unencoded data in the database, and you can even send it to the client as JSON without HTML encoding (it still needs to be encoded for JSON, but any serializer does that). Be careful not to write it directly into the page source this way, but as long as it is an AJAX response with a JSON content type (application/json), it's fine. Then on the client, when you insert it into the actual textarea, make sure you insert it as text and not as HTML. In practice this could mean using jQuery's .text() instead of .html(), or the equivalent in your template engine or client-side data binding library (text: instead of html: in Knockout, #: instead of #= in Kendo UI, etc.).
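
As a rough illustration of the text-versus-HTML distinction, here is a minimal sketch in plain JavaScript. The helper names `escapeHtml` and `setText` are made up for this example and are not part of jQuery or any other library; in a browser, `element` would be a real DOM node.

```javascript
// Hypothetical helper: escapes the characters that are significant in
// HTML text and attribute contexts, for cases where markup is built as a string.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: writes untrusted text into an element without
// letting the browser parse it as markup.
function setText(element, untrusted) {
  if ("value" in element) {
    element.value = untrusted;       // form controls such as <textarea>
  } else {
    element.textContent = untrusted; // ordinary elements
  }
}

console.log(escapeHtml('<script>alert("danger!")</script>'));
// prints: &lt;script&gt;alert(&quot;danger!&quot;)&lt;/script&gt;
```

With jQuery, the counterpart for a textarea would be `.val(untrusted)`, and `.text(untrusted)` for ordinary elements. Neither needs `escapeHtml`, because nothing is parsed as HTML in the first place.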

The advantage of this latter approach is that when sending the data, the server (something like an API) does not need to know or care where or how a client will use the data; it's just data. The client may need different encoding for an HTML context or a JavaScript context, and the server cannot necessarily choose the right one.

If you know that this text area is the only place where the data is needed, though, you can of course take the first (your original) approach and encode it on the server. That's equally good (some may argue that it's even better in that scenario).

The problem with answering this question is that details count a lot. In theory, there are a myriad of ways you could do it right, but sometimes a good solution differs from a vulnerable one by a single character.

  • Thank you for the answer. The data is non uniform and might get opened to the public. As such, deserialization would lead to dynamic objects, which leads to a new set of questions on its own. Since others might use the data, I find it better to store them as "safe" data right away. I am currently trying a non-recursive approach to traverse the json-string as a JToken object and encode all values I find where type equals JTokenType.String. – pekaaw Sep 21 '16 at 15:04

So this is the solution I went for. I added the [AllowHtml] attribute in the ViewModel, so that I could send raw HTML from the textarea (through AJAX). With this attribute I avoid the System.Web.HttpRequestValidationException that MVC throws to guard against XSS. Then I traverse the JSON string by parsing it as a JToken and encode the string values:

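using System.Collections.Generic;
using System.Web.Security.AntiXss;
using Newtonsoft.Json.Linq;

public class JsonUtils
{
    // HTML-encodes every string value in the JSON document.
    // Property names are left untouched.
    public static string HtmlEncodeJTokenStrings(string jsonString)
    {
        var reconstruct = JToken.Parse(jsonString);

        // Walk the token tree iteratively with an explicit stack instead of recursion.
        var stack = new Stack<JToken>();
        stack.Push(reconstruct);

        while (stack.Count > 0)
        {
            var item = stack.Pop();
            if (item.Type == JTokenType.String)
            {
                var valueItem = item as JValue;
                if (valueItem == null)
                    continue;

                // Replace the string in place with its HTML-encoded form.
                var value = valueItem.Value<string>();
                valueItem.Value = AntiXssEncoder.HtmlEncode(value, true);
            }

            // Objects, arrays and properties expose their contents as children.
            foreach (var child in item.Children())
            {
                stack.Push(child);
            }
        }

        // Serialize the modified tree back to an (indented) JSON string.
        return reconstruct.ToString();
    }
}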

The resulting JSON string is still valid, and I store it in the DB. Now, when printing it in a View, I can use the strings directly from the JSON in JS. When opening it again in another <textarea> for editing, I have to decode the HTML entities. For that I "stole" some JS code (decodeHtmlEntities) from string.js, adding the licence and credit note of course.
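
To give an idea of what the decoding step involves, here is a minimal sketch. It is not the string.js code: it is a simplified stand-in written for this post, and it assumes only the five basic named entities plus decimal and hexadecimal numeric references need to be handled. The `NAMED_ENTITIES` table is part of that assumption.

```javascript
// Assumption: only these named entities occur; a full decoder would need
// the complete HTML entity table.
const NAMED_ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Simplified stand-in for string.js's decodeHtmlEntities (not the original code).
// Decodes in a single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
function decodeHtmlEntities(text) {
  return String(text).replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are left as they are.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeHtmlEntities("&lt;script&gt;alert(&quot;danger!&quot;)&lt;/script&gt;"));
// prints: <script>alert("danger!")</script>
```

The decoded string should then be assigned to the textarea's value (not injected as HTML), so the markup is shown for editing rather than executed.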

Hope this helps someone.
