
This is what happens in a Rails console (Rails v4.0.4):

irb(main):020:0> "pepe&pepe <juan>".to_json
=> "\"pepe\\u0026pepe \\u003Cjuan\\u003E\""

This is what happens in an irb console (Ruby 2.0.0p247):

irb(main):014:0> "pepe&pepe <juan>".to_json
=> "\"pepe&pepe <juan>\""

I know I can override this behaviour, but why does Rails do this by default, and what are the consequences of not doing it? Overriding it so that HTML entities are not escaped looks like a good idea to me, but I'm sure I'm missing something.

fguillen

1 Answer


In Rails, JSON is very often written into HTML contexts such as scripts and attributes.

This default escaping avoids injection in such cases: characters that have meaning in a particular context and are not escaped pose an injection / XSS risk. [1]

If, and only if, the JSON is used in a context where this is not a concern, the escaping can safely be disabled: the default simply favors 'safety'. Since this HTML-safe transformation can be done without breaking any standard and without changing the JSON equivalence [2], it is what the Rails team has done - good for them! [3]
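To make the equivalence claim concrete, here is a minimal sketch using only Ruby's standard `json` library (no Rails required). The constants `ESCAPED` and `PLAIN` are names I made up for illustration. They hold the two JSON texts from the question, wrapped in an array only so that older parsers that reject top-level scalars accept them too.

```ruby
require 'json'

# Single-quoted Ruby strings keep the backslashes literal, so ESCAPED really
# contains the six-character sequences \u0026, \u003C and \u003E.
ESCAPED = '["pepe\u0026pepe \u003Cjuan\u003E"]'
PLAIN   = '["pepe&pepe <juan>"]'

# The two JSON texts are different strings...
puts ESCAPED == PLAIN                          # → false

# ...but any compliant decoder turns them into the same value.
puts JSON.parse(ESCAPED) == JSON.parse(PLAIN)  # → true
puts JSON.parse(ESCAPED).first                 # → pepe&pepe <juan>
```

The consumer of the JSON only ever sees the decoded value, so code that breaks on the escaped form is comparing JSON text rather than JSON data.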

In particular, this avoids nasty 'JSON' [2] like:

var x = {"foo": "</script><script>alert('owned')</script>"};

JSON embedded into other HTML constructs, e.g. data attributes, can also be problematic. Even parsing with JSON.parse, which requires an extra encoding step to turn the JSON into a string literal, leaves the same potential issue.
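As a rough illustration of the idea (not the actual Rails implementation), the sketch below post-processes standard-library JSON output with a hypothetical helper I have called `html_safe_json`. It replaces only `<`, `>` and `&` with their `\uXXXX` escapes. The mapping `HTML_ESCAPES` is likewise my own name, and a real encoder may escape more characters than these three.

```ruby
require 'json'

# Hypothetical mapping and helper, for illustration only.
# Single quotes keep the backslash literal in the replacement text.
HTML_ESCAPES = { '<' => '\u003C', '>' => '\u003E', '&' => '\u0026' }.freeze

def html_safe_json(value)
  # Generate ordinary JSON first, then swap the HTML-significant characters
  # for equivalent JSON string escapes.
  JSON.generate(value).gsub(/[<>&]/) { |char| HTML_ESCAPES[char] }
end

payload = { 'foo' => "</script><script>alert('owned')</script>" }
safe    = html_safe_json(payload)

puts safe
puts safe.include?('</script>')   # → false: nothing left to close the script element
puts JSON.parse(safe) == payload  # → true: the data itself is unchanged
```

Because the escaped text contains no literal `<`, dropping it into a script element or an attribute cannot terminate that construct early, while a decoder still recovers the original string.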


[1] The standard safe-encoded output methods apply to HTML PCDATA contexts. When emitting JSON into a script element (CDATA), that encoding is not desirable and is purposefully skipped (e.g. with raw).

[2] Here is another answer of mine where I wrote about why such escaping is always valid, as well as a caveat of using JSON as a JavaScript literal. Unlike the notorious and ill-devised 'add slashes', the HTML-safe JSON represents identical information.

[3] JavaScriptSerializer from Microsoft and json_encode in PHP have similar default encoding behavior. The context in which these libraries/functions are typically used probably plays a large part in their HTML-safe defaults.

user2864740
  • I really appreciate and accept the extended answer you have offered. But I can't agree with the arrogance of Rails in this regard. This `.to_json` override was causing issues in a general-purpose library that is loaded in the same context as a Rails application. Rails decides that the standard Ruby way of serializing objects to JSON is going to change, and so it changes for every library loaded alongside Rails :/. I accept that overriding standard methods can be a good idea and very helpful, but I think the original behaviour must remain the default. – fguillen Aug 31 '15 at 19:51
  • @fguillen JavaScriptSerializer from Microsoft and json_encode from PHP both follow a similar HTML-safe default encoding. Since the encoded JSON is equivalent - but not text content equal, *which was never guaranteed* - I don't have a problem with the transformation. In any case, such concerns will be better directed to the appropriate project home / mailing list. – user2864740 Aug 31 '15 at 19:58
  • @fguillen: But `"\u003C"` *is* `"<"` in both JSON and JavaScript. No compliant JSON decoder should care about the difference because there really isn't any difference. Yes, Rails tends to be arrogant, pig-headed, and opinionated, but they're really not being that bad this time: they're still producing fully compliant JSON. – mu is too short Aug 31 '15 at 22:02