6

Why can Ruby's built-in JSON not deserialize simple JSON primitives, and how do I work around it?

irb(main):001:0> require 'json'
#=> true

irb(main):002:0> objects = [ {}, [], 42, "", true, nil ]
#=> [{}, [], 42, "", true, nil]

irb(main):012:0> objects.each do |o|
irb(main):013:1*   json = o.to_json
irb(main):014:1>   begin
irb(main):015:2*     p JSON.parse(json)
irb(main):016:2>   rescue Exception => e
irb(main):017:2>     puts "Error parsing #{json.inspect}: #{e}"
irb(main):018:2>   end
irb(main):019:1> end
{}
[]
Error parsing "42": 706: unexpected token at '42'
Error parsing "\"\"": 706: unexpected token at '""'
Error parsing "true": 706: unexpected token at 'true'
Error parsing "null": 706: unexpected token at 'null'
#=> [{}, [], 42, "", true, nil]

irb(main):020:0> RUBY_DESCRIPTION
#=> "ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.7.0]"
irb(main):022:0> JSON::VERSION
#=> "1.4.2"
Phrogz

5 Answers

9

RFC 4627: The application/json Media Type for JavaScript Object Notation (JSON) has this to say:

2.  JSON Grammar

   A JSON text is a sequence of tokens.  The set of tokens includes six
   structural characters, strings, numbers, and three literal names.

   A JSON text is a serialized object or array.

      JSON-text = object / array

[...]

2.1.  Values

   A JSON value MUST be an object, array, number, or string, or one of
   the following three literal names:

      false null true

If we call to_json on your six sample objects, we get this:

>> objects = [ {}, [], 42, "", true, nil ]
>> objects.each { |o| puts o.to_json }
{}
[]
42
""
true
null

So the first two are valid JSON texts, whereas the last four are not, even though they are valid JSON values.

JSON.parse wants what it calls a JSON document:

Parse the JSON document source into a Ruby data structure and return it.

Perhaps JSON document is the library's term for what RFC 4627 calls a JSON text. If so, then raising an exception is a reasonable response to an invalid input.

You can work around this by forcibly wrapping and unwrapping everything:

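objects.each do |o|
  json = o.to_json
  begin
    # Wrap the value in an array so the parser always sees a JSON text,
    # then unwrap the single element.
    json_text = '[' + json + ']'
    p JSON.parse(json_text)[0]
  rescue JSON::ParserError => e
    puts "Error parsing #{json.inspect}: #{e}"
  end
end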

And as you note in your comment, using an array as the wrapper is better than an object in case the caller wants to use the :symbolize_names option. Wrapping like this means that you'll always be feeding JSON.parse a JSON text and everything should be fine.
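To illustrate why the array wrapper is the safer choice, here is a minimal sketch. The helper name `parse_value` and the wrapper key `"v"` are hypothetical, chosen only for this example.

```ruby
require 'json'

# Hypothetical helper: wrap the value in an array so that JSON.parse
# always receives a JSON text, then unwrap the single element.
def parse_value(json, opts = {})
  JSON.parse("[#{json}]", opts).first
end

objects = [{}, [], 42, "", true, nil]

# Check that every sample object survives the round trip,
# with and without symbolize_names.
round_trip_ok = objects.all? do |o|
  parse_value(o.to_json) == o &&
    parse_value(o.to_json, symbolize_names: true) == o
end
puts round_trip_ok

# With an object wrapper, symbolize_names turns the wrapper's key into
# a symbol, so unwrapping with the string key no longer finds the value.
wrapped = JSON.parse('{"v":42}', symbolize_names: true)
p wrapped["v"] # nil: the key is now :v
p wrapped[:v]  # 42
```

With the array wrapper, the unwrapping step (`.first`) does not depend on any option the caller passes through.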

mu is too short
  • I would suggest using an array wrapper instead of an object wrapper, in case the user wants to pass in `symbolize_names:true` to `JSON.parse`. With the array the method for unwrapping the result is unaffected. – Phrogz Oct 23 '11 at 04:29
  • BTW, great answer, citing the RFC. While the argument you put forth makes it sound reasonable, it is not (IMHO) reasonable for `JSON.parse(o.to_json) != o` for simply serializable values. – Phrogz Oct 23 '11 at 04:32
  • @Phrogz: Excellent point, I've patched my answer accordingly. I'd agree that failing to parse non-object/array values is a little pedantic, especially in the DWIM Ruby world and especially again for not having such behavior documented. – mu is too short Oct 23 '11 at 04:35
  • Note [this question](http://stackoverflow.com/q/19569221/405017) and its answer, which means that even though RFC 4627 restricts "JSON text" to object or array, the official JSON standard ECMA 404 **does not have this restriction**. I am now convinced that this is a bug in the Ruby JSON library (borne, reasonably, from previously in-progress specifications) and will be pushing to have it fixed. – Phrogz Dec 03 '15 at 20:38
  • @Phrogz: I think we need at least two more JSON standards to choose from. And then maybe a handful of slightly incompatible implementations just to keep everything interesting. – mu is too short Dec 03 '15 at 20:43
2

This is quite an old question, but I think it is worth having a proper answer to prevent hair loss for those who have just run into the problem and are still searching for a solution :)

To parse "JSON primitives" with a JSON gem version below 2, you can pass the quirks_mode: true option, like so:

JSON::VERSION # => "1.8.6"

json_text = "This is a json primitive".to_json
JSON.parse(json_text, quirks_mode: true)

With JSON gem version 2 or later, quirks_mode is no longer necessary.

JSON::VERSION # => "2.0.0"

json_text = "This is a json primitive".to_json
JSON.parse(json_text)

Before parsing, you can check which version of the JSON gem your project uses with bundle show json or gem list | grep json, and then use the corresponding approach.
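The version check can also be done at run time. The sketch below is one possible approach, not part of the JSON gem: the method name `parse_primitive` is hypothetical, and it assumes that `Gem::Version` is available (RubyGems is loaded) and that any installed 1.x release recognizes `quirks_mode`, as 1.8.6 above does.

```ruby
require 'json'

# Hypothetical helper: pass quirks_mode only to JSON gem versions below 2,
# which need it to accept a bare value at the top level.
def parse_primitive(json_text)
  if Gem::Version.new(JSON::VERSION) < Gem::Version.new('2.0.0')
    # Assumption: this 1.x release supports the quirks_mode option.
    JSON.parse(json_text, quirks_mode: true)
  else
    JSON.parse(json_text)
  end
end

json_text = "This is a json primitive".to_json
result = parse_primitive(json_text)
p result
```

Comparing `Gem::Version` objects rather than raw strings avoids lexical surprises such as "10.0.0" sorting before "2.0.0".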

Happy JSON parsing!

Foo Bar Zoo
1

Use JSON.load instead of JSON.parse to handle primitives, for example:

JSON.load('true') # => true
JSON.load('false') # => false
JSON.load('5150') # => 5150
JSON.load('null') # => nil
JellicleCat
1

It appears that the built-in JSON parser intentionally fails on anything but objects and arrays. My current workaround is the following:

# Work around a flaw in Ruby's built-in JSON parser
# not accepting anything but an object or array at the root level.
module JSON
  def self.parse_any(str, opts = {})
    parse("[#{str}]", opts).first
  end
end
Phrogz
  • A fair kludge but I think wrapping and unwrapping would be safer than format guessing. – mu is too short Oct 23 '11 at 01:47
  • @muistooshort An excellent point; there's likely just about no performance hit to always wrapping and unwrapping. Any reason you prefer wrapping in an object versus array? – Phrogz Oct 23 '11 at 01:58
  • Might be worth some benchmarking but I doubt you'd be able to detect the difference without using small bits of JSON and tens of thousands of iterations. – mu is too short Oct 23 '11 at 02:01
-1

I think you are right. Whether it is a bug or not, there is some wonky logic going on in the implementation. If it can parse arrays and hashes, it should be able to parse everything else.

Because JSON.parse seems geared for objects and arrays, I would try to pass your data one of those ways if you can, and if you can't, stick with the workaround you have.

cgr