The functions decode and decode' from the aeson package are almost identical, but they have a subtle difference that is described in the documentation (only the relevant part of the docs is shown here):

-- This function parses immediately, but defers conversion.  See
-- 'json' for details.
decode :: (FromJSON a) => L.ByteString -> Maybe a
decode = decodeWith jsonEOF fromJSON

-- This function parses and performs conversion immediately.  See
-- 'json'' for details.
decode' :: (FromJSON a) => L.ByteString -> Maybe a
decode' = decodeWith jsonEOF' fromJSON

I tried to read the descriptions of the json and json' functions, but I still don't understand which one I should use and when, because the documentation is not clear enough. Can anybody describe the difference between the two functions more precisely and, if possible, provide an example with an explanation of the behavior?

UPDATE:

There are also decodeStrict and decodeStrict' functions. I'm not asking about the difference between, say, decode' and decodeStrict (which, by the way, is an interesting question as well). But it is not at all obvious what is lazy and what is strict in all of these functions.

Shersh
  • This doesn't answer your question, but [there is an open issue on aeson asking about this difference.](https://github.com/bos/aeson/issues/315) – Li-yao Xia Jul 28 '17 at 14:48
  • @Li-yaoXia nice catch! I didn't find that issue. – Shersh Jul 28 '17 at 14:51
  • Looking at the source of `aeson`, it looks like the only distinction is whether or not strings and numbers will be forced all the way (thereby allocating potentially expensive number or string data structures). That said, I haven't been able to trigger this behaviour yet... – Alec Jul 28 '17 at 22:03
  • I think the intention looking at `value` and `value'` is for the strict version to eagerly build any nested objects/arrays. In practice I'm not sure the lazy version avoids much work since the parser would have to check whether any objects/arrays are well formed before moving to the next property, but I think the copying/processing to build the `HashMap`/`Vector` in `objectValues`/`arrayValues` would be avoided/deferred in the lazy version. – ryachza Aug 01 '17 at 19:17

2 Answers

The difference between these two functions is real but subtle, and a little complicated to explain. We can start by taking a look at the types.

The Value type

It’s important to note that the Value type that aeson provides has been strict for a very long time (specifically, since version 0.4.0.0). This means that there cannot be any thunks between a constructor of Value and its internal representation. This immediately means that Bool (and, of course, Null) must be completely evaluated once a Value is evaluated to WHNF.

Next, let’s consider String and Number. The String constructor contains a value of type strict Text, so there can’t be any laziness there, either. Similarly, the Number constructor contains a Scientific value, which is internally represented by two strict values. Both String and Number must also be completely evaluated once a Value is evaluated to WHNF.

We can now turn our attention to Object and Array, the only nontrivial datatypes that JSON provides. These are more interesting. Objects are represented in aeson by a lazy HashMap. Lazy HashMaps only evaluate their keys to WHNF, not their values, so the values could very well remain unevaluated thunks. Similarly, Arrays are Vectors, which are not strict in their values, either. Both of these sorts of Values can contain thunks.

With this in mind, we know that, once we have a Value, the only places that decode and decode' may differ is in the production of objects and arrays.
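The laziness of those two container types can be checked in isolation. The following is a minimal sketch, not aeson's own code: the names lazyVec and lazyMap are made up for illustration, and it only demonstrates how a boxed Data.Vector and a Data.HashMap.Lazy treat their contents, not which variant aeson's parser actually builds.

```haskell
import qualified Data.HashMap.Lazy as M
import qualified Data.Vector as V

-- A boxed Vector stores its elements lazily, so an undefined element is
-- harmless as long as nobody looks at it.
lazyVec :: V.Vector Int
lazyVec = V.fromList [undefined, 2]

-- Data.HashMap.Lazy forces keys (they must be hashed) but not values.
lazyMap :: M.HashMap String Int
lazyMap = M.fromList [("unused", undefined), ("used", 1)]

main :: IO ()
main = do
  print (V.length lazyVec)        -- the length does not force any element
  print (lazyVec V.! 1)           -- forces only the second element
  print (M.size lazyMap)          -- the size does not force any value
  print (M.lookup "used" lazyMap) -- forces only the value that is looked up
```

Building the same map with Data.HashMap.Strict.fromList would instead hit the undefined value as soon as the map itself is evaluated.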

Observational differences

The next thing we can try is to actually evaluate some things in GHCi and see what happens. We’ll start with a bunch of imports and definitions:

:seti -XOverloadedStrings

import Control.Exception
import Control.Monad
import Data.Aeson
import Data.ByteString.Lazy (ByteString)
import Data.List (foldl')
import qualified Data.HashMap.Lazy as M
import qualified Data.Vector as V

:{
forceSpine :: [a] -> IO ()
forceSpine = evaluate . foldl' const ()
:}

Next, let’s actually parse some JSON:

let jsonDocument = "{ \"value\": [1, { \"value\": [2, 3] }] }" :: ByteString

let parsed = decode jsonDocument :: Maybe Value
let parsed' = decode' jsonDocument :: Maybe Value
void $ evaluate parsed
void $ evaluate parsed'

Now we have two bindings, parsed and parsed', one of which is parsed with decode and the other with decode'. They are forced to WHNF so we can at least see what they are, but we can use the :sprint command in GHCi to see how much of each value is actually evaluated:

ghci> :sprint parsed
parsed = Just _
ghci> :sprint parsed'
parsed' = Just
            (Object
               (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                  15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                  (Array (Data.Vector.Vector 0 2 _))))

Would you look at that! The version parsed with decode is still unevaluated, but the one parsed with decode' has some data. This leads us to our first meaningful difference between the two: decode' forces its immediate result to WHNF, but decode defers it until it is needed.

Let’s look inside these values to see if we can’t find more differences. What happens once we evaluate those outer objects?

let (Just outerObjValue) = parsed
let (Just outerObjValue') = parsed'
void $ evaluate outerObjValue
void $ evaluate outerObjValue'

ghci> :sprint outerObjValue
outerObjValue = Object
                  (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                     15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                     (Array (Data.Vector.Vector 0 2 _)))

ghci> :sprint outerObjValue'
outerObjValue' = Object
                   (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                      15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                      (Array (Data.Vector.Vector 0 2 _)))

This is pretty obvious. We explicitly forced both of the objects, so they are now both evaluated to hash maps. The real question is whether or not their elements are evaluated.

let (Object outerObj) = outerObjValue
let (Object outerObj') = outerObjValue'
let (Array outerArr) = outerObj M.! "value"
let (Array outerArr') = outerObj' M.! "value"
let outerArrLst = V.toList outerArr
let outerArrLst' = V.toList outerArr'

forceSpine outerArrLst
forceSpine outerArrLst'

ghci> :sprint outerArrLst
outerArrLst = [_,_]

ghci> :sprint outerArrLst'
outerArrLst' = [Number (Data.Scientific.Scientific 1 0),
                Object
                  (unordered-containers-0.2.8.0:Data.HashMap.Base.Leaf
                     15939318180211476069 (Data.Text.Internal.Text _ 0 5)
                     (Array (Data.Vector.Vector 0 2 _)))]

Another difference! For the array decoded with decode, the values are not forced, but the ones decoded with decode' are. As you can see, this means decode doesn’t actually perform conversion to Haskell values until they are actually needed, which is what the documentation means when it says it “defers conversion”.

Impact

Clearly, these two functions are slightly different, and clearly, decode' is stricter than decode. What’s the meaningful difference, though? When would you prefer one over the other?

Well, it’s worth mentioning that decode never does more work than decode', so decode is probably the right default. Of course, decode' will never do significantly more work than decode, either, since the entire JSON document needs to be parsed before any value can be produced. The only significant difference is that decode avoids allocating Values if only a small part of the JSON document is actually used.
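As a rough illustration of the "only a small part is used" case, here is a sketch in which a single field is read from a document and the rest is ignored. The document, the field names, and the helper nameOf are all hypothetical. The program only shows that both functions are drop-in replacements for each other at the call site; it does not measure how much work either one skips.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, decode, decode', withObject, (.:))
import Data.Aeson.Types (parseMaybe)
import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)

-- Hypothetical document: a small metadata field next to a payload that
-- the program never inspects.
doc :: ByteString
doc = "{ \"name\": \"example\", \"payload\": [1, 2, 3] }"

-- Read only the "name" field; the "payload" array is never touched.
nameOf :: Value -> Maybe Text
nameOf = parseMaybe (withObject "wrapper" (.: "name"))

main :: IO ()
main = do
  -- With decode, the Value for "payload" may stay an unevaluated thunk.
  print (decode doc >>= nameOf)
  -- With decode', the whole Value is built before nameOf runs.
  print (decode' doc >>= nameOf)
```

Both lines print the same result; any difference would show up only in allocation and timing.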

Of course, laziness is not free, either. Being lazy means adding thunks, which can cost space and time. If all of the thunks are going to be evaluated, anyway, then decode is simply wasting memory and runtime adding useless indirection.

In this sense, the situations when you might want to use decode' are situations in which the whole Value structure is going to be forced, anyway, which is probably dependent on which FromJSON instance you’re using. In general, I wouldn’t worry about picking between them unless performance really matters and you’re decoding a lot of JSON or doing JSON decoding in a tight loop. In either case, you should benchmark. Choosing between decode and decode' is a very specific manual optimization, and I would not feel very confident that either would actually improve the runtime characteristics of my program without benchmarks.
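One possible shape for such a benchmark is sketched below. It assumes the criterion package, which is my choice here and is not mentioned above, and the input sampleDoc and the names lazyDecode and strictDecode are invented for the example. The whnf cases measure only the cost of reaching the outer Just, while the nf cases force the entire Value, which is closer to what a program that consumes the whole document would pay.

```haskell
import Criterion.Main (bench, bgroup, defaultMain, env, nf, whnf)
import Data.Aeson (Value, decode, decode', encode)
import qualified Data.ByteString.Lazy as L

-- Hypothetical test input: a JSON array of 10,000 two-element arrays.
sampleDoc :: L.ByteString
sampleDoc = encode [[i, i + 1] | i <- [1 .. 10000 :: Int]]

-- Monomorphic wrappers so each benchmark decodes to a plain Value.
lazyDecode, strictDecode :: L.ByteString -> Maybe Value
lazyDecode = decode
strictDecode = decode'

main :: IO ()
main =
  defaultMain
    [ -- env fully evaluates the input before any timing starts.
      env (pure sampleDoc) $ \doc ->
        bgroup
          "aeson"
          [ bench "decode / whnf" (whnf lazyDecode doc)
          , bench "decode' / whnf" (whnf strictDecode doc)
          , bench "decode / nf" (nf lazyDecode doc)
          , bench "decode' / nf" (nf strictDecode doc)
          ]
    ]
```

A synthetic array like this is unlikely to resemble real traffic, so it would be worth swapping in documents and access patterns from the actual program before drawing conclusions.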

Alexis King
  • I was going to make this same answer but wasn’t certain about it—still, I believe you’re right. My impression is that it can also make a big difference if you’re decoding to something *other* than `Value` that takes up considerably more memory than the `Value` would. For example, say you have an object whose values are run-length–encoded arrays: `{"data1": [10000, 0], "data2": [1, 1]}`. If you only access `data2`, you don’t want to decompress the whole `data1` value. – Jon Purdy Aug 01 '17 at 22:32
  • @JonPurdy I originally thought that, too, and I was going to include something about that, but I realized in truth it’s entirely up to the `FromJSON` instance how lazy it wants to be. Ultimately, both `decode` and `decode'` produce a `Value`, then provide it to `fromJSON`, which subsequently produces whatever the `Parser` produces. I don’t believe strict production of `Value`s would change the strictness properties of the value produced by `fromJSON` in any significant way. – Alexis King Aug 01 '17 at 22:39
  • I'd expect the laziness to be potentially beneficial only when it (effectively) delays the conversion of a bunch of slices into actual `Text` values. To win by delaying construction of results after parsing, the results have to be quite a bit bigger than the thunks representing them. That's often a tall order. Even worse, delaying construction can lead to space leaks; just one unforced result can easily keep a bytestring chunk live. – dfeuer Aug 01 '17 at 23:58
  • @dfeuer I thought about writing some benchmarks for this, but that was considerably more work than I wanted to put in today. Another answer with raw numbers would be useful, though. – Alexis King Aug 02 '17 at 00:44
  • I've worked with JSON values that are basically an object wrapper with a bunch of meta-data fields, and then one field with a large array of complex objects. It's not uncommon to need to process large collections of these outer objects where only the meta-data is relevant, so I can see this sort of distinction making a significant difference; you still need to recognise the JSON structure to know when the array that you're ignoring ends and start on the next set of metadata, but spending time converting the JSON text representing the array into nested Haskell values would be wasteful. – Ben Aug 02 '17 at 02:55
  • @AlexisKing Thank you for such a detailed explanation and for your effort! Now the difference is crystal clear to me. I wonder, could this somehow go into the documentation of the `aeson` package, under the corresponding issue posted in the comments to my question? – Shersh Aug 02 '17 at 09:20
  • AFAICT, `aeson` never used lazy `HashMap`, even in aeson-0.4.0.0 which started to use unordered-containers, the parser used `Data.HashMap.Strict`. So the thunks could hide only inside the `Array` (which sadly isn't printed by `sprint`). – phadej Jun 19 '23 at 13:00
Haskell is a lazy language. When you call a function, it doesn't actually execute right then. Instead, the information about the call is "remembered" and returned up the stack (this remembered call information is referred to as a "thunk" in the docs), and the actual call only happens if somebody up the stack actually tries to do something with the returned value.

This is the default behavior, and this is how json and decode work. But there is a way to "cheat" the laziness and tell the compiler to execute code and evaluate values right then and there. And this is what json' and decode' do.
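A small sketch of that contrast, independent of aeson: the names expensive, lazyResult, and strictResult are made up for illustration, and seq stands in for whatever strictness mechanism a library might use internally.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Stands in for a computation we would rather not run.
expensive :: Int
expensive = error "expensive was forced"

-- Lazy: const ignores its second argument, so the thunk is never run.
lazyResult :: Int
lazyResult = const 1 expensive

-- Strict: seq forces its first argument before returning the second.
strictResult :: Int
strictResult = expensive `seq` 1

main :: IO ()
main = do
  print lazyResult
  -- Forcing strictResult runs expensive, which throws; catch it to report.
  outcome <- try (evaluate strictResult) :: IO (Either ErrorCall Int)
  putStrLn (either (const "expensive was forced") show outcome)
```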

The tradeoff there is obvious: decode saves computation time in case you never actually do anything with the value, while decode' saves the necessity to "remember" the call information (the "thunk") at the cost of executing everything in place.

Fyodor Soikin
  • There are also `decodeStrict` and `decodeStrict'` functions which are not the same as `decode` and `decode'`. Things are not so trivial. Not clear what exactly is inside thunk and how it affects performance. – Shersh Jul 28 '17 at 15:48
  • Yes, I already realized I've misunderstood your question. At first it looked like you just didn't know the difference between strict and lazy, but now after reading some of your comments, I understand that you were looking for a deeper answer. – Fyodor Soikin Jul 28 '17 at 15:52
  • The answer is correct. `decodeStrict` variants are for the strict `ByteString`s – zaquest Jul 31 '17 at 07:41
  • @zaquest though this answer is technically correct, it is not the answer that the OP was looking for. But thank you for the upvote :-) – Fyodor Soikin Jul 31 '17 at 12:23