Data.Binary best practices -- when to be strict and how

Question

Inspired by this StackOverflow question, I'd like to ask the community about best practices for Data.Binary. When I noticed that my large list of Hyperedges (a simple algebraic data type) was not streamed but loaded all at once, I wrote the myGet and myPut functions below, which work in chunks of 10000 elements. As the answers to that StackOverflow question point out, my implementation does not work with binary >= 0.5, where the Get monad is strict. (I missed that.)

However, I also noticed that when loading Hyperedges with Int and Int32 as type parameters, many Word32s and Word64s were allocated only to be converted later. That was a massive space leak. I "fixed" it by applying deepseq in get as early as possible, as you can see below. I would have preferred a more refined approach, perhaps using a Strategy, but then I could no longer use the Binary type class, because the strategy would have to be an additional parameter.
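One common alternative to sprinkling deepseq through get is to make the fields of the decoded type strict and return the result with `return $!`. The sketch below uses a hypothetical `Edge` type with an `Int` and an `Int32` field (not the question's `Hyperedge`). Forcing the constructor to weak head normal form also forces its strict fields, so any pending conversion of the decoded fixed-width words is evaluated inside get. This only reaches as deep as each field's own WHNF, so it suits flat fields such as `Int`, not nested ones such as lists.

```haskell
import Data.Binary (Binary (..), decode, encode)
import Data.Int (Int32)

-- Hypothetical stand-in type. The bangs make both fields strict:
-- evaluating an Edge to WHNF also evaluates its two fields.
data Edge = Edge !Int !Int32
  deriving (Eq, Show)

instance Binary Edge where
  put (Edge a b) = put a >> put b
  get = do
    a <- get
    b <- get
    -- return $! forces the Edge (and hence its strict fields) right here,
    -- so no unevaluated field thunks are returned to the caller.
    return $! Edge a b

-- Encode and decode a list, e.g. to check that the instance round-trips.
roundTrip :: [Edge] -> [Edge]
roundTrip = decode . encode
```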

Summary: right now, it is a mess. Can you offer "best practice" advice?

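import Control.DeepSeq (NFData, deepseq)
import qualified Data.Binary as B

-- Hyperedge, its accessors (to, from, label, ident) and mkHyperedge
-- are defined elsewhere in my code.
instance (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => B.Binary (Hyperedge v l i) where
  put e = do
    B.put (to e)
    B.put (from e)
    B.put (label e)
    B.put (ident e)
  -- get = mkHyperedge <$> B.get <*> B.get <*> B.get <*> B.get
  -- Fully evaluate each field before decoding the next one, so that no
  -- unevaluated conversion thunks are kept alive.
  get = do
    x1 <- B.get
    x2 <- x1 `deepseq` B.get
    x3 <- x2 `deepseq` B.get
    x4 <- x3 `deepseq` B.get
    x4 `deepseq` return (mkHyperedge x1 x2 x3 x4)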
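If deep forcing is wanted, it can also be done once per record instead of between fields: decode the whole edge applicatively and apply `force` from Control.DeepSeq through `(<$!>)` from Control.Monad (base >= 4.8), which evaluates the result before get returns it. This keeps the plain Binary class and needs no extra strategy parameter. The sketch assumes a stand-in `Hyperedge` with plain record fields, since the real type and mkHyperedge are not shown; the field types (`to :: v`, `from :: [v]`) are guesses.

```haskell
import Control.DeepSeq (NFData (..), force)
import Control.Monad ((<$!>))
import Data.Binary (Binary (..), decode, encode)
import Data.Int (Int32)

-- Hypothetical stand-in for the question's Hyperedge (field types assumed).
data Hyperedge v l i = Hyperedge
  { to    :: v
  , from  :: [v]
  , label :: l
  , ident :: i
  } deriving (Eq, Show)

-- Deep evaluation: evaluate every field completely.
instance (NFData v, NFData l, NFData i) => NFData (Hyperedge v l i) where
  rnf (Hyperedge t f l i) = rnf t `seq` rnf f `seq` rnf l `seq` rnf i

instance (NFData v, NFData l, NFData i, Binary v, Binary l, Binary i)
  => Binary (Hyperedge v l i) where
  put e = put (to e) >> put (from e) >> put (label e) >> put (ident e)
  -- Decode all four fields, then (<$!>) evaluates `force e` before the
  -- edge is returned, and force evaluates it deeply via rnf.
  get = force <$!> (Hyperedge <$> get <*> get <*> get <*> get)
```

Unevaluated parts of one edge can then live only until that edge is returned, so a list of edges decoded with the standard list instance holds fully evaluated values.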


-- Read length-prefixed chunks until an empty chunk marks the end of the list.
myGet
  :: (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => B.Get [Hyperedge v l i]
myGet = do
  es1 <- B.get
  if null es1
    then return []
    else
      do
        es2 <- myGet
        return (es1 ++ es2)

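-- Write the list in chunks of 10000 elements, then an empty list as end marker.
myPut
  :: (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => [Hyperedge v l i] -> B.Put
myPut es@[] = B.put es -- es is [] here, so this writes the empty terminator chunk
myPut es = do
  B.put (take 10000 es)
  myPut (drop 10000 es)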
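Since Get is strict from binary 0.5 on, a chunked format like the one myPut writes can still be consumed lazily by doing the looping outside of Get: run one `get` per chunk with `runGetOrFail` (Data.Binary.Get, binary >= 0.7) and pass the remaining lazy ByteString on to the next step. A sketch under those assumptions; `putChunks` and `decodeChunks` are hypothetical names, and `putChunks` writes the same layout as myPut but with a configurable chunk size.

```haskell
import Data.Binary (Binary (..))
import Data.Binary.Get (runGetOrFail)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as L

-- Same layout as myPut: length-prefixed chunks of n elements, terminated
-- by an empty list. n must be positive, otherwise nothing but the
-- terminator is written.
putChunks :: Binary a => Int -> [a] -> Put
putChunks n xs = do
  let (chunk, rest) = splitAt n xs
  put chunk                      -- an empty chunk doubles as the terminator
  if null chunk then pure () else putChunks n rest

-- Decode one chunk at a time outside of Get. The tail of the result is
-- only computed when the consumer reaches it, so the lazy ByteString is
-- read on demand instead of all at once.
decodeChunks :: Binary a => L.ByteString -> [a]
decodeChunks bs =
  case runGetOrFail get bs of
    Left (_, off, msg) ->
      error ("decodeChunks: offset " ++ show off ++ ": " ++ msg)
    Right (rest, _, chunk)
      | null chunk -> []         -- empty chunk: end of stream
      | otherwise  -> chunk ++ decodeChunks rest
```

Applied to the contents returned by `L.readFile`, this yields the elements incrementally, with the usual lazy-I/O caveat that the file stays open until the list is fully consumed, and the memory benefit holds only if the consumer does not retain the head of the list.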
