Inspired by this StackOverflow question, I'd like to ask the community about best practice regarding Binary. When I noticed that my large list of Hyperedges (a simple algebraic data structure) was not streamed but loaded all at once, I wrote the myGet and myPut functions, which use chunks of 10000 elements. As stated in the answers to said StackOverflow question, my implementation does not work with Binary >= 0.5. (I missed that.)
However, I also noticed that while loading Hyperedges with Int and Int32 in their type parameters, a lot of Word32s and Word64s were allocated only to be converted later. That was a massive space leak. I "fixed" it by applying deepseq in get as early as possible, as you can see. I would have preferred a more refined approach, maybe with Strategy, but then I could no longer use the Binary typeclass, because the strategy would have to be an additional parameter.
Summary: right now, it is a mess. Can you offer "best practice" advice?
instance (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => B.Binary (Hyperedge v l i) where
  put e = do
    B.put (to e)
    B.put (from e)
    B.put (label e)
    B.put (ident e)
  -- get = mkHyperedge <$> B.get <*> B.get <*> B.get <*> B.get
  get = do
    x1 <- B.get
    x2 <- x1 `deepseq` B.get
    x3 <- x2 `deepseq` B.get
    x4 <- x3 `deepseq` B.get
    x4 `deepseq` return (mkHyperedge x1 x2 x3 x4)

myGet
  :: (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => B.Get [Hyperedge v l i]
myGet = do
  es1 <- B.get
  if null es1
    then return []
    else do
      es2 <- myGet
      return (es1 ++ es2)

myPut
  :: (NFData v, NFData l, NFData i, B.Binary v, B.Binary l, B.Binary i, Ord v)
  => [Hyperedge v l i] -> B.Put
myPut es@[] = B.put es -- ([] :: [Hyperedge v l i])
myPut es = do
  B.put (take 10000 es)
  myPut (drop 10000 es)
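
For comparison, here is a sketch of how the same chunked format (lists of up to 10000 elements, terminated by an empty list) might be decoded chunk by chunk outside the Get monad, so that the result list is produced lazily even though Get itself is strict. The names chunkSize, putChunked, encodeChunked and decodeChunked are my own, hypothetical helpers; runGetOrFail comes from Data.Binary.Get in recent versions of the binary package. Calling error on malformed input is a shortcut for the sketch, not a recommendation.

```haskell
import Control.Monad (unless)
import qualified Data.Binary as B
import Data.Binary.Get (runGetOrFail)
import Data.Binary.Put (Put, runPut)
import qualified Data.ByteString.Lazy as BL

-- Same chunk size as in myPut.
chunkSize :: Int
chunkSize = 10000

-- Write the list as a sequence of chunks; the final chunk is always the
-- empty list, which acts as the terminator (same layout as myPut).
putChunked :: B.Binary a => [a] -> Put
putChunked xs = do
  let (chunk, rest) = splitAt chunkSize xs
  B.put chunk
  unless (null chunk) (putChunked rest)

encodeChunked :: B.Binary a => [a] -> BL.ByteString
encodeChunked = runPut . putChunked

-- Decode one chunk at a time. Each chunk is read strictly by Get, but the
-- recursion happens in pure code, so later chunks are only decoded when
-- the consumer demands them.
decodeChunked :: B.Binary a => BL.ByteString -> [a]
decodeChunked bytes =
  case runGetOrFail B.get bytes of
    Left (_, _, err) -> error ("decodeChunked: " ++ err)
    Right (rest, _, chunk)
      | null chunk -> []
      | otherwise  -> chunk ++ decodeChunked rest
```

Used as decodeChunked <$> BL.readFile path (with path being whatever file was written from encodeChunked), this should keep at most a few chunks in memory at a time, provided the consumer does not hold on to the head of the list. I have not benchmarked it against my real data.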