I am trying to parse a binary file into a Haskell vector. I can load my file into a regular list, but since each file has more than 10,000,000 elements, performance is terrible.
To parse the binary file, I use `Data.Binary.Get` and `Data.Binary.IEEE754`, since I intend to read float values. I am trying to build my vector as a mutable vector and then return it frozen.
I end up at a point where I have a problem because `Get` is not an instance of `Control.Monad.Primitive.PrimMonad`, which looks pretty obscure to me.
```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed.Mutable as UM
import qualified Data.Vector.Unboxed as U
import Data.Binary.Get
import Data.Binary.IEEE754

type MyVectorOfFloats = U.Vector Float

main :: IO ()
main = do
  -- Lazily read the content of the file as a ByteString
  file_content <- B.readFile "vec.bin"
  -- Parse the ByteString and get the vector (runGet is pure, so bind it with let)
  let vec = runGet (readWithGet 10) file_content :: MyVectorOfFloats
  -- Do something useful with it...
  return ()

readWithGet :: Int
            -> Get MyVectorOfFloats -- ^ Operates in the Get monad
readWithGet n = do
  -- Initialize a mutable vector of the desired size
  -- (UM.new, UM.unsafeWrite and U.unsafeFreeze all require a PrimMonad
  -- instance, which Get does not have: this is where compilation fails)
  vec <- UM.new n
  -- Initialize the vector with values obtained from the Get monad
  fill vec 0
  -- Finally return a frozen version of the vector
  U.unsafeFreeze vec
  where
    fill v i
      | i < n = do
          -- Hopefully read one float32 from the Get monad
          f <- getFloat32le
          -- Place the value inside the vector
          -- In the real situation, I would do more complex decoding with
          -- my float value f
          UM.unsafeWrite v i f
          -- and go to the next value to read
          fill v (i + 1)
      | otherwise = return ()
```
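For comparison, here is a minimal sketch that drops the mutable vector and stays entirely in `Get`. It relies on `U.replicateM`, which only needs a `Monad` instance, so `PrimMonad` never comes up. The helper `getF32le` is my own stand-in for `getFloat32le` (built from `getWord32le` and `GHC.Float.castWord32ToFloat` to avoid the extra package), and `encodeFloats` exists only to build test input. I have not measured whether this is fast enough for 10,000,000 elements; for monads other than `IO`/`ST`, `vector` may collect the elements into an intermediate list before building the result.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed as U
import Data.Binary.Get (Get, runGet, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import GHC.Float (castWord32ToFloat, castFloatToWord32)

type MyVectorOfFloats = U.Vector Float

-- Hypothetical helper: little-endian float32, same encoding as getFloat32le.
getF32le :: Get Float
getF32le = castWord32ToFloat <$> getWord32le

-- U.replicateM only needs a Monad, so it runs directly in Get:
-- no mutable vector and no PrimMonad instance are required.
readWithGet :: Int -> Get MyVectorOfFloats
readWithGet n = U.replicateM n getF32le

-- Hypothetical helper used only to build test input.
encodeFloats :: [Float] -> B.ByteString
encodeFloats = runPut . mapM_ (putWord32le . castFloatToWord32)

main :: IO ()
main = do
  let bytes = encodeFloats [1.5, 2.5, 3.5]
      vec   = runGet (readWithGet 3) bytes
  print vec
```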
The example above is quite simple; in my real situation I have run-length-like decoding to do, but the problem stays the same.
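To make the run-length part concrete, here is a sketch under an assumed, hypothetical format: each run is a little-endian `Word32` repeat count followed by one float32 value, and the number of runs is known in advance. The runs are read in `Get`, then expanded with `U.replicate` and joined with `U.concat`. The names `getRun`, `readRuns`, `getF32le` and `encodeRuns` are illustrative only.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed as U
import Control.Monad (replicateM)
import Data.Binary.Get (Get, runGet, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import GHC.Float (castWord32ToFloat, castFloatToWord32)

-- Hypothetical helper: little-endian float32.
getF32le :: Get Float
getF32le = castWord32ToFloat <$> getWord32le

-- Assumed run format: Word32 repeat count, then the float32 value.
getRun :: Get (Int, Float)
getRun = do
  count <- fromIntegral <$> getWord32le
  value <- getF32le
  return (count, value)

-- Read r runs, then expand each run and concatenate into one vector.
readRuns :: Int -> Get (U.Vector Float)
readRuns r = do
  runs <- replicateM r getRun
  return (U.concat [U.replicate c v | (c, v) <- runs])

-- Hypothetical helper used only to build test input.
encodeRuns :: [(Int, Float)] -> B.ByteString
encodeRuns = runPut . mapM_ putRun
  where
    putRun (c, v) = putWord32le (fromIntegral c) >> putWord32le (castFloatToWord32 v)

main :: IO ()
main = print (runGet (readRuns 2) (encodeRuns [(3, 1.0), (2, 0.5)]))
```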
First, do the libraries I selected seem adequate for my use? I do not really need the whole vector in memory at once; I can operate on chunks. Something from pipes or Conduit looks interesting.
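If working on chunks is enough, one option that uses neither pipes nor Conduit is to split the lazy `ByteString` into fixed-size pieces and decode each piece into a small unboxed vector, folding a result as it goes. This is a sketch; `foldChunks`, `getF32le` and `encodeFloats` are hypothetical names, and whether earlier chunks actually get garbage-collected depends on nothing else holding on to the `ByteString`.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed as U
import Data.Binary.Get (Get, runGet, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import GHC.Float (castWord32ToFloat, castFloatToWord32)

-- Hypothetical helper: little-endian float32.
getF32le :: Get Float
getF32le = castWord32ToFloat <$> getWord32le

-- Decode the input in chunks of at most chunkSize floats and fold each
-- decoded chunk into an accumulator. Trailing bytes that do not form a
-- whole float are ignored.
foldChunks :: Int -> (acc -> U.Vector Float -> acc) -> acc -> B.ByteString -> acc
foldChunks chunkSize step = go
  where
    go !acc bytes
      | n == 0    = acc
      | otherwise = go (step acc chunk) rest
      where
        (piece, rest) = B.splitAt (fromIntegral (4 * chunkSize)) bytes
        n             = fromIntegral (B.length piece) `div` 4
        chunk         = runGet (U.replicateM n getF32le) piece

-- Hypothetical helper used only to build test input.
encodeFloats :: [Float] -> B.ByteString
encodeFloats = runPut . mapM_ (putWord32le . castFloatToWord32)

main :: IO ()
main = do
  let bytes = encodeFloats (map fromIntegral [1 .. 10 :: Int])
  -- Sum all values, three floats at a time.
  print (foldChunks 3 (\s v -> s + U.sum v) 0 bytes)
```

With the real file this would be something like `B.readFile "vec.bin" >>= print . foldChunks 65536 (\s v -> s + U.sum v) 0`; the chunk size there is an arbitrary guess.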
Do I have to make `Get` an instance of `Control.Monad.Primitive.PrimMonad` to do what I want?
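As far as I can tell, `PrimMonad` is implemented by monads that thread a primitive state token, such as `IO` and `ST s`, and `Get` is not one of them. Rather than writing an instance, one workaround is to keep the mutable vector in `ST` and use `Get` only to decode single values through `runGetOrFail`, passing the remaining input along by hand. A sketch, with hypothetical names `readMutable`, `getF32le` and `encodeFloats`; calling `runGetOrFail` once per float probably has noticeable overhead, so this would need benchmarking.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as UM
import Control.Monad.ST (runST)
import Data.Binary.Get (Get, runGetOrFail, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import GHC.Float (castWord32ToFloat, castFloatToWord32)

-- Hypothetical helper: little-endian float32.
getF32le :: Get Float
getF32le = castWord32ToFloat <$> getWord32le

-- The mutable vector lives in ST (which has a PrimMonad instance);
-- Get only decodes one value at a time. Returns Left on short input.
readMutable :: Int -> B.ByteString -> Either String (U.Vector Float)
readMutable n input = runST $ do
  vec <- UM.new n
  let fill i bytes
        | i >= n    = Right <$> U.unsafeFreeze vec
        | otherwise =
            case runGetOrFail getF32le bytes of
              Left (_, _, err)   -> return (Left err)
              Right (rest, _, f) -> do
                -- More complex decoding (e.g. run-length) would go here.
                UM.unsafeWrite vec i f
                fill (i + 1) rest
  fill 0 input

-- Hypothetical helper used only to build test input.
encodeFloats :: [Float] -> B.ByteString
encodeFloats = runPut . mapM_ (putWord32le . castFloatToWord32)

main :: IO ()
main = do
  print (readMutable 3 (encodeFloats [1, 2, 3]))
  print (readMutable 4 (encodeFloats [1, 2, 3]))
```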
I think I could try an unfolding pattern to build the vector without mutable state.
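An unfolding version along those lines could use `U.unfoldrN`, which builds at most `n` elements from a seed (here, the remaining input) and stops early if the step function returns `Nothing`. Again a sketch with hypothetical helper names (`readUnfold`, `getF32le`, `encodeFloats`), not a measured solution; it decodes each value with `runGetOrFail`, like the `ST` variant.

```haskell
import qualified Data.ByteString.Lazy as B
import qualified Data.Vector.Unboxed as U
import Data.Binary.Get (Get, runGetOrFail, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import GHC.Float (castWord32ToFloat, castFloatToWord32)

-- Hypothetical helper: little-endian float32.
getF32le :: Get Float
getF32le = castWord32ToFloat <$> getWord32le

-- Unfold up to n floats; the seed is the not-yet-consumed input.
readUnfold :: Int -> B.ByteString -> U.Vector Float
readUnfold n = U.unfoldrN n step
  where
    step bytes = case runGetOrFail getF32le bytes of
      Left _             -> Nothing          -- input exhausted or too short
      Right (rest, _, f) -> Just (f, rest)   -- emit f, continue with rest

-- Hypothetical helper used only to build test input.
encodeFloats :: [Float] -> B.ByteString
encodeFloats = runPut . mapM_ (putWord32le . castFloatToWord32)

main :: IO ()
main = print (readUnfold 3 (encodeFloats [0.25, 0.5, 0.75]))
```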