9

I am writing a daemon that reads something from a small file, modifies it, and writes it back to the same file. I need to make sure that each file is closed promptly after reading before I try to write to it. I also need to make sure each file is closed promptly after writing, because I might occasionally read from it again right away.

I have looked into using binary-strict instead of binary, but it seems that only provides a strict Get, not a strict Put. Same issue with System.IO.Strict. And from reading the binary-strict documentation, I'm not sure it really solves my problem of ensuring that files are promptly closed. What's the best way to handle this? DeepSeq?

Here's a highly simplified example that will give you an idea of the structure of my application. This example terminates with

*** Exception: test.dat: openBinaryFile: resource busy (file is locked)

for obvious reasons.

import Data.Binary ( Binary, encode, decode )
import Data.ByteString.Lazy as B ( readFile, writeFile )
import Codec.Compression.GZip ( compress, decompress )

encodeAndCompressFile :: Binary a => FilePath -> a -> IO ()
encodeAndCompressFile f = B.writeFile f . compress . encode

decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f = return . decode . decompress =<< B.readFile f

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
Greg Bacon
mhwombat

3 Answers

11

All 'puts' or 'writes' to files are strict. The act of writeFile demands all Haskell data be evaluated in order to put it on disk.

So what you need to concentrate on is the lazy reading of the input. In your example above you both lazily read the file, then lazily decode it.

Instead, try reading the file strictly (e.g. with strict bytestrings), and you'll be fine.
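As a rough sketch of that suggestion, applied to the functions from the question: only the read side changes. This assumes the binary, zlib and bytestring packages, and that `fromStrict` is available from Data.ByteString.Lazy (bytestring 0.10 or later). The strict `readFile` pulls the whole file into memory and closes the handle before it returns, so decoding can stay lazy without holding the file open.

```haskell
import Data.Binary (Binary, encode, decode)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Codec.Compression.GZip (compress, decompress)

-- Writing is unchanged: BL.writeFile forces the whole lazy bytestring
-- and closes the handle once everything has been written.
encodeAndCompressFile :: Binary a => FilePath -> a -> IO ()
encodeAndCompressFile f = BL.writeFile f . compress . encode

-- Reading now uses the strict readFile. The file is fully read and its
-- handle closed before this action returns; decompress and decode then
-- work lazily on bytes that are already in memory.
decodeAndDecompressFile :: Binary a => FilePath -> IO a
decodeAndDecompressFile f = do
  bytes <- BS.readFile f
  return (decode (decompress (BL.fromStrict bytes)))
```

With these two definitions, the `doStuff` loop from the question should be able to read and rewrite test.dat repeatedly, since no handle outlives the read.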

Don Stewart
  • I'm confused. I imagine that `i` is initially bound to a thunk in `doStuff`, and that no IO has actually taken place since we used the lazy `readFile`. However, once we `print i`, doesn't that force the evaluation of `i`, and the completion of all the IO? Did `decompress` not read all of the file, so it's left open? – pat May 10 '12 at 02:07
  • Don, your explanation helped me understand why I don't need strict versions of 'puts' or 'writes'; thank you for that. However, I think I would also need to find a strict version of decompress. Ultimately I went with Nathan's solution because I find it a little easier to follow the code. – mhwombat May 10 '12 at 13:11
8

Consider using a package such as conduit, pipes, iteratee or enumerator. They provide many of the benefits of lazy IO (simpler code, potentially smaller memory footprint) without the lazy IO itself, so files are closed at a predictable point. Here's an example using conduit and cereal:

import Data.Conduit
import Data.Conduit.Binary (sinkFile, sourceFile)
import Data.Conduit.Cereal (sinkGet, sourcePut)
import Data.Conduit.Zlib (gzip, ungzip)
import Data.Serialize (Serialize, get, put)

encodeAndCompressFile :: Serialize a => FilePath -> a -> IO ()
encodeAndCompressFile f v =
  runResourceT $ sourcePut (put v) $$ gzip =$ sinkFile f

decodeAndDecompressFile :: Serialize a => FilePath -> IO a
decodeAndDecompressFile f = do
  val <- runResourceT $ sourceFile f $$ ungzip =$ sinkGet get
  case val of
    Right v  -> return v
    Left err -> fail err

main = do
  let i = 0 :: Int
  encodeAndCompressFile "test.dat" i
  doStuff

doStuff = do
  i <- decodeAndDecompressFile "test.dat" :: IO Int
  print i
  encodeAndCompressFile "test.dat" (i+1)
  doStuff
Nathan Howell
2

An alternative to using conduits et al. would be to just use System.IO, which will allow you to control explicitly when files are closed with respect to the IO execution order.

You can use openBinaryFile followed by normal reading operations (probably the ones from Data.ByteString) and hClose when you're done with it, or withBinaryFile, which closes the file automatically (but beware this sort of problem).

Whichever method you use, as Don said, you probably want to read the file as a strict bytestring and then convert it to a lazy one afterwards with fromChunks.
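A minimal sketch of that approach, using only base and bytestring. The helper name `readFileClosed` is made up for illustration. The strict `hGetContents` reads everything and closes the handle, and `withBinaryFile` makes sure the handle is closed even if the read throws.

```haskell
import System.IO (IOMode (ReadMode), withBinaryFile)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: read a whole file strictly, then hand it back as a
-- lazy bytestring for APIs such as decode or decompress that expect one.
readFileClosed :: FilePath -> IO BL.ByteString
readFileClosed path = do
  -- The strict hGetContents consumes the entire file inside the
  -- withBinaryFile scope, so nothing is left to read after the close.
  strict <- withBinaryFile path ReadMode BS.hGetContents
  -- Wrap the single strict chunk as a lazy bytestring (no file access).
  return (BL.fromChunks [strict])
```

Because the result is built from bytes already in memory, the file can be reopened for writing as soon as `readFileClosed` returns.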

Ben Millwood