
This program produces the output I expect when given an input file of text delimited by \n:

import System.IO

main :: IO ()
main = do h <- openFile "test.txt" ReadMode 
          xs <- getlines h
          sequence_ $ map putStrLn xs

getlines :: Handle -> IO [String]
getlines h = hGetContents h >>= return . lines

By substituting withFile for openFile and rearranging slightly

import System.IO

main :: IO ()
main = do xs <- withFile "test.txt" ReadMode getlines
          sequence_ $ map putStrLn xs

getlines :: Handle -> IO [String]
getlines h = hGetContents h >>= return . lines  

I manage to get no output at all. I'm stumped.

Edit: Not stumped anymore: thanks to one and all for the thoughtful and thought-provoking answers. I did a little more reading in the documentation and learned that withFile can be understood as a partial application of bracket.
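Concretely, that reading can be sketched like this (withFile' is just my own name for the sketch; as far as I can tell the real definition in System.IO is essentially the same):

import System.IO
import Control.Exception (bracket)

-- withFile as a partial application of bracket: acquire the handle,
-- run the action, and close the handle even if an exception is thrown.
withFile' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile' path mode = bracket (openFile path mode) hClose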

This is what I ended up with:

import System.IO

main :: IO ()
main = withFile "test.txt" ReadMode $ \h -> getlines h >>= mapM_ putStrLn 

getlines :: Handle -> IO [String]
getlines h = lines `fmap` hGetContents h
rickythesk8r
  • Irrelevant tip to hold you over while I look for an answer: `sequence_ . map` can be written more simply as `mapM_`. – So8res Feb 23 '12 at 03:37
  • Another irrelevant tip: `foo >>= return . bar` is better written `fmap bar foo`; I particularly enjoy the infix fmap synonym for this: `bar <$> foo` (requires `import Control.Applicative`). – Dan Burton Feb 23 '12 at 22:59

5 Answers


The file is being closed too early. From the documentation:

The handle will be closed on exit from withFile

This means the file will be closed as soon as the withFile function returns.

Because hGetContents and friends are lazy, the file isn't actually read until the contents are forced by putStrLn, and by that point withFile will have already closed the handle.
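To see the ordering concretely, here is a minimal sketch (assuming the same test.txt) in which the contents are only demanded after withFile has returned; it typically prints nothing beyond an empty line:

import System.IO

main :: IO ()
main = do
  s <- withFile "test.txt" ReadMode hGetContents
  putStrLn "withFile has already returned, so the handle is closed"
  putStrLn s  -- the lazy string was truncated when the handle was closed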

To solve the problem, pass the whole thing to withFile:

main = withFile "test.txt" ReadMode $ \handle -> do
           xs <- getlines handle
           sequence_ $ map putStrLn xs

This works because by the time withFile gets around to closing the file, you will already have printed it.

Lambda Fairy

Ugh, did no one ever give the simple solution?

main :: IO ()
main = do xs <- fmap lines $ readFile "test.txt"
          mapM_ putStrLn xs

Don't use openFile+hGetContents or withFile+hGetContents when you can just use readFile. With readFile you can't shoot yourself in the foot by closing the file too early.
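If the goal really is just to echo the file, a roughly equivalent one-liner sketch (modulo a trailing newline) is:

main :: IO ()
main = readFile "test.txt" >>= putStr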

Reid Barton
  • Nor, indeed, can you close the file at all :) If you're not careful you'll run out of file handles this way. Anyway, if you wanted to do anything more advanced (like seeking, or setting buffer mode, or whatever) then `withFile` is the way to go. – Ben Millwood Oct 25 '12 at 20:45
  • I don't believe in solving a more general problem than the asker asked for. There's no limit to how much more complicated you can make a task, so that way lies madness. Just solve the task in front of you. There's no indication here that the OP wanted to seek or open the file with special modes or open more than one file. `readFile` is a very convenient every-day sort of function, and I often see people roll their own convoluted code to read files, perhaps because they are accustomed to languages that do not provide a single-function equivalent to `readFile`. – Reid Barton Oct 26 '12 at 06:57
  • But the question wasn't "how do I read files?" it was "how do I use `withFile`?" or more specifically "why is `withFile` behaving in this surprising way?". I think it's silly to expect such a question to *also* come with a detailed explanation of *why* the questioner needs to use `withFile`. There are plenty of good reasons. – Ben Millwood Oct 26 '12 at 12:41
  • The other answers address "why does my program behave in this surprising way", which is all well and good. Then they all go on to reimplement the OP's program in various convoluted ways. I just found it a strange omission that no one mentioned the simple 2-liner way to do what the OP wants to do. – Reid Barton Oct 26 '12 at 19:22
  • I'm confused. Does calling `readFile` this way leave an open file handle? The appeal of `withFile` seems to be that it's going to clean up all resources allocated at the end of the block. – James McMahon Apr 23 '17 at 21:21
  • @James the file handle is closed when all the contents have been read. See [`hGetContents`](http://hackage.haskell.org/package/base-4.12.0.0/docs/GHC-IO-Handle.html#v:hGetContents), which `readFile` is based on. – Lambda Fairy Dec 24 '18 at 06:11

They do completely different things. openFile opens a file and returns a file handle:

openFile :: FilePath -> IOMode -> IO Handle

withFile is used to wrap an IO computation that takes a file handle, ensuring that the handle is closed afterwards:

withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r

In your case, using withFile would look like this:

main = withFile "test.txt" ReadMode $ \h -> do
      xs <- getlines h
      sequence_ $ map putStrLn xs

The version you currently have will open the file, call getlines, then close the file. Since getlines is lazy, it doesn't actually read anything while the file is open, and once the file is closed, it can't.
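For illustration, this is roughly what the failing version does once withFile is spelled out as bracket (a sketch; the real withFile is essentially bracket (openFile path mode) hClose):

import System.IO
import Control.Exception (bracket)

main :: IO ()
main = do xs <- bracket (openFile "test.txt" ReadMode) hClose getlines
          mapM_ putStrLn xs   -- contents are demanded here, after hClose has run

getlines :: Handle -> IO [String]
getlines h = fmap lines (hGetContents h)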

porges

You're running into the usual hurdles of lazy IO: lazy IO sounds like an excellent idea and makes streaming a snap, until you start hitting these awful problems.

Not that your particular case would puzzle an experienced Haskeller: it is the textbook example of why lazy IO is a problem.

main = do xs <- withFile "test.txt" ReadMode getlines
          sequence_ $ map putStrLn xs

withFile takes a FilePath, a mode, and an action to perform on the handle that results from opening that file path in that mode. The interesting part of withFile is that it is implemented with bracket, which guarantees, even in the case of an exception, that the file will be closed after the action on the handle executes. The problem here is that the action in question (getlines) doesn't read the file at all! It only promises to do so when the content is actually needed; this is lazy IO (implemented with unsafeInterleaveIO, and guess what the "unsafe" part means...). Of course, by the time that content is needed (by putStrLn), the handle has already been closed by withFile, exactly as promised.

So you have several solutions: you could open and close the file explicitly (and give up exception safety), or you could keep lazy IO but put every action that touches the file's content inside the scope protected by withFile:

main = withFile "test.txt" ReadMode $ \h -> do
         xs <- getlines h
         mapM_ putStrLn xs

In this case it's not too awful, but you can see how the problem could become much more annoying if you don't know exactly when the content will be needed. Lazy IO in a big, complex program can quickly become a real headache, especially once limits on the number of open handles start to matter... Which is why the new sport of the Haskell community is coming up with ways to stream content without lazy IO (rather than reading whole files into memory, which "solves" the problem at the cost of bloating memory use, sometimes to impossible levels). For a while it seemed as if iteratees would become the standard solution, but they were complex and hard to understand even for experienced Haskellers, so other candidates have cropped up lately; the most promising, or at least the most successful at present, seems to be conduit.
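For what it's worth, here is a minimal sketch of that streaming style using the conduit package's current API (sourceFile, decodeUtf8C, linesUnboundedC and mapM_C come from Data.Conduit.Combinators as re-exported by the Conduit module; this is illustrative rather than the API that existed when this answer was written):

import Conduit
import Control.Monad.IO.Class (liftIO)
import qualified Data.Text.IO as TIO

-- Stream the file line by line in constant memory; the handle's lifetime
-- is managed by ResourceT, so it is neither closed too early nor leaked.
main :: IO ()
main = runConduitRes
     $ sourceFile "test.txt"          -- ByteString chunks
    .| decodeUtf8C                    -- ByteString -> Text
    .| linesUnboundedC                -- split into lines
    .| mapM_C (liftIO . TIO.putStrLn)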

tcamps
Jedai
  • See also: [What's so bad about lazy IO?](http://stackoverflow.com/questions/5892653/whats-so-bad-about-lazy-i-o) – Dan Burton Feb 23 '12 at 23:06

As others have noted, hGetContents is lazy. You can, however, add strictness if you so desire:

import Control.DeepSeq

forceM :: (NFData a, Monad m) => m a -> m a
forceM m = do
  val <- m
  return $!! val

main = do xs <- withFile "test.txt" ReadMode (forceM . getlines)
          ...

Though it is generally recommended that you perform all IO related to the contents of the file inside of the withFile block instead. That way, your program can actually take advantage of the lazy file read, keeping only as much as necessary in memory. If you are dealing with a very large file, then forcing the whole file to be read into memory is usually a bad idea.

If you need more fine-grained control of resources, then you should look into using ResourceT (which comes with the conduit package) or similar.

[edit: use $!! from Control.DeepSeq (instead of $!) to make sure the whole value is forced. Thanks for the tip, @benmachine]

Dan Burton
  • Did you check that this actually works? I see two potential problems: 1. not actually a problem necessarily, but it's way easier to reason about when things are evaluated with respect to IO if you use `Control.Exception.evaluate` rather than your `$!`, and 2. a more serious problem, `$!` will only evaluate up to the first constructor of val, so you'll only fetch one line; you really want something more substantial like a deepSeq, or better still non-lazy IO. Admittedly I didn't check it /doesn't/ work, but I suspect it :) – Ben Millwood Mar 01 '12 at 14:08
  • @benmachine I did check (with the rest of the code given by OP), and it does actually work (I checked it on a small test file with 5 lines). It also works using `(evaluate <=< getlines)` in place of (`forceM . getlines)`. However, now that you mention it, it does seem odd: why didn't it just grab the first line and quit after that? – Dan Burton Mar 01 '12 at 14:33
  • @benmachine your suspicions are correct for large files; `bash> wc -l test.txt` = `54730 test.txt`. `bash> runhaskell lazyio.hs | wc -l` = `228 `. Strange, I thought `LineBuffering` was the default. – Dan Burton Mar 01 '12 at 14:37
  • Tested with DeepSeq's `$!!`: `bash> runhaskell lazyio.hs | wc -l` = `54731` – Dan Burton Mar 01 '12 at 14:45
  • IIRC, Unix defaults to line buffering for terminals but block buffering for files. – Lambda Fairy Nov 30 '14 at 21:50