I am trying to run a Parsec parser over a whole bunch of small files, and getting an error saying I have too many open files. I understand that I need to use strict IO, but I'm not sure how to do that. This is the problematic code:

files = getDirectoryContents historyFolder

hands :: IO [Either ParseError [Hand]]
hands = join $ sequence <$> parseFromFile (many hand) <<$>> files

Note: my <<$>> function is this:

(<<$>>) :: (Functor f1, Functor f2) => (a -> b) -> f1 (f2 a) -> f1 (f2 b)
a <<$>> b = (a <$>) <$> b
Drew
  • The problem is that `parseFromFile` is too lazy; that is what I would suggest changing, but for that you would have to include its definition in the question. Besides, using the `pipes` or `conduit` package might be a good idea. – Markus1189 Apr 06 '14 at 11:55

2 Answers

I don't know what your parseFromFile function looks like right now (it would be a good idea to include it in the question), but I'm guessing it uses Prelude.readFile, which, as @Markus1189 points out, performs lazy I/O. To get strict I/O, you just need a strict readFile, such as Data.Text.IO.readFile.

A streaming data library like pipes or conduit would allow you to avoid reading the entire file into memory at once, though, to my knowledge, parsec doesn't provide a streaming interface that would make this possible. attoparsec, on the other hand, does include such a streaming interface, and both pipes and conduit have attoparsec adapter libraries (e.g., Data.Conduit.Attoparsec).

tl;dr: You probably just need the following helper function:

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

readFileStrict :: FilePath -> IO String
readFileStrict = fmap T.unpack . TIO.readFile
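
To show how a strict read plugs into parsec, here is a minimal sketch that builds a strict counterpart of parseFromFile on top of Data.Text.IO.readFile and parsec's parse. The names parseFromFileStrict, parseAll and number are made up for illustration and are not part of parsec; number is only a stand-in for your own parser.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.Parsec (ParseError, digit, eof, many1, parse, spaces)
import Text.Parsec.String (Parser)

-- Hypothetical strict counterpart of parseFromFile.
-- TIO.readFile reads the whole file and closes the handle before returning,
-- so no handle is left open while the parser runs.
parseFromFileStrict :: Parser a -> FilePath -> IO (Either ParseError a)
parseFromFileStrict p path = do
  contents <- T.unpack <$> TIO.readFile path
  return (parse p path contents)

-- Stand-in parser for illustration only: one integer, optional trailing
-- whitespace, then end of input.
number :: Parser Int
number = read <$> many1 digit <* spaces <* eof

-- Parse the files one after another; at most one handle is open at a time.
parseAll :: Parser a -> [FilePath] -> IO [Either ParseError a]
parseAll p = mapM (parseFromFileStrict p)
```

With something like this, the hands action from the question could presumably be written as parseAll (many hand) applied to the list of file paths. Note that Data.Text.IO.readFile decodes using the current locale, which is an assumption about your files.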
Michael Snoyman
  • `parseFromFile` is part of parsec, and can be either a [lazy read](http://hackage.haskell.org/package/parsec-3.1.5/docs/Text-Parsec-ByteString-Lazy.html) with `readFile` from `ByteString.Lazy` or the more [strict variant](http://hackage.haskell.org/package/parsec-3.1.5/docs/Text-Parsec-ByteString.html) with `bracket openBinaryFile ... hClose`. – Zeta Apr 06 '14 at 13:07

You can use the BangPatterns language extension to force the result of each IO action, in this case parseFromFile, before the next one runs. For example, the function hands can be changed to:

hands :: [String] -> IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
  !res <- parseFromFile hand f  -- bang: evaluate this result (to WHNF) before continuing
  others <- hands fs
  return (res:others)

This version of hands forces the result of each call to parseFromFile before moving on to the next file in the list. With that in place, the problem should disappear. A full working toy example is:

{-# LANGUAGE BangPatterns #-}
import Control.Monad
import Control.Applicative hiding (many)
import Data.Char (isDigit)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import Text.ParserCombinators.Parsec

data Hand = Hand Int deriving Show

-- Parses a file of the form "I'm file <number>\n".
hand :: GenParser Char st [Hand]
hand = do
  string "I'm file "
  num <- many1 digit  -- many1, so that read never sees an empty string
  newline
  eof
  return [Hand $ read num]

-- Paths of all entries in "manyfiles" whose names consist only of digits.
files :: IO [String]
files = map ("manyfiles" </>)
      . filter (all isDigit) <$> getDirectoryContents "manyfiles"

-- Parse the files one by one, forcing each result before the next call.
hands :: [String] -> IO [Either ParseError [Hand]]
hands [] = return []
hands (f:fs) = do
  !res <- parseFromFile hand f
  others <- hands fs
  return (res:others)

main :: IO ()
main = do
  results <- files >>= hands
  print results
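
The same forcing can also be written without BangPatterns or explicit recursion. This is a minimal sketch, assuming parsec's Text.Parsec.String.parseFromFile and Control.Exception.evaluate; the names parseAllForced and number are made up for illustration, and number just stands in for a real parser.

```haskell
import Control.Exception (evaluate)
import Text.Parsec (ParseError, digit, eof, many1, newline)
import Text.Parsec.String (Parser, parseFromFile)

-- Run the parser on each file in turn. evaluate forces every result to
-- weak head normal form (the Left/Right constructor) before the next file
-- is opened, which is what the bang pattern does in the version above.
parseAllForced :: Parser a -> [FilePath] -> IO [Either ParseError a]
parseAllForced p = mapM (\f -> parseFromFile p f >>= evaluate)

-- Stand-in parser for illustration only: one integer on a single line.
number :: Parser Int
number = read <$> many1 digit <* newline <* eof
```

As far as I can tell, forcing only releases the file handle promptly when the parser actually consumes the input to the end, as a parser ending in eof does on success. A parse that fails early may leave the lazily read file open until it is garbage collected.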
mariop