I have 5000 vectors stored in 5000 files, one vector per file, and I need to find their sum. The type DF2 is just a synonym for Vector Double, made an instance of Num. So I read and parse all those files into a list of type [IO DF2] and fold it:

getFinal :: IO DF2
getFinal = foldl1' (liftA2 (+)) $ map getDF2 [1..(sdNumber runParameters)]
    where getDF2 i = fmap parseDF2 $ readFile ("DF2/DF2_" ++ show i)

However I get an error:

DF2: DF2/DF2_1022: openFile: resource exhausted (Too many open files)

Google revealed this question to be very common:

However, I don't understand what the problem with lazy IO is. If it is lazy, why does it open files before they are needed? I also couldn't work out how to adapt the elegant solution by Duncan Coutts to my case.

Yrogirg

1 Answer

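It's not that it opens files before they're needed; it's that it doesn't close them until you force the entire string. readFile opens the file as soon as the action runs but reads the contents on demand, and the handle is closed only once the whole string has been consumed. Since nothing forces the parsed vectors before the fold has run every readFile, all the files end up open at once. A simple way to work around this problem is to force the entire string immediately after reading it; since Vectors are strict, the simplest way to do this is to force the Vector to be evaluated after parsing it: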

getFinal :: IO DF2
getFinal = foldl1' (liftA2 (+)) $ map getDF2 [1..(sdNumber runParameters)]
    where getDF2 i = readFile ("DF2/DF2_" ++ show i) >>= evaluate . parseDF2

This uses Control.Exception.evaluate; you can think of evaluate as forcing its argument and then returning it. This only works if parseDF2 consumes the whole string, however.
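For reference, here is a self-contained sketch of the same idea that also forces the running total at each step, so only one parsed vector and one accumulator need to be live at a time. It is a variation, not the code from the question: `parseDF2` below is a hypothetical stand-in that reads whitespace-separated doubles, `DF2` is assumed to be an unboxed `Vector Double` from the vector package, and `V.zipWith (+)` stands in for the question's `Num` instance. The names `readDF2` and `sumDF2Files` are made up for illustration.

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import qualified Data.Vector.Unboxed as V

-- Assumption: DF2 is an unboxed vector of doubles.
type DF2 = V.Vector Double

-- Hypothetical parser (the real parseDF2 is not shown in the question):
-- whitespace-separated doubles.
parseDF2 :: String -> DF2
parseDF2 = V.fromList . map read . words

-- Read one file and force the parsed vector. Building an unboxed vector
-- consumes the whole lazy string, so the handle is closed before returning.
readDF2 :: FilePath -> IO DF2
readDF2 path = readFile path >>= evaluate . parseDF2

-- Sum the vectors file by file, forcing the accumulator after each addition
-- so no chain of unevaluated sums builds up.
sumDF2Files :: [FilePath] -> IO DF2
sumDF2Files [] = ioError (userError "sumDF2Files: no input files")
sumDF2Files (p : ps) = do
  v0 <- readDF2 p
  foldM step v0 ps
  where
    step acc path = do
      v <- readDF2 path
      evaluate (V.zipWith (+) acc v)
```

With the question's file layout, this would be called as something like `sumDF2Files ["DF2/DF2_" ++ show i | i <- [1 .. n]]`, where `n` is the number of files.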

A more elegant solution would be to move away from lazy IO entirely, and use iteratees or something of the sort. But that's probably not worth it for such a simple use-case.
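Short of iteratees, one lighter-weight way to avoid lazy IO is to read each file with a strict API. The sketch below uses `Data.ByteString.Char8.readFile`, which reads the whole file and closes the handle before returning. It is only an illustration: the whitespace-separated format, the `read`-based parsing, and the name `readDF2Strict` are assumptions, not part of the original code.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed as V

-- Hypothetical strict reader: the file is fully read and its handle closed
-- by B.readFile, regardless of how lazily the parser behaves afterwards.
readDF2Strict :: FilePath -> IO (V.Vector Double)
readDF2Strict path = do
  bytes <- B.readFile path
  -- Assumed format: whitespace-separated doubles.
  -- evaluate surfaces any parse error here rather than at first use.
  evaluate (V.fromList (map (read . B.unpack) (B.words bytes)))
```

Because the read itself is strict, the number of open handles no longer depends on when the vector is forced.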

ehird
  • It turns out that the last two strings in your example can be replaced just with `evaluate $ parseDF2 s`, no `length s` required. So I guess there must exist a really compact solution. – Yrogirg Jan 03 '12 at 18:33
  • I believe the two last lines in `getDF2` can be replaced with `return $! parseDF2 s`. At least if the `Vector`s involved are unboxed. – Daniel Fischer Jan 03 '12 at 18:43
  • @Yrogirg: That doesn't quite work: `evaluate` only evaluates one level, to WHNF, and your parser could easily do some of the parsing work in lazy fields of `DF2`. If you used the deepseq package (it's common), then I think you could use `getDF2 i = force . parseDF2 <$> readFile ("DF2/DF2_" ++ show i)`. – ehird Jan 03 '12 at 18:47
  • @DanielFischer: Oh, I missed that it's a Vector. Yes, a simple `evaluate` would work just fine here. I would avoid `return $! x`, since `evaluate` is more "proper" (e.g. ``(return $! undefined) `seq` ()`` is ⊥, but ``evaluate undefined `seq` ()`` is `()`). – ehird Jan 03 '12 at 18:48
  • I assumed that addition of `Vector`s is strict, so there's no semantic difference here. I'm a 'minimal dependencies' guy, so I tend to use `return $! stuff` unless I need `evaluate`'s behaviour, but you're right, using `evaluate` works in more situations. – Daniel Fischer Jan 03 '12 at 19:01
  • @Yrogirg: I've updated my answer to be a lot simpler thanks to the properties of Vector. – ehird Jan 03 '12 at 19:05
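The difference between `return $! x` and `evaluate x` discussed in the comments can be checked with a small experiment. The helper name `buildingThrows` is made up for this sketch; it forces an IO action to weak head normal form without running it and reports whether that alone raised an exception.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Force the IO action itself (not its result) to WHNF, without running it,
-- and report whether doing so threw an exception.
buildingThrows :: IO a -> IO Bool
buildingThrows action = do
  r <- try (evaluate (action `seq` ())) :: IO (Either SomeException ())
  return (either (const True) (const False) r)
```

Here `buildingThrows (return $! (undefined :: Int))` yields `True`, because `$!` forces the argument while the action is being constructed, whereas `buildingThrows (evaluate (undefined :: Int))` yields `False`, because `evaluate` defers the forcing until the action is actually run. Running either action throws.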