The streaming-bytestring library gives an error after printing about 512 bytes.

Error:

```
openBinaryFile: resource exhausted (Too many open files)
```

Code:

```haskell
import           Control.Monad.Trans (lift, MonadIO)
import           Control.Monad.Trans.Resource (runResourceT, MonadResource, MonadUnliftIO, ResourceT, liftResourceT)
import qualified Data.ByteString.Streaming          as BSS
import qualified Data.ByteString.Streaming.Char8    as BSSC
import           System.TimeIt

main :: IO ()
main = timeIt $ runResourceT $ dump $ BSS.drop 24 $ BSS.readFile "filename"

dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    isEmpty <- BSS.null_ bs
    if isEmpty then return ()
    else do
        BSSC.putStr $ BSS.take 1 bs
        dump $ BSS.drop 1 bs
```
Asked by paperduck (edited by danidiaz)
  • I'm not sure whether it'd help someone more knowledgeable answer your question, but, where is `listenNaiveStreaming` coming from? At least it's not on hoogle. Also imo `iterate` would be easier to read with LambdaCase and using the `do` notation instead of manually `>>=`ing everything. – moonGoose May 12 '19 at 21:49
  • I fixed the typo. I am not familiar with LambdaCase; I'll look into it! – paperduck May 12 '19 at 23:26
  • Be warned, streaming libraries don't generally speed things up. See https://stackoverflow.com/a/55814664/7203016. – K. A. Buhr May 13 '19 at 04:26
  • @K.A.Buhr Thanks, actually that other post was by me too. I am still trying to get this working, but I will take your comment to heart for real projects in the future. – paperduck May 13 '19 at 04:39
  • My first intuition is that a file is getting opened on every iteration. Can you pull the `readFile "filename"` out and only pass in the file handle? – Bob Dalgleish May 13 '19 at 14:50
  • You could find out how to trace system calls in your environment (e.g. `strace` on Linux, and `DTrace` on macOS and some other systems). Then you will easily find out where the problem comes from. Also check the operating system limits on file descriptors (`ulimit -a` in the `bash` shell on POSIXy platforms). You can find out a lot of useful information before inspecting the actual libraries you are using, and with that information you can focus your search for the problem. – FooF May 13 '19 at 16:50

1 Answer


When working with streaming libraries, it's usually a bad idea to reuse an effectful stream. That is, you can apply a function like `drop` or `splitAt` to a stream and then continue working with the resulting stream, or you can consume the stream as a whole with a function like `fold`, which leaves you in the base monad. But you should never apply the same stream value to two different functions.
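To see why reuse is a problem, here is a minimal sketch in plain Haskell (base only). It does not use streaming-bytestring. `Stream`, `fakeReadFile`, `collect`, `firstChar` and `reuse` are hypothetical names for a simplified model whose constructors are loosely modelled on the library's `Empty`/`Chunk`/`Go` representation, with a counter standing in for opening a file. Passing the same stream value to two consumers runs its first effect twice:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical, simplified model of an effectful stream (not the library type).
data Stream m r
  = Empty r                 -- end of the stream, carrying its result
  | Chunk Char (Stream m r) -- one character, then the rest
  | Go (m (Stream m r))     -- an effect that produces the rest

-- Hypothetical stand-in for readFile: every time the stream is run,
-- its first step "opens the file", which we record by bumping a counter.
fakeReadFile :: IORef Int -> String -> Stream IO ()
fakeReadFile opens s = Go $ do
  modifyIORef' opens (+ 1)
  pure (foldr Chunk (Empty ()) s)

-- Consume the whole stream, leaving us in the base monad.
collect :: Monad m => Stream m r -> m String
collect (Empty _)   = pure []
collect (Chunk c s) = (c :) <$> collect s
collect (Go m)      = m >>= collect

-- Demand only the first character and throw the rest of the stream away.
firstChar :: Monad m => Stream m r -> m (Maybe Char)
firstChar (Empty _)   = pure Nothing
firstChar (Chunk c _) = pure (Just c)
firstChar (Go m)      = m >>= firstChar

-- Uses the same stream value twice, so its opening effect runs twice.
reuse :: IO (Maybe Char, String, Int)
reuse = do
  opens <- newIORef 0
  let s = fakeReadFile opens "abc"
  c   <- firstChar s
  str <- collect s
  n   <- readIORef opens
  pure (c, str, n)
```

In GHCi, `reuse` evaluates to `(Just 'a',"abc",2)`: the simulated file was opened once by `firstChar` and once more by `collect`.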

Sadly, the Haskell type system as it stands cannot enforce that restriction at compile time; it would require some form of linear types. Instead, it becomes the responsibility of the user.

The `null_` function is perhaps a wart in the streaming-bytestring API: because it doesn't return a new stream along with the result, it gives the impression that stream reuse is normal throughout the API. It would be better if it had a signature like `null_ :: ByteString m r -> m (Bool, ByteString m r)`.
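A function with roughly that shape can be sketched on top of `testNull` (the function paperduck points to in the comments), assuming the same `Data.ByteString.Streaming` modules as the question and `Of` from the `streaming` package. `nullAndRest` is a hypothetical name, not part of the library:

```haskell
import qualified Data.ByteString.Streaming       as BSS
import qualified Data.ByteString.Streaming.Char8 as BSSC
import           Streaming                       (Of (..))

-- Hypothetical helper with the signature suggested above. testNull runs only
-- as much of the stream as it needs and hands back a stream that resumes from
-- that point, so the caller can continue without re-running earlier effects.
nullAndRest :: Monad m => BSS.ByteString m r -> m (Bool, BSS.ByteString m r)
nullAndRest bs = fmap (\(isEmpty :> rest) -> (isEmpty, rest)) (BSSC.testNull bs)
```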

Similarly, don't use `drop` and `take` with the same stream value. Instead, use `splitAt` or `uncons` and work with the divided result.

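```haskell
import           Control.Monad.IO.Class          (MonadIO, liftIO)
import qualified Data.ByteString.Streaming       as BSS
import qualified Data.ByteString.Streaming.Char8 as BSSC

dump :: MonadIO m => BSS.ByteString m r -> m ()
dump bs = do
    mc <- BSSC.uncons bs -- bs is only used once
    case mc of
        Left _ -> return ()           -- end of the stream
        Right (c, rest) -> do
            liftIO $ putChar c
            dump rest                 -- continue with the remainder, never with bs
```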
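If you would rather keep the emptiness test from the question, here is a variant that applies the `splitAt` advice. It is a sketch that assumes `testNull` (mentioned in the comments) and `splitAt` from the same `Data.ByteString.Streaming` modules; `dumpSplit` is a hypothetical name. Every stream value is still used exactly once:

```haskell
import           Control.Monad.IO.Class          (MonadIO)
import qualified Data.ByteString.Streaming       as BSS
import qualified Data.ByteString.Streaming.Char8 as BSSC
import           Streaming                       (Of (..))

-- Hypothetical variant of dump that keeps an emptiness test.
dumpSplit :: MonadIO m => BSS.ByteString m r -> m ()
dumpSplit bs = do
  result <- BSSC.testNull bs  -- the test hands back a resumable stream
  case result of
    True  :> _    -> pure ()
    False :> rest -> do
      -- splitAt 1 yields the first byte and returns the remainder as its
      -- result; putStr prints that byte and hands the remainder back to us.
      rest' <- BSSC.putStr (BSS.splitAt 1 rest)
      dumpSplit rest'
```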

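So, about the error. As @BobDalgleish mentions in the comments, the file is opened when `null_` is invoked (that is the first time we "demand" something from the stream). But `BSS.take 1 bs` and the recursive call's `BSS.drop 1 bs` are both built on top of the original `bs`, so demanding anything from them runs the effects of `bs` again and reopens the file. This happens on every iteration, and because none of those streams is ever run to its end, their handles stay registered with `ResourceT` until `runResourceT` finishes. They pile up until we hit the file handle limit.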
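The same kind of toy model can reproduce this pattern. The code below is hypothetical, base-only code, not the library itself. `dumpNaive` mirrors the question's `dump` with `null_`, `take 1` and `drop 1`, and `dumpOnce` mirrors the `uncons` version above. The counter records how often the simulated file gets "opened". The model does not track handles staying open, only how many times the opening effect runs:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical, simplified model of an effectful stream (not the library type).
data Stream m r = Empty r | Chunk Char (Stream m r) | Go (m (Stream m r))

-- Hypothetical stand-in for BSS.readFile: running its first step bumps a counter.
fakeReadFile :: IORef Int -> String -> Stream IO ()
fakeReadFile opens s = Go $ do
  modifyIORef' opens (+ 1)
  pure (foldr Chunk (Empty ()) s)

collect :: Monad m => Stream m r -> m String
collect (Empty _)   = pure []
collect (Chunk c s) = (c :) <$> collect s
collect (Go m)      = m >>= collect

-- Like null_: answers the question but throws the rest of the stream away.
nullOnly :: Monad m => Stream m r -> m Bool
nullOnly (Empty _)   = pure True
nullOnly (Chunk _ _) = pure False
nullOnly (Go m)      = m >>= nullOnly

takeS :: Functor m => Int -> Stream m r -> Stream m ()
takeS n _ | n <= 0  = Empty ()
takeS _ (Empty _)   = Empty ()
takeS n (Chunk c s) = Chunk c (takeS (n - 1) s)
takeS n (Go m)      = Go (takeS n <$> m)

dropS :: Functor m => Int -> Stream m r -> Stream m r
dropS n s | n <= 0  = s
dropS _ s@(Empty _) = s
dropS n (Chunk _ s) = dropS (n - 1) s
dropS n (Go m)      = Go (dropS n <$> m)

unconsS :: Monad m => Stream m r -> m (Either r (Char, Stream m r))
unconsS (Empty r)   = pure (Left r)
unconsS (Chunk c s) = pure (Right (c, s))
unconsS (Go m)      = m >>= unconsS

-- The question's pattern: null test, take 1 and drop 1, all on the same value.
dumpNaive :: IORef String -> Stream IO r -> IO ()
dumpNaive out s = do
  isEmpty <- nullOnly s
  if isEmpty
    then pure ()
    else do
      str <- collect (takeS 1 s)
      modifyIORef' out (++ str)
      dumpNaive out (dropS 1 s)

-- The answer's pattern: uncons once and recurse on the remainder.
dumpOnce :: IORef String -> Stream IO r -> IO ()
dumpOnce out s = do
  mc <- unconsS s
  case mc of
    Left _          -> pure ()
    Right (c, rest) -> do
      modifyIORef' out (++ [c])
      dumpOnce out rest

-- Run both versions on the same input; report (output, number of "opens").
compareDumps :: String -> IO ((String, Int), (String, Int))
compareDumps input = (,) <$> run dumpNaive <*> run dumpOnce
  where
    run dumpFn = do
      opens <- newIORef 0
      out   <- newIORef ""
      dumpFn out (fakeReadFile opens input)
      (,) <$> readIORef out <*> readIORef opens
```

`compareDumps "abc"` evaluates to `(("abc",7),("abc",1))`. In this model the naive version runs the opening effect 2n+1 times for n characters (once for the null test and once for `take 1` per character, plus the final test), while the `uncons` version runs it once.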


Personally, I'm not a fan of using `ResourceT` with streaming libraries. I prefer opening the file with `withFile` and then creating and consuming the stream within the callback, when possible. But some things are more difficult that way.
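A minimal sketch of that `withFile` style for the question's program, assuming `fromHandle`, `drop` and `hPut` from the same `Data.ByteString.Streaming` module. `dumpAfterHeader` is a hypothetical name, and it takes the output handle as a parameter so that it is not tied to stdout:

```haskell
import qualified Data.ByteString.Streaming as BSS
import           System.IO                 (Handle, IOMode (..), withFile)

-- Hypothetical helper: copy everything after the first 24 bytes of `src` to
-- `dst`. withFile closes the input handle when the callback returns, so the
-- stream is created and fully consumed inside the callback and never escapes it.
dumpAfterHeader :: Handle -> FilePath -> IO ()
dumpAfterHeader dst src = withFile src ReadMode $ \h ->
  BSS.hPut dst (BSS.drop 24 (BSS.fromHandle h))
```

For the question's case this would be called as `dumpAfterHeader stdout "filename"` (with `stdout` from `System.IO`), and no `ResourceT` is involved. The catch is that the stream must not be returned from the callback: once `withFile` returns, the handle is closed and any later read from it fails.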

Answered by danidiaz
  • `null_` should be seen as completely consuming the stream. Using it again will perform the outermost effect(s) again. There certainly should be a version that produces a new stream that can be used safely. Would you like to open a ticket, or shall I? – dfeuer May 14 '19 at 20:24
  • `(isEmpty :> bs) <- BSSC.testNull bs_` – paperduck May 15 '19 at 08:00
  • @dfeuer As paperduck mentions, that function already seems to exist! I had not seen it. It could be better documented though. – danidiaz May 15 '19 at 17:51