I can't tell whether some of the other questions here are similar enough to mine, but I couldn't extract a solution from them, so I'm posting. Feel free to point me to a duplicate.

I have a flow where I need to download a large CSV file, and 1) save it to disk, and 2) process it. I'd like to use Haskell pipes, with the pipes-http and pipes-csv packages to do this.

The obvious way is to have two separate pipes: 1) web -> disk, and then 2) disk -> process. Is it possible to do another topology where the output from the web splits into two consumers, one that saves and the other that processes? I feel that this could be more elegant and possibly more efficient.

If so, how is the splitting done? Splitting of pipes is not mentioned anywhere in the documentation.

Cœur

1 Answer

The word "splitting" might be a little misleading: you want to send every byte to each of two consumers, not divide the bytes between them. Pipes.Prelude.tee turns any Consumer into a Pipe that forwards its input unchanged, so

producer >-> tee consumer1 >-> consumer2

feeds the producer's output to both consumers. The particular case of writing to a file, though, might be simplest with Pipes.Prelude.chain rather than a consumer. Both tee and chain let you do something with each incoming value before forwarding it along the pipeline. In this case I just write each successive chunk to a handle before passing it along:

import Pipes
import Pipes.HTTP
import qualified Pipes.ByteString as PB
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import qualified Data.ByteString as B

main :: IO ()
main = do
    -- Newer http-client releases deprecate parseUrl in favour of parseUrlThrow.
    req <- parseUrl "https://www.example.com"
    m <- newManager tlsManagerSettings
    withHTTP req m $ \resp ->
      IO.withFile "file.txt" IO.WriteMode $ \h ->
        -- Write each chunk to the file, then pass it on unchanged.
        runEffect $ responseBody resp >-> P.chain (B.hPut h) >-> PB.stdout

I ended the pipeline with PB.stdout; that is where you would plug in the pipes-csv functions instead. Using tee, I could as well have written

runEffect $ responseBody resp >-> P.tee (PB.toHandle h) >-> PB.stdout

for the last line. Where the 'consumers' can be viewed as folds, Control.Foldl (from the foldl package) lets you combine many folds into one that runs in a single pass over the stream, and there are any number of other devices.
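To check that tee and chain really hand every element to both sides, it can help to try them on an in-memory producer before involving the network. The following is only a minimal sketch: it needs nothing beyond the pipes package and base, and the name demo and the IORef-based recording are made up for the illustration, not part of any library.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)
import Pipes
import qualified Pipes.Prelude as P

-- Feed an in-memory producer to two side effects and to a downstream fold,
-- once through P.tee and once through P.chain.
-- Returns (elements seen via tee, elements seen via chain, downstream sum).
demo :: IO ([Int], [Int], Int)
demo = do
    viaTee   <- newIORef []
    viaChain <- newIORef []
    total <- P.sum $
        each [1, 2, 3]
            -- tee wraps a Consumer; here the consumer just records its input
            >-> P.tee (P.mapM_ (\x -> modifyIORef viaTee (x :)))
            -- chain wraps a plain action that is run on each element
            >-> P.chain (\x -> modifyIORef viaChain (x :))
    a <- reverse <$> readIORef viaTee
    b <- reverse <$> readIORef viaChain
    return (a, b, total)

main :: IO ()
main = demo >>= print  -- prints ([1,2,3],[1,2,3],6)
```

Both recorders and the final sum see all three elements, which is the behaviour the HTTP example relies on: the file gets every chunk, and so does whatever comes after it in the pipeline.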

Michael