
Are there any suggestions about how to download large files in Haskell? I figure Network.HTTP.Conduit is a good library for this. However, how does it handle large files? There is an example in its documentation, but it does not look like it is meant for large files; it just downloads a file:

 import Data.Conduit.Binary (sinkFile)
 import Network.HTTP.Conduit
 import qualified Data.Conduit as C

 main :: IO ()
 main = do
      request <- parseUrl "http://google.com/"
      withManager $ \manager -> do
          response <- http request manager
          responseBody response C.$$+- sinkFile "google.html"

What I want is to be able to download large files without running out of RAM, i.e. to do it efficiently in terms of memory and performance. Preferably, I would also like to be able to continue a download "later", meaning "some part now, another part later".

I also found the download-curl package on Hackage, but I'm not sure it is a good fit, or even that it downloads files chunk by chunk as I need.

stites
Incerteza

2 Answers


Network.HTTP.Conduit provides three functions for performing a request:

  • simpleHttp
  • httpLbs
  • http

Of these three, the first two keep the entire response body in memory. If you want to operate in constant memory, use the http function, which gives you access to a streaming interface through ResumableSource.

The example you have provided uses interleaved IO to write the response body to a file in constant memory, so you will not run out of memory when downloading a large file.
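To make the chunk-by-chunk behaviour explicit, here is a minimal sketch that bypasses conduit and reads the body with `withResponse` and `brRead` from the http-client package (the library http-conduit is built on). It assumes http-client is installed and uses `defaultManagerSettings`, which handles plain HTTP only; HTTPS would need `tlsManagerSettings` from http-client-tls. The names `copyChunks` and `downloadInChunks` are hypothetical helpers, not part of any library.

```haskell
import qualified Data.ByteString     as BS
import           Network.HTTP.Client (brRead, defaultManagerSettings,
                                      newManager, parseRequest,
                                      responseBody, withResponse)
import           System.IO           (IOMode (..), withBinaryFile)

-- Hypothetical helper: pull chunks from a reader until it returns an
-- empty ByteString (how a BodyReader signals end of body), passing each
-- chunk to the writer. Only one chunk is held in memory at a time.
copyChunks :: IO BS.ByteString -> (BS.ByteString -> IO ()) -> IO ()
copyChunks readChunk writeChunk = loop
  where
    loop = do
      chunk <- readChunk
      if BS.null chunk
        then pure ()
        else writeChunk chunk >> loop

-- Hypothetical helper: stream the response body of `url` into `path`.
downloadInChunks :: String -> FilePath -> IO ()
downloadInChunks url path = do
  manager <- newManager defaultManagerSettings  -- plain HTTP only
  request <- parseRequest url
  withResponse request manager $ \response ->
    withBinaryFile path WriteMode $ \handle ->
      copyChunks (brRead (responseBody response)) (BS.hPut handle)
```

Calling `downloadInChunks "http://google.com/" "google.html"` should behave like the conduit example in the question, with the loop that conduit hides written out by hand.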

Sibi
  • but `withManager` is not `http` function like you said. Does it read a file chunk by chunk? – Incerteza Jul 13 '14 at 07:20
  • @AlexanderSupertramp `withManager` has nothing to do with reading a file. It just keeps track of open connections. – Sibi Jul 13 '14 at 07:33

This works for me:

import           Control.Monad.Trans.Resource (runResourceT)
import           Data.Conduit.Combinators     (sinkFile)
import           Network.HTTP.Conduit         (parseRequest)
import           Network.HTTP.Simple          (httpSink)


downloadFile :: String -> IO ()
downloadFile url = do
  request <- parseRequest url
  runResourceT $ httpSink request $ \_ -> sinkFile "tmpfile"

I agree that it's a bit weird that it takes four different modules (from three packages: conduit, resourcet and http-conduit) for such a task.
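The question also asks about continuing a download "later", which the code above does not cover. One possible sketch, assuming the server honours HTTP `Range` requests, is to ask only for the bytes past the current size of the local file and append them. The names `rangeFrom` and `resumeDownload` are hypothetical, and the extra dependency on the directory package (for `getFileSize`) is an assumption of this sketch.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Conduit                      (sinkFile, sinkIOHandle, sinkNull)
import           Control.Monad.Trans.Resource (runResourceT)
import qualified Data.ByteString.Char8        as B8
import           Network.HTTP.Simple          (addRequestHeader,
                                               getResponseStatusCode,
                                               httpSink, parseRequest)
import           System.Directory             (doesFileExist, getFileSize)
import           System.IO                    (IOMode (AppendMode),
                                               openBinaryFile)

-- Hypothetical helper: value of a Range header asking for everything
-- from the given byte offset to the end of the resource.
rangeFrom :: Integer -> B8.ByteString
rangeFrom offset = B8.pack ("bytes=" ++ show offset ++ "-")

-- Hypothetical helper: continue downloading `url` into `path`,
-- starting at the number of bytes already present on disk.
resumeDownload :: String -> FilePath -> IO ()
resumeDownload url path = do
  exists  <- doesFileExist path
  offset  <- if exists then getFileSize path else pure 0
  request <- addRequestHeader "Range" (rangeFrom offset) <$> parseRequest url
  runResourceT $ httpSink request $ \response ->
    case getResponseStatusCode response of
      -- 206 Partial Content: the server sent only the missing tail.
      206 -> sinkIOHandle (openBinaryFile path AppendMode)
      -- 200 OK: the Range header was ignored, so start over.
      200 -> sinkFile path
      -- Anything else (e.g. 416 when nothing is left): discard the body.
      _   -> sinkNull
```

Running `resumeDownload url "tmpfile"` again after an interrupted run would then fetch only the remainder, provided the remote file has not changed in the meantime; this sketch does not check for that.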

daydaynatation
  • I think the last line can be `runConduitRes $ httpSource request getResponseBody .| sinkFile filename`, and the entire thing will only need two imports (`Conduit` and `Network.HTTP.Simple`). – orthocresol May 27 '21 at 14:09