I/O with String is known to be less than fast in Haskell. The bytes read from the handle generally have to be converted to Unicode code points, and then a linked list is built from those. That's a lot of work and a lot of allocation. In this case the conversion to code points is a bit simpler, since you set stdin to binary mode, but constructing the linked list of characters still takes most of the time.
Another small factor is that your line count uses Integer, but that's minor and only starts to matter once the I/O itself is up to speed.
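For reference, the kind of program under discussion looks roughly like the following sketch. This is a guess at the shape of the original (the exact code isn't shown here); the names and the lazy foldl are illustrative: an ordinary String read from stdin in binary mode, with Integer counters.

import System.IO

-- Hypothetical reconstruction of the slow String-based version, not the exact
-- original: Integer counters, ordinary Strings, and a lazy left fold.
main :: IO ()
main = do
    hSetBinaryMode stdin True
    txt <- getContents
    let count (w, l) ln = (w + fromIntegral (length ln), l + 1)
        (w, l) = foldl count (0, 0) (lines txt) :: (Integer, Integer)
    putStrLn $ "line:" ++ show l ++ "\nwords:" ++ show w ++ "\n"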
If you need fast I/O, you have to use a type better suited to it. One possibility is ByteString, for example:
import Data.List
import qualified Data.ByteString.Lazy.Char8 as C

main = do
    txt <- C.getContents
    -- count characters and lines in one strict left fold over the lines
    putStrLn $ (\(w, l) -> "line:" ++ show l ++ "\nwords:" ++ show w ++ "\n")
             . foldl' (\(w, l) r -> w `seq` l `seq` (w + C.length r, succ l)) (0, 0)
             . C.lines $ txt
does the job on a 94MB file in 0.12s on my box (wc -l -c takes 0.06s), while the original using String took 4.4s. It can be optimised further:
{-# LANGUAGE BangPatterns #-}
import Data.List
import qualified Data.ByteString.Lazy.Char8 as C

main = do
    txt <- C.getContents
    putStrLn $ (\(w, l) -> "line:" ++ show l ++ "\nwords:" ++ show w ++ "\n")
             . loop 0 0 . C.lines $ txt

-- strict accumulators via bang patterns, so no thunks pile up
loop :: Int -> Int -> [C.ByteString] -> (Int, Int)
loop !w !l (ln:lns) = loop (w + fromIntegral (C.length ln)) (l + 1) lns
loop w l _          = (w, l)
takes only 0.08s, which is decent enough for me to stop optimising there (a similar change for the String version brings its time down to 3.6s).
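That "similar change" to the String version would look something like the following sketch (my reconstruction, not the exact code that was timed): the same strict counting loop, just over ordinary Strings instead of ByteStrings.

{-# LANGUAGE BangPatterns #-}

-- Sketch of the analogous strict loop over Strings (hypothetical, for illustration).
main :: IO ()
main = do
    txt <- getContents
    let (w, l) = loop 0 0 (lines txt)
    putStrLn $ "line:" ++ show l ++ "\nwords:" ++ show w ++ "\n"

loop :: Int -> Int -> [String] -> (Int, Int)
loop !w !l (ln:lns) = loop (w + length ln) (l + 1) lns
loop w l _          = (w, l)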