I have two versions of a Haskell program that counts the occurrences of each word in a .txt file.

The first one is:

import Data.Char (isAlphaNum)
import Data.HashMap.Strict (empty, insertWith, toList)
import Data.Text (pack, toLower, filter, words)
import System.IO

-- Read the whole file lazily, split it into words, drop non-alphanumeric
-- characters, lower-case each word, and count occurrences in a strict HashMap.
wordcount :: IO ()
wordcount = withFile "input.txt" ReadMode $ \handle -> do
    content <- hGetContents handle
    print $ toList $ foldr
        (\x v -> insertWith (+) x 1 v)
        empty
        (fmap Data.Text.toLower
            $ fmap (Data.Text.filter isAlphaNum)
            $ (Data.Text.words . pack) content)

The second one uses the Conduit library:

import Data.HashMap.Strict (empty, insertWith, toList)
import Data.Char (isAlphaNum, toLower)
import Conduit
import qualified Data.Conduit.Combinators as CC

-- Stream the file in chunks: decode to Text, lower-case, split into words,
-- and fold the words into a HashMap of counts.
wordcountC :: IO ()
wordcountC = do
    hashMap <- runConduitRes $ sourceFile "input.txt"
        .| decodeUtf8C
        .| omapCE Data.Char.toLower
        .| CC.splitOnUnboundedE (not . isAlphaNum)
        .| foldMC insertInHashMap empty
    print (toList hashMap)
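
Here insertInHashMap is the fold step passed to foldMC; it is essentially the insertWith step from the first version lifted into the stream's monad:

-- Fold step for foldMC: bump the count of one word in the accumulator.
insertInHashMap hashMap word = return (insertWith (+) word 1 hashMap)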

Running each function several times on a large input file (approx. 80 MB), I measured essentially no difference between the execution times of the two versions (approximately 13 seconds for both the "standard" version and the Conduit one). Shouldn't the second version benefit from Conduit's stream-processing paradigm and therefore run in less time?
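
For reference, a comparison like this can be driven by something along the following lines (a sketch only: the timed helper and main are illustrative, not the exact harness behind the numbers above; it assumes both functions are in scope and uses the time package):

import Data.Time.Clock (diffUTCTime, getCurrentTime)

-- Hypothetical timing wrapper: wall-clock an IO action and report the elapsed time.
timed :: String -> IO () -> IO ()
timed label action = do
    start <- getCurrentTime
    action
    end <- getCurrentTime
    putStrLn (label ++ ": " ++ show (diffUTCTime end start))

main :: IO ()
main = do
    timed "lazy I/O version" wordcount
    timed "Conduit version" wordcountC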

  • You might find [this answer](https://stackoverflow.com/a/55814664/7203016) helpful. The point of Conduit isn't to improve performance (and in fact lazy I/O will generally be faster). – K. A. Buhr Jun 05 '19 at 03:22
  • I have closed against the question suggested by @K.A.Buhr because the accepted answer there appears to fully address your question. Ping me if you feel that isn't the case. – duplode Jun 05 '19 at 21:56

0 Answers