4

I'm learning Haskell and I need to compare two files. I did not find a function that does this, so I coded it myself. Below is the function I came up with.

cmpFiles :: FilePath -> FilePath -> IO Bool
cmpFiles a b = withBinaryFile a ReadMode $ \ha ->
               withBinaryFile b ReadMode $ \hb ->
                 fix (\loop -> do
                   isEofA <- hIsEOF ha
                   isEofB <- hIsEOF hb

                   if | isEofA && isEofB -> return True             -- both files reached EOF
                      | isEofA || isEofB -> return False            -- only one reached EOF
                      | otherwise        -> do                      -- read content
                                              x <- hGet ha 4028     -- TODO: How to use a constant?
                                              y <- hGet hb 4028     -- TODO: How to use a constant?
                                              if x /= y
                                                then return False   -- different content
                                                else loop           -- same content, contunue...
                 )

My questions are:

  1. Is this code idiomatic? It looks very imperative rather than functional.
  2. Is this code efficient (Layz IO issues with big files, performance...)?
  3. Is there a more compact way to write it?
Chris Stryczynski
  • 30,145
  • 48
  • 175
  • 286
And Pos
  • 147
  • 4
  • 1
    It is not clear whether you use lazy or strict bytestrings (always show your imports!) [This](http://pastebin.com/E9U80KcD) is idiomatic lazy IO to me. I think your program will only work with strict bytestrings, `withBinaryFile` and lazy IO don't mix well. – n. m. could be an AI May 20 '16 at 18:46
  • 1
    Your function is too big. One simple step would be to write a separate function that takes two *handles*. `fix` is rarely necessary or helpful for writing clear code; explicit recursion is usually better. – dfeuer May 20 '16 at 19:21

3 Answers3

6

How about

cmpFiles a b = do
    aContents <- readFile a
    bContents <- readFile b
    return (aContents == bContents)
Reid Barton
  • 14,951
  • 3
  • 39
  • 49
  • 2
    You might want to specify that this should be the `readFile` in `Data.ByteString.Lazy`.... Both will work, but the usual one might choke memory for large files. The `String` version might be confused by the binary data. – jamshidh May 20 '16 at 19:40
  • 2
    What do you think of the color of this bikeshed? ```cmpFiles = liftA2 (==) `on` readFile``` – Daniel Wagner May 20 '16 at 20:34
0

You can even make a one-liner out of it:

cmpFiles a b = liftM2 (==) (readFile a) (readFile b)

This one is actually equivalent to Reid Barton's solution. Equivalent is not a weasel word here, if you take the definition of liftM2 from hackage

liftM2 f m1 m2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }

and insert (==) and the readFiles you are there immediately.

Laziness is your friend in haskell. The documentation of readFile states that the input is read lazily, i.e. only on demand. == is lazy, too. Thus the entire liftM22 ... reads the files only until it finds a difference.

Franky
  • 2,339
  • 2
  • 18
  • 27
  • Laziness is your friend -- but the important part of this solution isn't laziness, it's the badly-named lazy IO. Lazy IO is your friend, too, [but only sometimes](http://stackoverflow.com/a/6669453/791604). – Daniel Wagner May 20 '16 at 20:37
  • I digged into the code of (==) and it does seem to be comparing chunk of bytes, so I guess it is lazy indeed and therefore my worries of the operation to be loading the entire file before comparing were not correct. – And Pos May 31 '16 at 19:21
0
cmpFiles a b = (==) <$> readFile a <*> readFile b