I want to add Haskell to my toolbox so I'm working my way through Real World Haskell.
In the chapter on Input and Output, in the section on hGetContents, I came across this example:
    import System.IO
    import Data.Char(toUpper)

    main :: IO ()
    main = do
        inh <- openFile "input.txt" ReadMode
        outh <- openFile "output.txt" WriteMode
        inpStr <- hGetContents inh
        let result = processData inpStr
        hPutStr outh result
        hClose inh
        hClose outh

    processData :: String -> String
    processData = map toUpper
Following this code sample, the authors go on to say:
Notice that hGetContents handled all of the reading for us. Also, take a look at processData. It's a pure function since it has no side effects and always returns the same result each time it is called. It has no need to know—and no way to tell—that its input is being read lazily from a file in this case. It can work perfectly well with a 20-character literal or a 500GB data dump on disk. (N.B. Emphasis is mine)
My question is: how does hGetContents, or the value it produces, achieve this memory efficiency without processData (in this example) "being able to tell", while still keeping all the benefits that accrue to pure code such as processData, specifically memoization?
The bind inpStr <- hGetContents inh yields a string, so inpStr is bound to a value of type String, which is exactly the type that processData accepts. But if I understand the authors of Real World Haskell correctly, this string isn't quite like other strings, in that it is not fully loaded into memory (or fully evaluated, if such a thing as a not-fully-evaluated string exists...) by the time of the call to processData.
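To make concrete what I mean by a "not-fully-evaluated" string, here is a minimal sketch of my own (it is not from the book, and it involves no file I/O). The name partial is hypothetical and exists only for illustration: it is a String whose first three characters are known and whose tail raises an error if anything ever evaluates it. If my understanding is right, processData should be able to work on such a value as long as only the prefix is demanded:

```haskell
import Data.Char (toUpper)

processData :: String -> String
processData = map toUpper

-- Hypothetical example value: three known characters followed by a tail
-- that raises an error if it is ever evaluated.
partial :: String
partial = 'a' : 'b' : 'c' : error "rest of the string was forced"

main :: IO ()
main = do
    -- Only the first three characters are demanded, so the error tail
    -- is never evaluated.
    putStrLn (take 3 (processData partial))
    -- The same function applied to an infinite String, again demanding
    -- only a finite prefix.
    putStrLn (take 6 (processData (cycle "xyz")))
```

I would expect this to print ABC and then XYZXYZ, which suggests that processData never needs its whole argument to exist at once. What I can't see is how that squares with memoization.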
Therefore, another way to ask my question is: if inpStr is not fully evaluated or loaded into memory at the time of the call to processData, then how can it be used to look up whether a memoized call to processData exists, without first fully evaluating inpStr?
Are there instances of type String that each behave differently but cannot be told apart at this level of abstraction?