Reimplementing getContents using getChar

Question

On my journey towards grasping lazy IO in Haskell, I tried the following:

main = do
  chars <- getContents
  consume chars

consume :: [Char] -> IO ()
consume [] = return ()
consume ('x':_) = consume []
consume (c : rest) = do
  putChar c
  consume rest

which just echoes every character typed on stdin until I hit 'x'.

So, I naively thought it should be possible to reimplement getContents using getChar doing something along the following lines:

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  -- And now?
  return (c: ???)

Turns out it's not so simple since the ??? would require a function of type IO [Char] -> [Char] which would - I think - break the whole idea of the IO monad.

Checking the implementation of getContents (or rather hGetContents) reveals a whole sausage factory of dirty IO stuff. Is my assumption correct that myGetContents cannot be implemented without using dirty, i.e. monad-breaking, code?

score 6 · Accepted Answer · answered Dec 03 '16 at 18:16

You need a new primitive unsafeInterleaveIO :: IO a -> IO a that delays the execution of its argument action until the result of that action would be evaluated. Then

myGetContents :: IO [Char]
myGetContents = do
  c <- getChar
  rest <- unsafeInterleaveIO myGetContents
  return (c : rest)
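
To watch the delay happen without typing into stdin, here is a minimal sketch that swaps getChar for a counter. The names nextValue, lazyValues and demo are made up for this illustration; only unsafeInterleaveIO, the IORef functions and evaluate are real library calls. The counter records how many "reads" have actually run.

```haskell
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical stand-in for getChar: returns the current counter value
-- and increments it, so the counter equals the number of reads performed.
nextValue :: IORef Int -> IO Int
nextValue ref = do
  n <- readIORef ref
  modifyIORef' ref (+ 1)
  return n

-- Same shape as myGetContents above, but reading from the counter.
lazyValues :: IORef Int -> IO [Int]
lazyValues ref = do
  n <- nextValue ref
  rest <- unsafeInterleaveIO (lazyValues ref)
  return (n : rest)

-- Returns (reads before forcing, first three values, reads after forcing).
demo :: IO (Int, [Int], Int)
demo = do
  ref <- newIORef 0
  xs <- lazyValues ref
  before <- readIORef ref             -- only the first read has happened
  let firstThree = take 3 xs
  _ <- evaluate (length firstThree)   -- forces two more list cells
  after <- readIORef ref
  return (before, firstThree, after)  -- (1,[0,1,2],3)
```

Only one read runs when lazyValues returns; the other two run when the list is forced. That is the behaviour getContents relies on.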

Reid Barton

Works like a charm :) Remains to add that `unsafeInterleaveIO` must be imported from `System.IO.Unsafe`. – johanneslink Dec 03 '16 at 18:55

score 1 · Answer 2 · answered Dec 04 '16 at 00:20

You should really avoid using anything in System.IO.Unsafe if at all possible. The functions there tend to break referential transparency and are rarely used in Haskell unless absolutely necessary.

If you change your type signature a little I suspect you can get a more idiomatic approach to your problem.

consume :: Char -> Bool
consume 'x' = False
consume _   = True

main :: IO ()
main = loop
  where
    loop = do
      c <- getChar
      if consume c
      then do
        putChar c
        loop
      else return ()

bojo

The advice in the first paragraph makes sense on most occasions, but in this case it doesn't apply because the OP is explicitly and intentionally trying to re-implement lazy I/O. – duplode Dec 04 '16 at 01:11
For the record, my answer was based on a Twitter exchange with the OP which is not reflected in the original question. I got the impression they may not have explicitly been after "lazy IO" and just wanted to get their code to run. Given the spirit of the actual question though, this is indeed a wrong answer. – bojo Dec 05 '16 at 01:07

score 0 · Answer 3 · answered Feb 17 '18 at 22:19

You can do this without any hacks.

If your goal is simply to read all of stdin into a String, you don't need any of the unsafe* functions.

IO is a Monad, and a Monad is an Applicative Functor. A Functor is defined by the function fmap, whose signature is:

fmap :: Functor f => (a -> b) -> f a -> f b

that satisfies these two laws:

fmap id = id
fmap (f . g) = fmap f . fmap g

Effectively, fmap applies a function to wrapped values.

Given a specific character 'c', what is the type of fmap ('c':)? We can write the two types down, then unify them:

fmap        :: Functor f => (a      -> b     ) -> f a      -> f b
     ('c':) ::               [Char] -> [Char]
fmap ('c':) :: Functor f =>                       f [Char] -> f [Char]
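
As a quick check of that type, here is a small sketch applying fmap ('c':) in two different functors. The names prependInMaybe and prependInIO are made up for this illustration.

```haskell
-- fmap applies the pure function ('c':) underneath the functor's wrapper.

-- With f = Maybe, the wrapped string gains a leading 'c'.
prependInMaybe :: Maybe String
prependInMaybe = fmap ('c' :) (Just "at")   -- Just "cat"

-- With f = IO, the result is an action that yields "cat" when run.
prependInIO :: IO String
prependInIO = fmap ('c' :) (return "at")
```

The IO case is the one used for myGetContents: the string produced by the inner action gets a character prepended, and no function of type IO [Char] -> [Char] is needed.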

Recalling that IO is a functor, if we want to define myGetContents :: IO [Char], it seems reasonable to use this:

myGetContents :: IO [Char]
myGetContents = do
  x <- getChar
  fmap (x:) myGetContents

This is close, but not quite equivalent to getContents, as this version will attempt to read past the end of the file and throw an error instead of returning a string. Just looking at it should make that clear: there is no way to return a concrete list, only an infinite cons chain. Knowing that the base case is "" at EOF (and using the infix syntax <$> for fmap) brings us to:

import System.IO
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else do
    x <- getChar
    (x:) <$> myGetContents

The Applicative class affords a (slight) simplification.

Recall that IO is an Applicative Functor, not just any old Functor. There are "Applicative Laws" associated with this typeclass much like the "Functor Laws", but we'll look specifically at <*>:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b

This is almost identical to fmap (a.k.a. <$>), except that the function to apply is also wrapped. We can then avoid the bind in our else clause by using the Applicative style:

import System.IO
myGetContents :: IO String
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> myGetContents
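
To see that the applicative form is just the bind version written differently, here is a small sketch with the effects replaced by return so it can run without stdin. The names consWithBind, consApplicative and consMaybe are made up for this illustration.

```haskell
-- The do-block form: bind both results, then cons them.
consWithBind :: IO String
consWithBind = do
  x <- return 'a'
  xs <- return "bc"
  return (x : xs)

-- The applicative form: lift (:) over both actions, run left to right.
consApplicative :: IO String
consApplicative = (:) <$> return 'a' <*> return "bc"

-- The same expression works in any Applicative, for example Maybe.
consMaybe :: Maybe String
consMaybe = (:) <$> Just 'a' <*> Just "bc"   -- Just "abc"
```

Both IO actions yield "abc". In IO the left action runs before the right one, which is why getChar still reads its character before the recursive call.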

One modification is necessary if the input may be infinite.

Remember when I said that you don't need the unsafe* functions if you just want to read all of stdin into a String? Well, if you just want some of the input, you do. If your input might be infinitely long, you definitely do. The final program differs in one import and a single word:

import System.IO
import System.IO.Unsafe
myGetContents :: IO [Char]
myGetContents = do
  reachedEOF <- isEOF
  if reachedEOF
  then return []
  else (:) <$> getChar <*> unsafeInterleaveIO myGetContents

The defining function of lazy IO is unsafeInterleaveIO (from System.IO.Unsafe). This delays the computation of the IO action until it is demanded.
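
Since the question mentions hGetContents, here is a hedged sketch of the same idea generalised to an arbitrary Handle. The name myHGetContents is made up for this illustration; hIsEOF and hGetChar are the System.IO counterparts of isEOF and getChar. Unlike the real hGetContents, this sketch does nothing about closing the handle, so the caller has to keep it open until the string has been consumed.

```haskell
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- Lazily read the remaining contents of a handle, one character at a time.
-- Assumption: the handle stays open for as long as the result is in use.
myHGetContents :: Handle -> IO String
myHGetContents h = do
  reachedEOF <- hIsEOF h
  if reachedEOF
    then return []
    else (:) <$> hGetChar h <*> unsafeInterleaveIO (myHGetContents h)
```

With this definition, myGetContents is simply myHGetContents stdin.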
