
Ok, I'm trying to wrap my head around IO in Haskell, and I figured I'd write a short app that deals with web pages to do it. The snippet I'm getting tripped up on is (with apologies to bobince, though to be fair, I'm not trying to parse HTML here, just extract one or two values):

titleFromUrl url = do
    (_, page) <- curlGetString url [CurlTimeout 60]   
    matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page

The above should take a URL in string form, scan the page it points to with matchRegex, and return either Nothing or Just [a], where a is the matched (possibly multi-line) string. The frustrating thing is that when I try doing

Prelude> (_, page) <- curlGetString url [CurlTimeout 60]
Prelude> matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page

in the interpreter, it does precisely what I want it to. When I load the same expression, along with the associated imports, from a file, I get a type error saying that it couldn't match expected type 'IO b' against inferred type 'Maybe [String]'. This tells me I'm missing something small and fundamental, but I can't figure out what. I've tried explicitly casting page to a string, but that's just programming by superstition (and it didn't work in any case).

Any hints?

Inaimathi

1 Answer


Yeah, GHCi accepts any sort of value at its prompt, whether or not it is an IO action. You can say:

ghci> 4
4
ghci> print 4
4

But those two values (4 and print 4) are clearly not the same thing: 4 is a number, while print 4 is an action of type IO (). The magic GHCi is doing is that if what you typed has type IO something, it executes that action (and prints the result if something is not () and has a Show instance). If it doesn't, it calls show on the value and prints that. Anyway, this magic is not available inside your program.
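To make the difference concrete, here is a small sketch (the names four, printFour, twoPrints and runAll are made up for illustration). It shows that an IO action is an ordinary value: defining it or putting it in a list runs nothing, and in a compiled program nothing calls show for you, so you have to print explicitly.

```haskell
-- A plain number: defining it prints nothing.
four :: Integer
four = 4

-- An IO action: a value that *describes* printing `four`.
-- Defining it does not run it.
printFour :: IO ()
printFour = print four

-- Actions can be stored like any other value...
twoPrints :: [IO ()]
twoPrints = [printFour, printFour]

-- ...and only run when sequenced into something that gets executed
-- (main, or the GHCi prompt). There is no automatic `show` here:
-- to see `four` you have to print it yourself.
runAll :: IO ()
runAll = do
  sequence_ twoPrints   -- runs both stored actions, printing "4" twice
  putStrLn (show four)  -- explicitly shows the pure value
```

Loading this file into GHCi and typing runAll executes the action; typing four just shows the number.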

When you say:

do foo <- bar :: IO Int
   baz

baz is expected to be of type IO something, and anything else is a type error. If a pure value were allowed there, you could perform I/O and then hand back a plain value, which is exactly what the types rule out. You can check this by noting that desugaring the above yields:

bar >>= (\foo -> baz)

And

-- (specializing to IO for simplicity)
(>>=) :: IO a -> (a -> IO b) -> IO b

Therefore

bar :: IO a
foo :: a
baz :: IO b
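A minimal runnable sketch of the same desugaring, assuming a hypothetical bar that just returns 42 (the real bar could be any IO Int). Both definitions are equivalent; the lambda passed to (>>=) has to produce an IO value, which is why the last line of the do-block does too.

```haskell
-- Hypothetical stand-in for `bar` from the answer.
bar :: IO Int
bar = return 42

-- do-notation: the last statement must be an IO value.
viaDo :: IO String
viaDo = do
  foo <- bar
  return (show foo)

-- The same thing desugared. The lambda must have type Int -> IO String
-- to fit (>>=) :: IO a -> (a -> IO b) -> IO b.
viaBind :: IO String
viaBind = bar >>= (\foo -> return (show foo))

-- Rejected by the type checker, because the lambda would return a
-- String rather than an IO String:
--   broken = bar >>= (\foo -> show foo)
```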

The way to fix it is to wrap your pure result in an IO action using the return function:

return :: a -> IO a  -- (again specialized to IO)

Your code is then:

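import Network.Curl (CurlOption (..), curlGetString)
import Text.Regex (matchRegex, mkRegexWithOpts)
titleFromUrl :: String -> IO (Maybe [String])
titleFromUrl url = do
    (_, page) <- curlGetString url [CurlTimeout 60]
    return $ matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page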
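If you want to experiment with the shape of this fix without curl or network access, here is a sketch that swaps in two hypothetical stand-ins: fakeGetString (a canned page instead of curlGetString) and matchTitle (simple list functions instead of the regex). The structure of titleFromUrl is the same as above: do the I/O, then wrap the pure Maybe [String] with return.

```haskell
import Data.List (findIndex, isPrefixOf, stripPrefix, tails)
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical stand-in for curlGetString: an IO action returning a
-- status code and a canned page body instead of hitting the network.
fakeGetString :: String -> IO (Int, String)
fakeGetString _url =
    return (0, "<html><head><title>Example\nDomain</title></head></html>")

-- Hypothetical pure stand-in for the regex match: Just [title] if the
-- page contains <title>...</title> (the title may span lines), else Nothing.
matchTitle :: String -> Maybe [String]
matchTitle page = do
    -- text following the first "<title>"
    rest <- listToMaybe (mapMaybe (stripPrefix "<title>") (tails page))
    -- position of the first "</title>" in that text
    end <- findIndex ("</title>" `isPrefixOf`) (tails rest)
    return [take end rest]

-- Same shape as the fixed titleFromUrl: run the I/O, then use return so
-- the whole do-block has type IO (Maybe [String]).
titleFromUrl :: String -> IO (Maybe [String])
titleFromUrl url = do
    (_, page) <- fakeGetString url
    return $ matchTitle page
```

Removing the return in titleFromUrl reproduces the original "couldn't match expected type IO b" error.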

For most of the discussion above, you can substitute any monad for IO (e.g. Maybe, [], ...) and it still holds.
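As an illustration, here is a small sketch of the same do/return pattern in the Maybe and list monads (safeDivThenInc and pairs are made-up examples). In each case the last statement must have the monad's type, and return wraps a plain value into it.

```haskell
-- Maybe: a failing step short-circuits to Nothing; the final
-- statement must be a Maybe value, so the plain Int is wrapped.
safeDivThenInc :: Int -> Int -> Maybe Int
safeDivThenInc x y = do
    q <- if y == 0 then Nothing else Just (x `div` y)
    return (q + 1)        -- here return :: a -> Maybe a

-- List: each binding draws every element in turn; the result is a
-- list of all combinations.
pairs :: [(Int, Char)]
pairs = do
    n <- [1, 2]
    c <- "ab"
    return (n, c)         -- here return :: a -> [a]
```

For example, safeDivThenInc 10 2 is Just 6, safeDivThenInc 1 0 is Nothing, and pairs is [(1,'a'),(1,'b'),(2,'a'),(2,'b')].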

luqui
  • It works, but just as a follow-up: am I understanding correctly that this basically means I can't return a regular string from a function that performs IO, and that it needs to be `IO String` instead (or, as in the above case, `IO (Maybe [String])`)? What if I want to do something like concatenate the return value of `titleFromUrl` with another string, or print it out without the `Just [~a]` wrapping it? Sorry if this is a stupid question; I'm kind of new to the strong typing thing. – Inaimathi Nov 16 '10 at 04:38
  • That's fine, you just need to bind. If you have a value `m` of type `IO a`, then you can write `do { x <- m; stuff }`, and `x` will have type `a`, to which you can do anything you want. The only restriction is that `stuff` has to be some sort of `IO` value, which can be a value or function call, or it can be more `<-` bindings. So you can do anything with the `String` inside, as long as you eventually return an `IO` something. I suggest reading a monad tutorial. There are tons; here are two: http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html or LYAH chapters 11 and 12. – luqui Nov 16 '10 at 04:50
  • [facepalm] Ok, I **think** that link helped. I was going wrong in forgetting that you can't guarantee the order of execution in a lazy, purely functional language. Your modification to the snippet just tells the compiler to force the result of `matchRegex` before using it anywhere. Am I close? – Inaimathi Nov 16 '10 at 12:36
  • @Inaimathi, Nope. `IO a` is the type of *data structures* which represent computations that possibly do some I/O and then result in an `a`. `return x` is the data structure representing the computation which does no I/O and just returns `x`. But yeah, the comments on a question about Curl aren't exactly the best place to describe monadic computation. There are plenty of tutorials around; use them. – luqui Nov 16 '10 at 13:46
  • Hmm, it looks like I've got some pretty heavy reading to do. Thanks for your help. – Inaimathi Nov 16 '10 at 15:43
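For the follow-up about concatenating or printing the result, here is a minimal sketch of the binding approach described in the comments. To keep it runnable without curl or network access, titleFromUrl is stubbed out to return a canned match (an assumption, not the real function), and describe and printTitle are hypothetical names.

```haskell
-- Hypothetical stand-in for the answer's titleFromUrl: the real one
-- downloads the page with curl; this stub returns a canned match.
titleFromUrl :: String -> IO (Maybe [String])
titleFromUrl _url = return (Just ["Example Domain"])

-- Bind the IO result with <-; inside the do-block `result` is a plain
-- Maybe [String] you can pattern-match on, concatenate, and so on.
-- The block as a whole is still an IO action.
describe :: String -> IO String
describe url = do
    result <- titleFromUrl url
    let title = case result of
            Just (t : _) -> t             -- first captured group
            _            -> "(no title)"  -- Nothing, or no groups
    return ("Title: " ++ title)

-- Printing without the Just [...] wrapper: bind, then putStrLn.
printTitle :: String -> IO ()
printTitle url = describe url >>= putStrLn
```

Running printTitle "http://example.com" prints Title: Example Domain. The String is freely usable inside the do-block, but describe itself still has type IO String.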