
Most Haskell tutorials teach the use of do-notation for IO.

I also started with do-notation, but it makes my code look more like an imperative language than an FP language.

This week I saw a tutorial use IO with <$>

stringAnalyzer <$> readFile "testfile.txt"

instead of using do

main = do
    strFile <- readFile "testfile.txt"
    let analysisResult = stringAnalyzer strFile
    return analysisResult

And the log analysis tool was finished without any do.

So my question is: "Should we avoid do-notation in any case?"

I know do may make the code better in some cases.

Also, why do most tutorials teach IO with do?

In my opinion <$> and <*> make the code more FP than IO.

Will Ness
Julian.zhang
  • Q2: Most tutorials teach IO with `do` because (a) applicative is newer and less well known, (b) do notation is closer to imperative code, which many students learn first, and (c) there are some things you can do with a Monad that you can't do with an applicative, chiefly changing what code is called based on the value previously output. – AndrewC May 24 '13 at 12:29
  • Q1: Applicative is lovely, clean, and very functional in style. Master it and look out for as many opportunities as you can to clean up your code with a bit of `*>` etc, but you will need to use the full power of monads sometimes and when you do, the `do` notation is usually the clearest. – AndrewC May 24 '13 at 12:34
  • Only use it when it makes things easier. For example, never use do notation for a single line. (Like, never do `main = do print "Hello World"`. Use `main = print "Hello World"` instead.) Make sure you understand monads in terms of `>>=` and `return`. When you understand this, you may use `do`. – PyRulez Nov 08 '15 at 01:29
  • do notation is the natural language for working in the Kleisli category – Poscat Jun 01 '20 at 04:56

7 Answers


do notation in Haskell desugars in a pretty simple way.

do
  x <- foo
  e1 
  e2
  ...

turns into

foo >>= \x ->
do
  e1
  e2
  ...

and

do
  x
  e1
  e2
  ...

into

x >>
do
  e1
  e2
  ...
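A minimal sketch of these rules, checked against the Maybe and list monads (all function names here are hypothetical). It also spells out two cases the rules above leave implicit, following the translation given in the Haskell Report: the base case, where a do block containing a single expression is just that expression, and a bind with a refutable pattern such as (x:_) <- xs, which calls fail when the pattern does not match (fail comes from MonadFail in current GHC).

```haskell
-- x <- foo; rest   ==>   foo >>= \x -> do rest
pairDo :: Monad m => m a -> m b -> m (a, b)
pairDo ma mb = do
  x <- ma
  y <- mb
  return (x, y)          -- base case: do { e } is just e

-- The same function, desugared by hand.
pairBind :: Monad m => m a -> m b -> m (a, b)
pairBind ma mb =
  ma >>= \x ->
  mb >>= \y ->
  return (x, y)

-- x; rest   ==>   x >> do rest
thenDo :: Monad m => m a -> m b -> m b
thenDo ma mb = do
  ma
  mb

thenBind :: Monad m => m a -> m b -> m b
thenBind ma mb = ma >> mb

-- A refutable pattern: elements that do not match (x:_) go to fail,
-- which for lists is the empty list, so they are skipped.
headsDo :: [[Int]] -> [Int]
headsDo xss = do
  (x:_) <- xss
  return x

-- Hand-desugared form of headsDo.
headsBind :: [[Int]] -> [Int]
headsBind xss = xss >>= \v -> case v of
  (x:_) -> return x
  _     -> fail "pattern match failure"

main :: IO ()
main = do
  print (pairDo (Just 1) (Just 'a'), pairBind (Just 1) (Just 'a'))
  print (thenDo [1, 2] "ab", thenBind [1, 2] "ab")
  print (headsDo [[1, 2], [], [3]], headsBind [[1, 2], [], [3]])
```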

This means you can write any monadic computation with >>= and return alone. The only reason we don't is that the syntax is more painful. Monads are useful for imitating imperative code, and do notation makes that code look imperative too.

The C-ish syntax makes it far easier for beginners to understand. You're right that it doesn't look as functional, but requiring someone to grok monads properly before they can use IO is a pretty big deterrent.

On the other hand, the reason we'd use >>= and return is that they're much more compact for one- or two-liners. However, they tend to get less readable for anything bigger. So, to answer your question directly: no, please don't avoid do notation where it's appropriate.

Lastly, the two operators you saw, <$> and <*>, are fmap (from Functor) and the apply operator of Applicative, respectively, not monadic operations. They can't express much of what do notation does; in particular, a later action can't depend on the result of an earlier one. They're more compact, to be sure, but they don't let you easily name intermediate values. Personally, I use them about 80% of the time, mostly because I tend to write very small composable functions anyway, which applicatives are great for.
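A small sketch of the difference (hypothetical function names): when two actions are independent, <$> and <*> combine them without naming anything, while the do version names each intermediate result. The applicative version needs only an Applicative instance, not a Monad.

```haskell
-- do notation: each intermediate result gets a name.
addDo :: Monad m => m Int -> m Int -> m Int
addDo ma mb = do
  a <- ma
  b <- mb
  return (a + b)

-- Applicative style: (+) is lifted over both effects; nothing is named.
-- (+) <$> fa is fmap (+) fa, and <*> applies the wrapped function to fb.
addA :: Applicative f => f Int -> f Int -> f Int
addA fa fb = (+) <$> fa <*> fb

main :: IO ()
main = do
  print (addDo (Just 2) (Just 3), addA (Just 2) (Just 3))
  print (addA (Just 2) Nothing)
  print (addA [1, 2] [10, 20])
```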

daniel gratzer
  • The desugaring you describe is indeed simple, but naive. Do-notation also involves error handling related to the Monad fail function, but you mention nothing about it. Still, this answer has been very helpful and I've referenced it many times. +1 from me. – Buttons840 Mar 28 '14 at 20:18
  • This was helpful to me. However, what's the base case? – orm Apr 18 '16 at 17:37

In my opinion <$> and <*> make the code more FP than IO.

Haskell is not a purely functional language because that "looks better". Sometimes it does, often it doesn't. The reason for staying functional is not its syntax but its semantics. It equips us with referential transparency, which makes it far easier to prove invariants, allows very high-level optimisations, makes it easy to write general-purpose code, etc.

None of this has much to do with syntax. Monadic computations are still purely functional – regardless of whether you write them with do notation or with <$>, <*> and >>=, so we get Haskell's benefits either way.
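To illustrate that point with a sketch (hypothetical names): an action is an ordinary value, so it can be named and passed to a function without being run, and the do spelling and the >> spelling below define the same function.

```haskell
-- Defining an action does not run it; greet is just a value of type IO ().
greet :: IO ()
greet = putStrLn "hello"

-- Run an action twice, written with do notation.
twiceDo :: Monad m => m a -> m a
twiceDo act = do
  act
  act

-- The same function without do: do { act; act } means act >> act.
twiceBind :: Monad m => m a -> m a
twiceBind act = act >> act

-- Because greet is a value, both versions can take it as an argument.
-- This prints "hello" four times.
main :: IO ()
main = twiceDo greet >> twiceBind greet
```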

However, notwithstanding the aforementioned FP benefits, it is often more intuitive to think about algorithms from an imperative-like point of view, even if you're accustomed to how this is implemented through monads. In these cases, do notation gives you quick insight into "order of computation", "origin of data" and "point of modification", yet it's trivial to desugar it manually in your head to the >>= version to grasp what's going on functionally.

Applicative style is certainly great in many ways; however, it is inherently point-free. That is often a good thing, but especially in more complex problems it can be very helpful to give names to "temporary" variables. When using only "FP" Haskell syntax, this requires either lambdas or explicitly named functions. Both have good use cases, but the former introduces quite a bit of noise right in the middle of your code, and the latter rather disrupts the "flow", since it requires a where or let placed somewhere other than where you use it. do, on the other hand, allows you to introduce a named variable right where you need it, without introducing any noise at all.
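A sketch of the three options side by side (hypothetical names, using readMaybe from base's Text.Read): the same computation with do, with lambdas, and with named helper functions in a where clause.

```haskell
import Text.Read (readMaybe)

-- do: each value is named right where it is produced.
ratioDo :: String -> String -> Maybe Double
ratioDo s t = do
  x <- readMaybe s
  y <- readMaybe t
  if y == 0 then Nothing else Just (x / y)

-- Lambdas: the same names, bound in the middle of the expression.
ratioLambda :: String -> String -> Maybe Double
ratioLambda s t =
  readMaybe s >>= \x ->
  readMaybe t >>= \y ->
  if y == 0 then Nothing else Just (x / y)

-- Named functions: the continuations move into a where clause,
-- away from the place where they are used.
ratioWhere :: String -> String -> Maybe Double
ratioWhere s t = readMaybe s >>= withX
  where
    withX x = readMaybe t >>= divide x
    divide x y = if y == 0 then Nothing else Just (x / y)

main :: IO ()
main = print (ratioDo "3" "4", ratioLambda "3" "0", ratioWhere "x" "4")
```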

Daniel Fischer
leftaroundabout
  • I endorse this answer. Avoiding `do`-notation just because it "looks imperative" is counter-productive (it often makes the code far less readable). Use the right syntax for the job etc. – Daniel Fischer May 24 '13 at 11:58

I often find myself first writing a monadic action in do notation, then refactoring it down to a simple monadic (or functorial) expression. This happens mostly when the do block turns out to be shorter than I expected. Sometimes I refactor in the opposite direction; it depends on the code in question.

My general rule is: if the do block is only a couple of lines long it's usually neater as a short expression. A long do-block is probably more readable as it is, unless you can find a way to break it up into smaller, more composable functions.


As a worked example, here's how we might transform your verbose code snippet into your simple one.

main = do
    strFile <- readFile "testfile.txt"
    let analysisResult = stringAnalyzer strFile
    return analysisResult

Firstly, notice that the last two lines have the form let x = y in return x. This can of course be transformed into simply return y.

main = do
    strFile <- readFile "testfile.txt"
    return (stringAnalyzer strFile)

This is a very short do block: we bind readFile "testfile.txt" to a name, and then do something to that name in the very next line. Let's try 'de-sugaring' it like the compiler will:

main = readFile "testfile.txt" >>= \strFile -> return (stringAnalyzer strFile)

Look at the lambda-form on the right hand side of >>=. It's begging to be rewritten in point-free style: \x -> f $ g x becomes \x -> (f . g) x which becomes f . g.

main = readFile "testfile.txt" >>= (return . stringAnalyzer)

This is already a lot neater than the original do block, but we can go further.

Here's the only step that requires a little thought (though once you're familiar with monads and functors it should be obvious). The expression above is suggestive of one of the monad laws: (m >>= return) == m. The only difference is that the function on the right-hand side of >>= isn't just return: we do something to the value inside the monad before wrapping it back up with return. But the pattern of 'doing something to a wrapped value without affecting its wrapper' is exactly what Functor is for. All monads are functors, so we can refactor this so that we don't even need the Monad instance:

main = fmap stringAnalyzer (readFile "testfile.txt")

Finally, note that <$> is just another way of writing fmap.

main = stringAnalyzer <$> readFile "testfile.txt"

I think this version is a lot clearer than the original code. It can be read like a sentence: "main is stringAnalyzer applied to the result of reading "testfile.txt"". The original version bogs you down in the procedural details of its operation.
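The rewriting steps can be checked without touching the file system. In this sketch readFile "testfile.txt" is abstracted into an arbitrary action m, and stringAnalyzer is a hypothetical stand-in that counts lines, since the question doesn't show the real one. Every step should give the same result for any monad.

```haskell
-- Hypothetical stand-in for the question's stringAnalyzer.
stringAnalyzer :: String -> Int
stringAnalyzer = length . lines

-- Step 0: the original do block.
step0 :: Monad m => m String -> m Int
step0 m = do
  strFile <- m
  let analysisResult = stringAnalyzer strFile
  return analysisResult

-- Step 1: inline the let.
step1 :: Monad m => m String -> m Int
step1 m = do
  strFile <- m
  return (stringAnalyzer strFile)

-- Step 2: desugar to >>= with a lambda.
step2 :: Monad m => m String -> m Int
step2 m = m >>= \strFile -> return (stringAnalyzer strFile)

-- Step 3: eta-reduce the lambda into a composition.
step3 :: Monad m => m String -> m Int
step3 m = m >>= (return . stringAnalyzer)

-- Step 4: m >>= (return . f) is fmap f m, so only Functor is needed.
step4 :: Functor f => f String -> f Int
step4 m = stringAnalyzer <$> m

-- With IO, step4 (readFile "testfile.txt") is the answer's final version.
main :: IO ()
main = print (map step0 [Just "a\nb", Nothing], map step4 [Just "a\nb", Nothing])
```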


Addendum: my comment that 'all monads are functors' can be justified by the observation that m >>= (return . f) (that is, the standard library's liftM f m) is the same as fmap f m. If you have an instance of Monad, you get an instance of Functor 'for free': just define fmap = liftM! If someone has defined a Monad instance for their type but not instances for Functor and Applicative, I'd call that a bug. Clients expect to be able to use Functor methods on instances of Monad without too much hassle.
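A minimal sketch of that trick with a hypothetical Logger type (a value paired with a list of log messages). Since base 4.8 the Functor and Applicative instances are mandatory, but liftM and ap from Control.Monad still let you derive both from >>=.

```haskell
import Control.Monad (ap, liftM)

-- Hypothetical example type: a result paired with a log of messages.
data Logger a = Logger [String] a
  deriving (Eq, Show)

-- Functor for free: liftM f m = m >>= (return . f).
instance Functor Logger where
  fmap = liftM

-- Applicative for free as well: ap is defined via >>= and return.
instance Applicative Logger where
  pure = Logger []
  (<*>) = ap

-- Only >>= is written by hand; return defaults to pure.
instance Monad Logger where
  Logger w x >>= k = case k x of
    Logger w' y -> Logger (w ++ w') y

-- Append one message to the log.
logMsg :: String -> Logger ()
logMsg msg = Logger [msg] ()

main :: IO ()
main = do
  print (fmap (+ 1) (Logger ["start"] 1))
  print (logMsg "x" >> logMsg "y" >> return 5)
```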

Benjamin Hodgson
  • Thankfully, base 4.8 makes `Applicative` (and therefore `Functor`) a superclass of `Monad`, the big breaking change that no one ever complains about. So soon, there will be no need to even *think* about whether your `Monad` is a `Functor`! – dfeuer Jan 31 '15 at 16:47

Applicative style should be encouraged because it composes (and it is prettier). Monadic style is necessary in certain cases. See https://stackoverflow.com/a/7042674/1019205 for an in depth explanation.
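One concrete sense in which applicative style composes: the composition of any two Applicative functors is again an Applicative, and base provides this as Data.Functor.Compose. There is no such general instance for monads. A small sketch with hypothetical values:

```haskell
import Data.Functor.Compose (Compose (..))

-- A computation that may fail (Maybe) and yields several results (list).
-- Compose treats the two layers as a single Applicative.
maybeList :: Compose Maybe [] Int
maybeList = Compose (Just [1, 2])

other :: Compose Maybe [] Int
other = Compose (Just [10, 20])

-- Applicative style works on the combined layer with no extra plumbing.
sums :: Maybe [Int]
sums = getCompose ((+) <$> maybeList <*> other)

-- If the outer layer fails, the whole result fails.
failed :: Maybe [Int]
failed = getCompose ((+) <$> maybeList <*> Compose Nothing)

main :: IO ()
main = print (sums, failed)
```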

Community
cheecheeo

Should we avoid do-notation in any case?

I'd say definitely no. For me, the most important criterion in such cases is to make the code as readable and understandable as possible. The do-notation was introduced to make monadic code more understandable, and this is what matters. Sure, in many cases, using Applicative point-free notation is very nice; for example, instead of

do
    f <- [(+1), (*7)]
    i <- [1..5]
    return $ f i

we'd write just [(+1), (*7)] <*> [1..5].
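A quick check that the two forms agree for the list monad (the names appliedDo and appliedA are hypothetical):

```haskell
-- do version: pick a function, pick an argument, apply.
appliedDo :: [Int]
appliedDo = do
  f <- [(+1), (*7)]
  i <- [1..5]
  return $ f i

-- Applicative version: every function applied to every argument, in order.
appliedA :: [Int]
appliedA = [(+1), (*7)] <*> [1..5]

main :: IO ()
main = print (appliedDo == appliedA, appliedA)
```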

But there are many examples where not using the do-notation will make code very unreadable. Consider this example:

nameDo :: IO ()
nameDo = do putStr "What is your first name? "
            first <- getLine
            putStr "And your last name? "
            last <- getLine
            let full = first++" "++last
            putStrLn ("Pleased to meet you, "++full++"!")

Here it's quite clear what's happening and how the IO actions are sequenced. A do-free notation looks like

name :: IO ()
name = putStr "What is your first name? " >>
       getLine >>= f
       where
       f first = putStr "And your last name? " >>
                 getLine >>= g
                 where
                 g last = putStrLn ("Pleased to meet you, "++full++"!")
                          where
                          full = first++" "++last

or like

nameLambda :: IO ()
nameLambda = putStr "What is your first name? " >>
             getLine >>=
             \first -> putStr "And your last name? " >>
             getLine >>=
             \last -> let full = first++" "++last
                          in  putStrLn ("Pleased to meet you, "++full++"!")

which are both much less readable. Here the do-notation is clearly preferable.

If you want to avoid using do, try structuring your code into many small functions. This is a good habit anyway, and it lets you reduce your do blocks to only 2-3 lines, which can then be replaced nicely by >>=, <$>, <*> etc. For example, the above could be rewritten as

import Data.List (intersperse)

name :: IO ()
name = getName >>= welcome
  where
    ask :: String -> IO String
    ask s = putStr s >> getLine

    join :: [String] -> String
    join  = concat . intersperse " "

    getName :: IO String
    getName  = join <$> traverse ask ["What is your first name? ",
                                      "And your last name? "]

    welcome :: String -> IO ()
    welcome full = putStrLn ("Pleased to meet you, "++full++"!")

It's a bit longer, and maybe a bit less understandable to Haskell beginners (due to intersperse, concat and traverse), but in many cases those new, small functions can be reused in other places of your code, which will make it more structured and composable.
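Another possible decomposition, sketched with hypothetical helpers fullName and greeting: pulling the string building out into pure functions leaves two independent prompts, which <$> and <*> can combine, plus a single final action. The hFlush call is an addition so the prompt appears before input is read when stdout is line-buffered.

```haskell
import System.IO (hFlush, stdout)

-- Pure helpers: no IO, so they are easy to test and reuse.
fullName :: String -> String -> String
fullName first lst = first ++ " " ++ lst

greeting :: String -> String
greeting full = "Pleased to meet you, " ++ full ++ "!"

-- Print a prompt, flush it, and read one line.
ask :: String -> IO String
ask s = putStr s >> hFlush stdout >> getLine

-- The two prompts do not depend on each other, so <$> and <*> suffice;
-- the final putStrLn needs the combined result, so it uses =<<.
nameA :: IO ()
nameA = putStrLn . greeting
    =<< (fullName <$> ask "What is your first name? "
                  <*> ask "And your last name? ")

main :: IO ()
main = nameA
```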


I'd say the situation is very similar to whether to use the point-free notation or not. In many cases (like in the top-most example [(+1), (*7)] <*> [1..5]) the point-free notation is great, but if you try to convert a complicated expression, you will get results like

f = ((ite . (<= 1)) `flip` 1) <*>
     (((+) . (f . (subtract 1))) <*> (f . (subtract 2)))
  where
    ite e x y = if e then x else y

It'd take me quite a long time to understand it without running the code. [Spoiler below:]

f x = if (x <= 1) then 1 else f (x-1) + f (x-2)
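The point-free version relies on the Applicative instance for functions, where (g <*> h) x = g x (h x): each <*> passes the same argument to both sides. A sketch that checks the two definitions agree (the names fibPF and fibPointed, and the Int -> Integer signatures, are additions):

```haskell
-- Point-free version from above, renamed and given a type signature.
fibPF :: Int -> Integer
fibPF = ((ite . (<= 1)) `flip` 1) <*>
          (((+) . (fibPF . subtract 1)) <*> (fibPF . subtract 2))
  where
    ite :: Bool -> a -> a -> a
    ite e x y = if e then x else y

-- Pointed version: the same function with its argument named.
-- Unfolding (g <*> h) x = g x (h x) twice in fibPF gives this definition.
fibPointed :: Int -> Integer
fibPointed x = if x <= 1 then 1 else fibPointed (x - 1) + fibPointed (x - 2)

main :: IO ()
main = print (map fibPF [0 .. 10], map fibPointed [0 .. 10])
```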


Also, why do most tutorials teach IO with do?

Because IO is designed precisely to mimic imperative computations with side effects, and so sequencing them using do is very natural.

Petr
  • Your point-free `f` would still be hard to penetrate if it used the more readable `(<= 1)` and `subtract 1` (resp. 2) instead of ``(<=) `flip` 1`` and ``(-) `flip` 1`` (honestly, anybody who uses that must be a tool, hopefully a command-line tool). Well-reasoned answer. – Daniel Fischer May 25 '13 at 22:03
  • @DanielFischer Thanks for the suggestions, I updated the example. (Yes, it was generated by a command line tool for abstraction elimination.) – Petr May 26 '13 at 06:33

do notation is just syntactic sugar. It can be avoided in all cases. However, in some cases replacing do with >>= and return makes code less readable.

So, for your questions:

"Should we avoid do-notation in any case?"

Focus on making your code clear and readable. Use do when it helps, avoid it otherwise.

Also, why do most tutorials teach IO with do?

Because do makes IO code more readable in many cases.

Also, the majority of people who start learning Haskell have imperative programming experience. Tutorials are for beginners, so they should use a style that is easy for newcomers to understand.

Sergey Bolgov

The do notation is expanded to an expression using the functions (>>=) and (>>), and the let expression. So it is not part of the core of the language.

(>>=) and (>>) are used to combine actions sequentially and they are essential when the result of an action changes the structure of the following actions.
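A sketch of what "changes the structure of the following actions" means (hypothetical names): with >>=, the next computation is chosen from the previous result, which <$> and <*> alone cannot express. In copies the number of results depends on the value just bound, and in chase the second lookup uses the key returned by the first.

```haskell
-- The shape of the second computation depends on n: n copies of n.
copies :: [Int] -> [Int]
copies ns = ns >>= \n -> replicate n n

-- The same function with do notation.
copiesDo :: [Int] -> [Int]
copiesDo ns = do
  n <- ns
  replicate n n

-- Look up a key, then use the value found as the next key to look up.
chase :: [(String, String)] -> String -> Maybe String
chase table k = lookup k table >>= \k' -> lookup k' table

main :: IO ()
main = do
  print (copies [1, 2, 3])
  print (chase [("a", "b"), ("b", "c")] "a")
```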

In the example given in the question this is not apparent, as there is only one IO action, so no sequencing is needed.

Consider for example the expression

do x <- getLine
   print (length x)
   y <- getLine
   return (x ++ y)

which is translated to

getLine >>= \x ->
print (length x) >>
getLine >>= \y ->
return (x ++ y)

In this example the do notation (or the (>>=) and (>>) functions) is needed for sequencing the IO actions.

So sooner or later the programmer will need it.

Romildo
  • do-notation is *not* needed. It is just syntactic sugar. Your example is wrong --- the bind operator of the IO monad takes care of sequential execution. – KAction May 24 '13 at 10:42
  • Read more carefully what I have written: without the `do` notation OR the functions `(>>=)` and `(>>)` it may be expanded to ... – Romildo May 24 '13 at 14:55
  • It is clear that the `do` notation OR the functions `(>>=)` and `(>>)` are needed for sequencing `IO` actions, and that the `do` notation is expanded (that is, it is syntactic sugar) to applications of those functions. – Romildo May 24 '13 at 14:57
  • @hammar in my example `action1`, `action2` and `action3` stand for general actions that may need the result of previous actions. A more realistic example is `do { x <- getLine; print (length x); y <- getLine; return (x++y) }` where `action1`, `action2`, `action3` and `f` stand for `getLine`, `print (length x)`, `getLine`, and `(++)`, respectively. – Romildo May 24 '13 at 20:07
  • I have edited my answer to better explain the idea, as suggested by @hammar. – Romildo May 24 '13 at 20:27