Monads are a unification of several things in computing that were previously treated as being separate. When I say "unification" I mean it in the same sense that a physicist would: just as Newtonian gravity unified planetary orbits with terrestrial ballistics, and Maxwell's equations unified electrical and magnetic fields, so monads unify the following:
- Sequencing (do A, then B, then C).
- Exception handling (Do A. If it fails then do B).
- Parallel processing (do A and B in parallel, then combine their results).
- Non-determinism (Do A. For each possible result, do B. For each possible result, do C).
- State machines.
- Logic programming, a la Prolog.
And probably some more that I've forgotten about. Once you understand the general theory of monads you realise that it applies to lots of things, just as Maxwell's equations opened up the whole electromagnetic spectrum, instead of us thinking that light and radio were two separate phenomena and hence not realising there was some really interesting stuff in between them.
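To make the unification a bit more concrete, here is a minimal sketch using two monads from the standard Prelude: Maybe for failure and lists for non-determinism. The names safeDiv, divideTwice, pairsSummingToFive and addM are made up for this illustration; only (>>=), return and guard come from the standard library.

```haskell
import Control.Monad (guard)

-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Exception handling: the chain stops at the first Nothing.
divideTwice :: Int -> Int -> Int -> Maybe Int
divideTwice a b c = safeDiv a b >>= \q -> safeDiv q c

-- Non-determinism: for each x, for each y, keep the pairs that sum to 5.
pairsSummingToFive :: [(Int, Int)]
pairsSummingToFive =
  [1 .. 4] >>= \x ->
  [1 .. 4] >>= \y ->
  guard (x + y == 5) >> return (x, y)

-- The same code, written once, runs in any monad.
addM :: Monad m => m Int -> m Int -> m Int
addM ma mb = ma >>= \a -> mb >>= \b -> return (a + b)
```

With these definitions, addM (Just 1) (Just 2) gives Just 3, addM (Just 1) Nothing gives Nothing, and addM [1, 2] [10, 20] gives [11, 21, 12, 22]. The function body never mentions failure or multiple results; that behaviour comes entirely from which monad it is run in.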
One side effect of this generality is that you can write your own control-flow constructs. Haskell's equivalents of "while" and "for" are not built-in syntax; they are ordinary library functions (forM_ and forever, for example, live in Control.Monad). If you want something different then you can write it.
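As a rough sketch of what "write it yourself" looks like, here is one possible "while" loop defined as a plain function. The names while and sumTo are my own for this example and are not part of the standard library; the IORef functions are from Data.IORef in base.

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')

-- A hypothetical "while" loop: just a recursive function over monadic actions.
while :: Monad m => m Bool -> m () -> m ()
while cond body = do
  continue <- cond
  if continue
    then body >> while cond body
    else return ()

-- Usage: sum the numbers 1..n imperatively, with mutable references.
sumTo :: Int -> IO Int
sumTo n = do
  i     <- newIORef 1
  total <- newIORef 0
  while ((<= n) <$> readIORef i) $ do
    k <- readIORef i
    modifyIORef' total (+ k)
    modifyIORef' i (+ 1)
  readIORef total
```

Here sumTo 10 yields 55. Note that while only requires a Monad constraint, so the same loop would work in any monad that can supply a changing condition, not just IO.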
The core concepts in monads are "bind" (i.e. the (>>=) operator) and "return" (which is a really dumb name, but we are stuck with it). There are some basic laws which these have to obey. Roughly speaking, the two operations aren't allowed to do anything underhand. So "return" isn't allowed to have any side effects, and rearranging the brackets in a daisy chain of "bind" operations isn't allowed to change the meaning of the program.
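The laws are usually stated as left identity, right identity and associativity. The sketch below spot-checks them for the Maybe monad; half and decrement are hypothetical functions invented for the example, and checking a few values is of course a sanity test rather than a proof.

```haskell
-- Hypothetical functions used only to exercise the laws.
half :: Int -> Maybe Int
half n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

decrement :: Int -> Maybe Int
decrement n
  | n > 0     = Just (n - 1)
  | otherwise = Nothing

-- Left identity: "return" adds nothing before a bind.
--   return a >>= f  ==  f a
leftIdentity :: Int -> Bool
leftIdentity a = (return a >>= half) == half a

-- Right identity: "return" adds nothing after a bind.
--   m >>= return  ==  m
rightIdentity :: Maybe Int -> Bool
rightIdentity m = (m >>= return) == m

-- Associativity: rearranging the brackets does not change the result.
--   (m >>= f) >>= g  ==  m >>= (\x -> f x >>= g)
associativity :: Maybe Int -> Bool
associativity m =
  ((m >>= half) >>= decrement) == (m >>= (\x -> half x >>= decrement))
```

For instance, all leftIdentity [0 .. 10] and all associativity (Nothing : map Just [0 .. 10]) both evaluate to True.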
This is really abstract stuff, which is why it looks weird to programmers raised on conventional imperative languages. Only when you have understood half a dozen different monads do you appreciate the fundamental unity of the concept. People learning Haskell first encounter the IO monad because you need it for "Hello World". At that point it just looks like a strange way of doing what every other programming language does. This is because the IO monad is a model of the way that every other programming language (except Prolog) works: a sequence of actions that have effects on the real world, including named locations in physical memory.
The big advantage of Haskell is that, unlike every other programming language, it is not trapped in the IO monad. You can sort-of define monads in other languages, but they still get executed in what Haskell would see as the IO monad. Because Haskell is pure it lets you escape the constraints of IO and define new monads as true abstractions.
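As an illustration of defining a new monad without touching IO, here is a minimal state monad written from scratch. It is a sketch along the lines of the State type found in common libraries, not a copy of any of them; fresh and threeLabels are hypothetical names for the example.

```haskell
-- A state-passing computation: takes a state, returns a result and a new state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s ->
    let (a, s') = g s
    in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  -- Bind threads the state from one step to the next.
  State g >>= f = State $ \s ->
    let (a, s') = g s
    in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Hypothetical example: hand out a fresh integer label on each call.
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  return n

threeLabels :: State Int (Int, Int, Int)
threeLabels = do
  a <- fresh
  b <- fresh
  c <- fresh
  return (a, b, c)
```

Running runState threeLabels 0 gives ((0, 1, 2), 3). The code reads like imperative code with a mutable counter, yet runState is an ordinary pure function: no IO is involved and no memory is actually mutated.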