It's not so much that lazy evaluation led to purity; Haskell was pure to begin with. Rather, lazy evaluation forced the designers of the language to keep the language pure.
Here is a relevant passage from the article A History of Haskell: Being Lazy with Class:
Once we were committed to a lazy language, a pure one was inescapable. The converse is not true, but it is notable that in practice most pure programming languages are also lazy. Why? Because in a call-by-value language, whether functional or not, the temptation to allow unrestricted side effects inside a “function” is almost irresistible.
Purity is a big bet, with pervasive consequences. Unrestricted side effects are undoubtedly very convenient. Lacking side effects, Haskell’s input/output was initially painfully clumsy, which was a source of considerable embarrassment. Necessity being the mother of invention, this embarrassment ultimately led to the invention of monadic I/O, which we now regard as one of Haskell’s main contributions to the world, as we discuss in more detail in Section 7.
Whether a pure language (with monadic effects) is ultimately the best way to write programs is still an open question, but it certainly is a radical and elegant attack on the challenge of programming, and it was that combination of power and beauty that motivated the designers. In retrospect, therefore, perhaps the biggest single benefit of laziness is not laziness per se, but rather that laziness kept us pure, and thereby motivated a great deal of productive work on monads and encapsulated state.
(my emphasis)
I also invite you to listen to Software Engineering Radio podcast #108, at around 18'30'', for an explanation by the man himself. And here is a longer but relevant passage from SPJ's interview in Peter Seibel's Coders at Work:
I now think the important thing about laziness is that it kept us pure. [...]
[...] if you have a lazy evaluator, it’s harder to predict exactly when an expression is going to be evaluated. So that means if you want to print something on the screen, every call-by-value language, where the order of evaluation is completely explicit, does that by having an impure “function”—I’m putting quotes around it because it now isn’t a function at all—with type something like string to unit. You call this function and as a side effect it puts something on the screen. That’s what happens in Lisp; it also happens in ML. It happens in essentially every call-by-value language.
Now in a pure language, if you have a function from string to unit you would never need to call it because you know that it just gives the answer unit. That’s all a function can do, is give you the answer. And you know what the answer is. But of course if it has side effects, it’s very important that you do call it. In a lazy language the trouble is if you say, “f applied to print "hello",” then whether f evaluates its first argument is not apparent to the caller of the function. It’s something to do with the innards of the function. And if you pass it two arguments, f of print "hello" and print "goodbye", then you might print either or both in either order or neither. So somehow, with lazy evaluation, doing input/output by side effect just isn’t feasible. You can’t write sensible, reliable, predictable programs that way. So, we had to put up with that. It was a bit embarrassing really because you couldn’t really do any input/output to speak of. So for a long time we essentially had programs which could just take a string to a string. That was what the whole program did. The input string was the input and result string was the output and that’s all the program could really ever do.
You could get a bit clever by making the output string encode some output commands that were interpreted by some outer interpreter. So the output string might say, “Print this on the screen; put that on the disk.” An interpreter could actually do that. So you imagine the functional program is all nice and pure and there’s sort of this evil interpreter that interprets a string of commands. But then, of course, if you read a file, how do you get the input back into the program? Well, that’s not a problem, because you can output a string of commands that are interpreted by the evil interpreter and using lazy evaluation, it can dump the results back into the input of the program. So the program now takes a stream of responses to a stream of requests. The stream of requests go to the evil interpreter that does the things to the world. Each request generates a response that’s then fed back to the input. And because evaluation is lazy, the program has emitted a response just in time for it to come round the loop and be consumed as an input. But it was a bit fragile because if you consumed your response a bit too eagerly, then you get some kind of deadlock. Because you’d be asking for the answer to a question you hadn’t yet spat out of your back end yet.
The point of this is laziness drove us into a corner in which we had to think of ways around this I/O problem. I think that that was extremely important. The single most important thing about laziness was it drove us there.
(my emphasis)
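To make the two technical points of the interview concrete, here is a minimal sketch in present-day Haskell. It is my own illustration, not code from the interview or the paper. The first part shows why passing print "hello" to a function is harmless once I/O actions are ordinary values: the callee decides whether an action becomes part of the result, and nothing runs until main executes it. The second part is a toy version of the request/response stream model described above. All names (ignoreFirst, runBoth, Request, Response, Dialogue, greeter, simulate) are assumptions made for this example; the types are only loosely modeled on early Haskell's stream I/O, and the "evil interpreter" is replaced by a pure function so the example stays self-contained.

```haskell
module Main where

-- Part 1: I/O actions are values. Passing one to a function does not run it.

-- Drops its first argument; that action never becomes part of the program.
ignoreFirst :: IO () -> IO () -> IO ()
ignoreFirst _unused second = second

-- Sequences both actions explicitly, so the order is fixed by the code.
runBoth :: IO () -> IO () -> IO ()
runBoth first second = first >> second

-- Part 2: a toy request/response stream model (hypothetical, simplified types).

data Request  = PutLine String | GetLine
data Response = Done | Line String

-- A program maps the (lazy) stream of responses to a stream of requests.
type Dialogue = [Response] -> [Request]

-- Emits two requests *before* inspecting any response. Matching on the
-- responses first would demand an answer to a question not yet asked,
-- which is the deadlock mentioned in the interview.
greeter :: Dialogue
greeter responses =
  PutLine "Name?" : GetLine :
    case responses of
      (_ : Line name : _) -> [PutLine ("Hello, " ++ name)]
      _                   -> []

-- A pure stand-in for the outer interpreter: it answers each request and
-- feeds the responses back into the program through a lazy, circular binding.
simulate :: Dialogue -> [String] -> [String]
simulate program input = outputs
  where
    requests             = program responses
    (responses, outputs) = go requests input

    go :: [Request] -> [String] -> ([Response], [String])
    go [] _ = ([], [])
    go (PutLine s : rs) ins =
      let (resps, outs) = go rs ins
      in  (Done : resps, s : outs)
    go (GetLine : rs) ins =
      let (line, rest)  = case ins of
                            (l : ls) -> (l, ls)
                            []       -> ("", [])  -- end of input reads as ""
          (resps, outs) = go rs rest
      in  (Line line : resps, outs)

main :: IO ()
main = do
  ignoreFirst (putStrLn "hello") (putStrLn "goodbye")  -- prints only "goodbye"
  runBoth     (putStrLn "hello") (putStrLn "goodbye")  -- prints "hello", then "goodbye"
  mapM_ putStrLn (simulate greeter ["Ada"])            -- prints "Name?", then "Hello, Ada"
```

The circular definition of requests and responses in simulate only works because the bindings are lazy: greeter can produce its first two requests without looking at any response. That is the same just-in-time behavior, and the same fragility, that the interview describes.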