I read a blog post that discusses Haskell optimization. It says:
if Haskell sees multiple copies of this expensive function called on the exact same input, it will simply evaluate one of the function calls, cache the result, and replace every future function call of F on that same input with the cached result.
It uses this as an example of how Haskell is better than Python (for calling the same heavy function multiple times).
After reading it, I'm asking myself: how can this be true for a function that does any kind of IO?
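
To make the question concrete, here is a minimal sketch of the two cases I have in mind. The name `expensive` is something I made up to stand in for the F from the post; it is not from the blog. I'm not claiming anything about what the compiler actually does with it, only showing the situation I'm asking about.

```haskell
module Main where

-- Hypothetical stand-in for the "expensive function F" from the post.
-- It is pure: the result depends only on the argument.
expensive :: Int -> Int
expensive n = sum [1 .. n]

main :: IO ()
main = do
  -- Pure case: the same function applied to the same input twice.
  -- Reusing one result here would not change what the program means.
  print (expensive 1000000 + expensive 1000000)

  -- IO case: the same action run twice.
  -- If the second getLine were replaced by a cached result of the first,
  -- the program would read only one line instead of two.
  a <- getLine
  b <- getLine
  putStrLn (a ++ b)
```

In the first case caching looks harmless to me, but in the second case it would change the program's behaviour, which is exactly what I don't understand about the claim.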