Actually, this question has two answers:
First, as Will Bickford mentioned, you have to avoid premature optimization. You need a working application before you can identify the places that would actually benefit from performance improvements. Only after that can you try to apply caching.
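To make the "measure first, cache second" order concrete, here is a minimal sketch in Python. The names `slow_lookup` and `timed` are hypothetical and stand in for whatever expensive call your own measurements point to; `functools.lru_cache` is just one simple in-process cache, not a recommendation for any particular product.

```python
import time
from functools import lru_cache


def slow_lookup(key: int) -> int:
    """Hypothetical stand-in for an expensive call (database, remote service)."""
    time.sleep(0.01)
    return key * 2


def timed(fn, *args, repeat: int = 5) -> float:
    """Return total wall-clock seconds for `repeat` calls of fn(*args)."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return time.perf_counter() - start


# Step 1: measure the working code to confirm this call really is a hot spot.
uncached_seconds = timed(slow_lookup, 21)

# Step 2: only then add caching, and measure again to confirm it helps.
cached_lookup = lru_cache(maxsize=1024)(slow_lookup)
cached_seconds = timed(cached_lookup, 21)

print(f"uncached: {uncached_seconds:.3f}s, cached: {cached_seconds:.3f}s")
```

In a real application you would use a profiler or production metrics instead of a hand-rolled timer, but the order of the steps stays the same.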
Second is designing your application for future load. This means that before you start coding, you should try to answer this question: "How will my design change if the application must handle, say, 10,000 times the load it is planned for now?" This usually leads to a more advanced (as in thought-through) design that accommodates scaling to multiple servers, and with that comes the need to share state between the servers, hence distributed caching. If you don't answer this question, there is a good chance you'll have to rewrite your application sooner or later.
In other words, design for the future, implement for today.
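One way this can look in code, as a rough sketch: put the cache behind a small interface and start with the simplest implementation that works today. All names here (`Cache`, `LocalCache`, `UserService`, `load_user`) are made up for illustration and do not come from any particular caching library.

```python
from typing import Callable, Dict, Optional, Protocol


class Cache(Protocol):
    """The seam: application code depends only on this interface."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class LocalCache:
    """Today's implementation: a plain in-process dictionary."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class UserService:
    """Hypothetical service that never needs to know which cache it is given."""

    def __init__(self, cache: Cache, load_user: Callable[[str], str]) -> None:
        self._cache = cache
        self._load_user = load_user

    def get_user(self, user_id: str) -> str:
        cached = self._cache.get(user_id)
        if cached is not None:
            return cached
        value = self._load_user(user_id)  # the expensive path
        self._cache.put(user_id, value)
        return value
```

If the application later has to run on several servers, a distributed implementation with the same `get`/`put` methods could be passed in, and `UserService` would not have to change. That is the assumption behind this sketch; a real distributed cache also brings serialization, expiry, and consistency questions that the interface would need to account for.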
Hope this helps.
Slava Imeshev