The question is rather broad - and arguably opinion-based.
However...
You have to balance all sorts of concerns in this design. Performance and scalability at one end, effort to build and maintain, and infrastructure at the other end. In most cases, the effort and cost aspect are important.
So, my recommendation is to start with a relational model, with no caching, sharding, etc., but with a powerful database server, a clean relational schema, and lots of attention to query optimization. In my experience, this makes that application fast enough for tens of thousands of concurrent users, can store tens or hundreds of millions of rows without performance impact, and is the most cost effective both for building and maintaining. Hardware is usually much cheaper than developer time.
I also recommend setting up a performance and scalability test system, with representative data, and a load test framework (something like Apache JMeter). Use this system to verify how your system performs under load, and with large amounts of data. Set performance and scalability goals, e.g. "10K concurrent users, 1 million old messages, response time must be < 1 second".
Run regular tests on your load testing environment, optimize and tune the schema, queries etc., and keep doing that until you really have nowhere else to go.
My guess is it will take you a huge amount of traffic to reach that point.
Once you get to that point, caching is probably the next step. This is non-trivial, especially in chat applications. You have to make sure that the cache pays for itself (i.e. that the cache hit ratio is high enough for it to make a difference; typically, that means at least 10%), and that you don't spend so much time managing the cache (invalidating when you post a new message, for instance) that it does more harm than good.
So, prove you have a real, measurable need, and then optimize.