
Friends, I need to optimize how chat messages are stored in a MySQL database (the database type is not important). I have working chat code that stores messages in database tables on the principle of one message per row. If there are 1,000–10,000 users, and possibly more, imagine how many rows the table will accumulate. One option is to collect all messages in a cache (a file, APC, or in principle even Redis), and once every 24 hours have a cron job write them into the main database as a single JSON string. That way one day becomes one row.

This approach needs to be optimized. Is my thinking correct? Can this option be improved, or is there a better one? Thank you.
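The buffering idea described above could be sketched like this. This is a minimal sketch, using SQLite as a stand-in for MySQL and a plain Python list standing in for the file/APC/Redis cache; the table and column names are made up for illustration:

```python
import json
import sqlite3  # stand-in for MySQL; the schema is illustrative

class MessageBuffer:
    """Collect chat messages in memory and flush them once a day as one JSON row."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # stands in for the file/APC/Redis cache
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chat_archive ("
            " day TEXT PRIMARY KEY, messages TEXT)"
        )

    def add(self, user, text):
        self.pending.append({"user": user, "text": text})

    def flush(self, day):
        # One row per day, holding the whole day's messages as a JSON string.
        self.conn.execute(
            "INSERT INTO chat_archive (day, messages) VALUES (?, ?)",
            (day, json.dumps(self.pending)),
        )
        self.conn.commit()
        self.pending = []

conn = sqlite3.connect(":memory:")
buf = MessageBuffer(conn)
buf.add("alice", "hi")
buf.add("bob", "hello")
buf.flush("2016-07-24")
row = conn.execute("SELECT messages FROM chat_archive").fetchone()
print(len(json.loads(row[0])))  # → 2
```

Note the obvious trade-off this makes visible: a crash before `flush` loses up to a day of messages, and reading any single message back requires parsing the whole day's JSON blob.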

Vanya Avchyan
  • @C0dekid This is a very serious problem, because the database may break because of the wrong approach. – Vanya Avchyan Jul 24 '16 at 13:42
  • "The table can be filled very fast" — A MySQL table can hold one billion seventy-three million seven hundred forty-one thousand eight hundred twenty-four rows. How many messages are you getting per day that will fill that up quickly? – Quentin Jul 24 '16 at 13:45
  • @Quentin How long does it take each time to pull out those messages? – Vanya Avchyan Jul 24 '16 at 13:48
  • What does that have to do with filling up the table? Do you have any particular reason to be worried about performance problems? – Quentin Jul 24 '16 at 13:51
  • You seem to be trying to fix a performance problem before there is any sign that you have one or might have one. Don't waste your time on that. http://c2.com/cgi/wiki?PrematureOptimization – Quentin Jul 24 '16 at 13:53
  • If you index your table, then getting the messages should be fast. However, if you want fast update and retrieval times you should explore NoSQL solutions like [Cassandra](http://cassandra.apache.org/) or [MongoDB](https://www.mongodb.com/). You should also show some benchmark code, as performance is in the eye of the beholder. – s952163 Jul 24 '16 at 13:55
  • If you need to split tables then you should look into [Sharding](http://stackoverflow.com/questions/5541421/mysql-sharding-approaches/). – s952163 Jul 24 '16 at 13:57
  • @Quentin Each day the number of users exceeds 1000. In my opinion, proper optimization does not hurt. – Vanya Avchyan Jul 24 '16 at 13:58
  • @Quentin My favorite [Performance Rant](https://ericlippert.com/2012/12/17/performance-rant/). That said, sometimes you know ahead of time that you will have a bottleneck somewhere. It makes sense to plan for it at the design stage. – s952163 Jul 24 '16 at 13:59
  • Here, in principle, the type of database does not play a big role. – Vanya Avchyan Jul 24 '16 at 14:01
  • @VanyaAvchyan — If each of those 1000 users sends 100 messages a day (which is a lot for most use cases) then it will take **30 years** to fill the table. Databases are designed to be able to handle huge amounts of data. Don't rush in to protect them from it. – Quentin Jul 24 '16 at 14:01
  • @Quentin 1000 users can send more than a million messages a day. Where do you get these numbers? – Vanya Avchyan Jul 24 '16 at 14:04
  • @VanyaAvchyan — My number came from thinking about how many messages I tend to send on a messaging system per day. Where did your number come from? What sort of user can write 11.5 messages per second? Wait, you've edited your comment to work on a different scale, now my calculation is wrong. – Quentin Jul 24 '16 at 14:06
  • @Quentin You have not understood. There may be 1000 users now, and then more and more. – Vanya Avchyan Jul 24 '16 at 14:09
  • Your users are generating an average of a message every minute for 16 hours of the day. That still seems like a lot. How did you come by these numbers? You're proposing some pretty extreme measures for archiving data (which will make archived data *very* slow to access) to deal with this problem. Have you actually seen signs that you have this problem yet? – Quentin Jul 24 '16 at 14:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/118137/discussion-between-vanya-avchyan-and-quentin). – Vanya Avchyan Jul 24 '16 at 14:13
  • @Quentin What do you think about suggesting your option to the Stack Overflow developers? They will probably be delighted :) – Vanya Avchyan Jul 24 '16 at 14:20
  • @Quentin -- there is no limit on number of rows, certainly not 1 billion. – Rick James Jul 24 '16 at 21:53
  • If you are going to insert more than 100 rows per _second_, then you _may_ have a performance issue. If you need to store a lot more data than the size of RAM, then we _may_ need to discuss the queries. – Rick James Jul 24 '16 at 21:55

1 Answer


The question is rather broad - and arguably opinion-based.

However...

You have to balance all sorts of concerns in this design: performance and scalability at one end; effort to build, maintain, and run the infrastructure at the other. In most cases, the effort and cost aspects are the most important.

So, my recommendation is to start with a relational model, with no caching, sharding, etc., but with a powerful database server, a clean relational schema, and lots of attention to query optimization. In my experience, this makes the application fast enough for tens of thousands of concurrent users, can store tens or hundreds of millions of rows without performance impact, and is the most cost-effective option both for building and for maintaining. Hardware is usually much cheaper than developer time.
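For instance, a clean relational schema for a chat really only needs one composite index to keep the hot "recent messages in a room" query fast. A sketch, using SQLite as a stand-in for MySQL; the table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY,
        room_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        sent_at TEXT    NOT NULL,
        body    TEXT    NOT NULL
    )
""")
# The index that matters for a chat: fetch the latest messages per room.
conn.execute("CREATE INDEX idx_room_time ON messages (room_id, sent_at)")

conn.executemany(
    "INSERT INTO messages (room_id, user_id, sent_at, body) VALUES (?, ?, ?, ?)",
    [(1, 10, "2016-07-24 13:42", "hello"),
     (1, 11, "2016-07-24 13:43", "hi"),
     (2, 12, "2016-07-24 13:44", "other room")],
)

# Typical hot query: the last N messages in one room, served by the index.
rows = conn.execute(
    "SELECT body FROM messages WHERE room_id = ? "
    "ORDER BY sent_at DESC LIMIT 50", (1,)
).fetchall()
print([r[0] for r in rows])  # → ['hi', 'hello']
```

With an index like this, fetching the last 50 messages of a room costs roughly the same whether the table holds a thousand rows or a hundred million, which is the whole point of starting simple.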

I also recommend setting up a performance and scalability test system, with representative data, and a load test framework (something like Apache JMeter). Use this system to verify how your system performs under load, and with large amounts of data. Set performance and scalability goals, e.g. "10K concurrent users, 1 million old messages, response time must be < 1 second".

Run regular tests on your load testing environment, optimize and tune the schema, queries etc., and keep doing that until you really have nowhere else to go.
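Part of that tuning loop is checking the query plan after every schema or query change, to confirm the hot query is still served by an index and not a full table scan. With SQLite's `EXPLAIN QUERY PLAN` standing in for MySQL's `EXPLAIN` (index and table names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY, room_id INTEGER, sent_at TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_room_time ON messages (room_id, sent_at)")

# Ask the planner how it would execute the hot query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT body FROM messages "
    "WHERE room_id = 1 ORDER BY sent_at DESC LIMIT 50"
).fetchall()
print(plan)  # the plan should mention idx_room_time, not a table scan
```

Wiring a check like this into the load-test run catches the classic regression where a schema change silently turns an index lookup into a scan.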

My guess is it will take you a huge amount of traffic to reach that point.

Once you get to that point, caching is probably the next step. This is non-trivial, especially in chat applications. You have to make sure that the cache pays for itself (i.e. that the cache hit ratio is high enough for it to make a difference; typically, that means at least 10%), and that you don't spend so much time managing the cache (invalidating when you post a new message, for instance) that it does more harm than good.
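As a sketch of what that cache layer has to get right, here is the cache-aside pattern with invalidation on write. A plain dict stands in for Redis/APC, and all names are made up:

```python
class ChatStore:
    """Recent-messages cache in front of a message store (cache-aside)."""

    def __init__(self):
        self.messages = {}   # room_id -> list of bodies (stands in for the database)
        self.cache = {}      # room_id -> cached recent-messages result
        self.hits = self.misses = 0

    def post(self, room, body):
        self.messages.setdefault(room, []).append(body)
        # Invalidate: the cached view of this room is now stale.
        self.cache.pop(room, None)

    def recent(self, room, limit=50):
        if room in self.cache:
            self.hits += 1
            return self.cache[room]
        self.misses += 1
        result = self.messages.get(room, [])[-limit:]
        self.cache[room] = result
        return result

store = ChatStore()
store.post(1, "hello")
print(store.recent(1))           # miss: goes to the "database"
print(store.recent(1))           # hit: served from the cache
store.post(1, "hi")              # invalidates room 1
print(store.recent(1))           # miss again, now includes the new message
print(store.hits, store.misses)  # → 1 2
```

Even in this toy version you can see the cost: every write pays an invalidation, so in a busy room the hit ratio drops and the cache can stop paying for itself — exactly the measurement the paragraph above asks for.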

So, prove you have a real, measurable need, and then optimize.

Neville Kuyt