
I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.

Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
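A minimal sketch of what that query might look like (table and column names here are assumptions, since the actual schema isn't shown):

-- Hypothetical names; the real table/columns aren't given in the question.
SELECT *
FROM user_history
WHERE user_id = 123
  AND date = (SELECT MAX(date) FROM user_history WHERE user_id = 123);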

Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data so that the query doesn't have to search for the latest data from a specific user between millions of rows but instead just has a table with only the latest data from every user.

  • The con here would be that I have to run 2 queries every 5 minutes to store the latest data, i.e. insert the new data into the history table and update the row in the latest-data table (a rough sketch of this follows the list).

  • The pro would be that MySQL has a lot less data to go through.
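
A rough sketch of the two statements that would run every 5 minutes under that design (again, all table and column names are assumptions):

-- Hypothetical tables: user_history keeps the full history,
-- user_latest holds exactly one row per user.
INSERT INTO user_history (user_id, data, date)
VALUES (123, 'payload', NOW());

UPDATE user_latest
SET data = 'payload', date = NOW()
WHERE user_id = 123;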

What are common ways to handle this kind of issue?

  • 3
    It would probably help if you showed your current database structure, including indexes; and the query with the "simple MAX(date)". – Uueerdo May 31 '17 at 15:48
  • 2
    typically this problem is solved with indexes -- might want to try that before such crazy architectural choices – Hogan May 31 '17 at 16:29

1 Answer


There are a number of ways to handle slow queries in large tables. The three most basic ways are:


1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.

For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.

CREATE INDEX idx_user ON activity (user_id);
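
If the query is the MAX() lookup shown above, a composite index that also includes the date column (column names assumed from that example) lets MySQL resolve the MAX() from the index alone, without reading the table rows:

-- Hypothetical composite index; with (user_id, active_date) the
-- MAX(active_date) for a single user is answered straight from the index.
CREATE INDEX idx_user_date ON activity (user_id, active_date);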

2: Use summary/"cache" tables. This is what you have suggested. In your case, you could add an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means your code won't need to execute two queries. For example:

CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
  UPDATE activity_summary
  SET last_active_date = new.active_date
  WHERE user_id = new.user_id;

You can change that to check whether a row already exists for the user and insert one if this is their first activity, or you can insert a row into the summary table when the user registers... or whatever fits your application.
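
One way to cover both the first and subsequent activity in a single statement is to make the trigger body an upsert. A minimal sketch, assuming activity_summary has a primary or unique key on user_id (names follow the trigger example above):

-- Insert the user's first summary row, or update it if one already exists.
INSERT INTO activity_summary (user_id, last_active_date)
VALUES (new.user_id, new.active_date)
ON DUPLICATE KEY UPDATE last_active_date = new.active_date;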


3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
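
For example (the user_id value is just a placeholder):

EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 123;

In the output, a type of ALL means a full table scan, and the key column shows which index (if any) the optimizer chose.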

  • 1
    Thanks a lot. that really helps. Now imported the database in workbench and going to try some things out! – Merijndk Jun 01 '17 at 13:13
  • 1
    Thanks Hogan. @Merijndk In your particular case, I would suggest using a summary table. I have no doubt that adding an index to your table will get your queries returned instantly, but at an enterprise scale, DBAs will appreciate you keeping your queries away from large and active tables that could potentially get locked by your query. If you're logging an individual user every 5 minutes, and you have 10,000 users, that's a lot of actions (3-4 inserts per second) on the main table already. – RToyo Jun 01 '17 at 21:15
  • 2
    @RobbieToyota adding good indexed and using forgein keys indeed helped me a lot (from 8.9 seconds queries to 0.6) – Merijndk Jun 01 '17 at 21:49
  • 3
    More on [_summary tables_](https://mariadb.com/kb/en/mariadb/data-warehousing-summary-tables/) – Rick James Jun 02 '17 at 03:16