
I have a recent-activities table in MySQL; it uses the MyISAM engine and contains around 46 million records. It is properly indexed, but some queries can still take a few seconds to run.

So I was wondering: what would be the best way to improve query performance on this table? Should I partition it or delete older data? I mainly run calculations on the current month's data; the other usage is showing users their recent activities, and we also have to manually monitor their activities from time to time.

I have been thinking of deleting all data older than this year, which would remove around 24 million rows (it has been 4 months since the start of the year, so data older than that should not matter as much).

Or I could partition by month, but I don't know how that would work in Laravel. Would I have to switch partitions every time I need data older than the current month?

Edit: This one query is important: it runs after a specific user activity, and on one occasion it took 8 s and went through 40,000 rows using the user_id index. What would help improve it is either using multiple indexes, which would narrow the query down to very few rows, or removing old data, so there are fewer rows to go through to finish the calculation.

select  *
    from  `recent_activites`
    where  `id` > 20443580
      and  `user_id` = 20580
      and  `updated_at` > '2020-04-01 00:00:00'
      and  `type` in (?, ?, ?, ?, ?, ?, ?, ?, ?)
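One way to get that "multiple indexes" effect is a single composite index, as suggested in the comments below. A minimal sketch; the index name is illustrative, not something from the actual schema:

    -- Hypothetical composite index: the equality column (user_id) first,
    -- then the range column (updated_at). MySQL can use the leftmost
    -- equality columns of an index plus one trailing range column.
    ALTER TABLE `recent_activites`
        ADD INDEX `idx_user_updated` (`user_id`, `updated_at`);

A date/datetime column works fine with `>` in this position: index entries are ordered, so the range condition becomes a short index scan instead of a crawl through all of the user's rows.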

Edit: This query took 8.72 ms on one run, but it always seems to take more than 1 ms. The updated_at time condition is there because it decreases the number of rows scanned.

[screenshot of the query and its EXPLAIN output]

These are the indexes: [screenshot of the table's indexes]

Sahaab Zahid
  • You could also think about using a cache with Laravel; think of Redis. – jogarcia Apr 12 '20 at 15:48
  • Yes, I have a lot of caching too, but this data has to be updated rather often. The cache is working really well though; it has brought our CPU usage down from 50% to 20%, but I am still refreshing it every 3 hours for users. The first data load is still slow, though. – Sahaab Zahid Apr 12 '20 at 16:07
  • 1
    Both methods you expose here would work, you could also do replication, but instead of that you should think if it is a good idea deleting data if it's no you have your answer. You could also rethink your database design, eviting things like joins. It depends of what you are trying to achieve here but you could even use more than one data base. Maybe you should be more specific – jogarcia Apr 12 '20 at 16:55
  • 1
    I've no direct experience with partitioning but a MySQL guru colleague of mine says partitioning is evil. I would go instead for replication or caching solutions. – lainatnavi Apr 12 '20 at 17:17
  • I have updated my OP to include a query example. I get around 300 queries like these daily that take longer than 4 seconds (I am mainly monitoring queries taking longer than 4 seconds for now); before caching this was above 5,000. But sometimes the cache is missed, or sometimes it refreshes, hence this occurs. And I do not really think deleting data is a good idea; it gives us less information about the user, which makes it harder to track down unfair usage, bots, etc. – Sahaab Zahid Apr 12 '20 at 17:21
  • 1
    Before you do general setup changes (except for moving away from MyISAM) or deleting data that you may still need, I would question your premise *"It has been indexed properly and all that is done."*. E.g. from your description and query performance, it sounds like you only have single column indexes, while your given query demands a composite index, `(user_id, updated_at)` should probably suffice. If you want help in that direction, you should always give the query, the table structure(s) (`show create table ... `) and the execution plan (write `explain ` infront of your query) and time. – Solarflare Apr 12 '20 at 19:26
  • Composite indexing could help; sorry, I did not think of that. Could you help me out a bit more in that direction? How does it work, and what's the best way to do it here? Edit: also, can updated_at be indexed here? I'm mainly using a '>' and a date; will a date index work here? – Sahaab Zahid Apr 12 '20 at 21:26
  • Of course we can help you out with that. You should post this as a new question though (as your current question asks what else to do). As mentioned, indexing questions should include your table structure (you can and should obfuscate or simplify it if it contains e.g. private/revealing column names, or maybe 30 columns that are not relevant for your query, or similar), the execution plan, the query, actual & expected time, and maybe the number of total and expected rows; something like [this question](https://stackoverflow.com/q/50974851). It's usually best to have one query per question. – Solarflare Apr 13 '20 at 00:05
  • Okay, thank you very much, I will do just that – Sahaab Zahid Apr 13 '20 at 00:21
  • Please elaborate on what you mean by "cache". What is the 3-hour task? Are you replacing all data? Or doing small changes? – Rick James Apr 20 '20 at 05:32
  • By cache, I meant I use Redis to store the query outputs and then read those outputs from Redis when necessary. That cache was only kept for 3 hours and refreshed after that. That was the old setup; now I refresh the cache every time the data is updated, instead of every 3 hours. – Sahaab Zahid Sep 27 '20 at 07:11
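For reference, a minimal sketch of the diagnostics requested in the comments above; the sample values merely stand in for the query's ? placeholders:

    -- Full table definition, including all existing indexes.
    SHOW CREATE TABLE `recent_activites`;

    -- The optimizer's plan for the slow query: which index it picks
    -- and how many rows it estimates it must examine.
    EXPLAIN
    SELECT *
      FROM `recent_activites`
     WHERE `id` > 20443580
       AND `user_id` = 20580
       AND `updated_at` > '2020-04-01 00:00:00'
       AND `type` IN (1, 2, 3);  -- sample values in place of the ? placeholders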

2 Answers

  • When removing that much of a table, it is faster to copy over the rows you want to keep.
  • You should move from MyISAM to InnoDB.
  • PARTITIONing would help with future purging of old data.
  • Adding PARTITIONing will cost time, and so will any technique for deleting old data. Which is more important: minimizing blockage, or getting through the task quickly?
  • PARTITIONing, per se, won't speed up queries. However, you seem to need a "2-D" index.
  • You must re-think the indexes when going to partitioning.
  • A Summary table might improve performance significantly.

Several of the above steps can be done simultaneously -- switch to InnoDB, add partitioning, remove old data, change indexes. (Summary tables are a separate task.) I recommend experimenting before tackling your live system.
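A sketch of what that combined migration could look like. The column list, types, and partition boundaries are assumptions (the full table definition isn't shown in the question), so treat this as a template rather than a ready-to-run script:

    -- Assumed columns: id, user_id, type, updated_at.
    -- 1. New InnoDB table, partitioned by month, with a composite index.
    --    updated_at must be part of the primary key to partition on it.
    CREATE TABLE recent_activites_new (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        user_id    INT UNSIGNED    NOT NULL,
        type       TINYINT         NOT NULL,
        updated_at DATETIME        NOT NULL,
        PRIMARY KEY (id, updated_at),
        KEY idx_user_updated (user_id, updated_at)
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(updated_at)) (
        PARTITION p202001 VALUES LESS THAN (TO_DAYS('2020-02-01')),
        PARTITION p202002 VALUES LESS THAN (TO_DAYS('2020-03-01')),
        PARTITION p202003 VALUES LESS THAN (TO_DAYS('2020-04-01')),
        PARTITION p202004 VALUES LESS THAN (TO_DAYS('2020-05-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    );

    -- 2. Copy only the rows you want to keep
    --    (faster than deleting ~24M rows in place).
    INSERT INTO recent_activites_new
        SELECT id, user_id, type, updated_at
          FROM recent_activites
         WHERE updated_at >= '2020-01-01';

    -- 3. Swap the tables atomically; keep the old one until verified.
    RENAME TABLE recent_activites TO recent_activites_old,
                 recent_activites_new TO recent_activites;

Note that queries need no changes to benefit from partitioning (so nothing changes on the Laravel side); the optimizer prunes to the relevant partitions automatically based on the updated_at range.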


Rick James
  • I switched to InnoDB, but now I'm getting more slow queries on this table. Any ideas on how I could increase the speed? – Sahaab Zahid Sep 25 '20 at 09:22
  • I also halved the number of rows; they were at 88 million, and I brought them down to 40 million. Lastly, I added 2-D and 3-D indexes. I didn't partition the table yet. I don't think summary tables would be that helpful, because I am mainly trying to get the last 10 rows for a specific user_id. – Sahaab Zahid Sep 25 '20 at 09:30
  • 1
    @SahaabZahid - "last 10 rows of the specific user_id" should be quite fast. If not, let's see the query. – Rick James Sep 25 '20 at 21:27
  • This took 8.72 ms: select * from `recent_activites` where `user_id` = 326 and `updated_at` > '2020-08-26 16:59:25' order by `id` desc limit 10. I set an updated_at time because, due to the index, it decreased the overall rows looked through; it changes dynamically to 1 month ago. I am going to add images of the explain, the query, and the indexes to the main post. – Sahaab Zahid Sep 26 '20 at 17:08
  • @SahaabZahid - (re Your Comment above) There are enough differences that you should start a fresh Question. Please include `SHOW CREATE TABLE`, `EXPLAIN SELECT`, etc. Leave out the "1 month ago" unless it really impacts the Question. Text, not images, please. – Rick James Sep 27 '20 at 16:41
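As an aside on that last-10-rows query: an index starting with user_id and followed by id would let MySQL walk the index backwards and stop after 10 entries, which should make the dynamic updated_at cutoff unnecessary as a row limiter. A minimal sketch; the index name is illustrative:

    -- Hypothetical index: the equality column first, then the ORDER BY column.
    ALTER TABLE `recent_activites`
        ADD INDEX `idx_user_id` (`user_id`, `id`);

    -- With that index, this should read only about 10 index entries:
    SELECT *
      FROM `recent_activites`
     WHERE `user_id` = 326
     ORDER BY `id` DESC
     LIMIT 10;

(With InnoDB, a secondary index on user_id alone already has the primary key id appended implicitly, so the explicit second column mainly documents the intent.)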

You can move some data to an archive database - that is, store old data in a separate (archive) database and current data in your main database. That way you keep good performance for current (fresh) data and can still access the old data at the same time.

To handle this you need to set up two connections in your Laravel system - create a new connection in config/database.php.

And now you should be able to access your archive data like this:

    $archive_data = DB::connection('name_of_archive_connection')->table('table_name')->select('*')...
    $actual_data  = DB::connection('name_of_main_connection')->table('table_name')->select('*')...

You can also easily move data between the databases with plain MySQL INSERT/DELETE statements, along the lines of the sketch below.
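A minimal sketch of that move, assuming both databases live on the same MySQL server (cross-database INSERT ... SELECT only works within one server) and that the archive table already exists; the names main_db and archive_db are placeholders:

    -- Copy old rows into the archive (assumes archive_db.recent_activites
    -- already exists with the same structure as the main table).
    INSERT INTO archive_db.recent_activites
        SELECT *
          FROM main_db.recent_activites
         WHERE updated_at < '2020-01-01';

    -- Then remove them from the main table, in batches on a live system.
    DELETE FROM main_db.recent_activites
     WHERE updated_at < '2020-01-01'
     LIMIT 10000;  -- repeat until no rows remain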

Alex