0

I have a big table of messages with date and room columns. and 2 billion rows.

now i want keep only last 50 messages for every room and delete previous messages.

can i do it with a fast query ?

this question is unique , i didn't found any other question for delete rows over a grouped and ordered selection

  • Most likely the fasted way is making a new table and copy those records in, delete the old table and rename.. – Raymond Nijland Oct 18 '19 at 11:21
  • Anyhow questions about performance should atleast include a `SHOW CREATE TABLE table_name` so we know about what datatypes and possible indexes you are talking about.. – Raymond Nijland Oct 18 '19 at 11:22
  • Possible duplicate of [SQL query: Delete all records from the table except latest N?](https://stackoverflow.com/questions/578867/sql-query-delete-all-records-from-the-table-except-latest-n) – Tiago Martins Peres Oct 18 '19 at 11:24
  • It's very often quicker to contruct a new table, retaining only those records you wish to keep. Then DROP the old table. Then rename and re-index the new table. – Strawberry Oct 18 '19 at 12:21

2 Answers2

0

You cannot do it in a fast query. You have a lot of data.

I would suggest creating a new table. You can then replace the data in your first table, if necessary.

Possibly the most efficient method to get the 50 rows -- assuming that date is unique for each room:

select t.*
from t
where t.date >= coalesce((select t2.date
                          from t t2
                          where t2.room = t.room
                          order by t2.date desc
                          limit 1
                         ), t.date
                        );

For this to have any hope of performance you want an index on (room, date).

You can also try row_number() in MySQL 8+:

select . . .   -- list the columns
from (select t.*, row_number() over (partition by room order by date desc) as seqnum
      from t
     ) t
where seqnum <= 50;

Then you can replace the data by doing:

create table temp_t as 
    select . . .   -- one of the select queries here;

truncate table t;  -- this gets rid of all the data, so be careful

insert into t
   select *
   from temp_t;

Massive inserts are much more efficient than massive updates, because the old data does not need to be logged (nor the pages locked and other things).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

You can use Rank() function to get top 50 results for each group ordered by date desc, so the last entries will be in top.
http://www.mysqltutorial.org/mysql-window-functions/mysql-rank-function/
Then you left join that subquery on your table on id ( or room and date, if those are unique and you don’t have id in your table) The last step would be to filter all such result that have null in subquery and delete those.
The full code will look something like this:

DELETE T FROM YOURTABLE T
LEFT JOIN (
    SELECT  *,
        RANK() OVER (PARTITION BY
                     ROOM
                 ORDER BY
                     [DATE] DESC
                ) DATE_RANK
                ) AS T2
ON T.[DATE] = T2.[DATE]
AND T.ROOM = T2.ROOM
AND T2.DATE_RANK<=50
WHERE T2.DATE IS NULL