1

So a table which has 2.5 TB volume and this is solely because of some lame unused data accumulated over a period of time, I want to clean up this data as we are planning to migrate the hosts and would not like to take this burden on new hosts, the caveat is that whether the data is useful or not is identified using an association with other table something like this:

select id from lame_table l join user u on  u.id = l.user_id where l.is_active=false and u.company='xyz';

where lame_table is the table which has this redundant data. Even though we are on a huge instance the above query is something that executed like forever unless accompanied with a limit.

so my delete query is something like this:

delete from lame_table where id in (select l.id from lame_table l join user u on  u.id = l.user_id where l.is_active=false and u.company='xyz' limit 100000);

This unfortunately is a very slow query specially if we look at 2.4 TB worth of data in which close to 90% is lame, is there any other suggestion or out of the box solution for these kind of problems.

-Vaibhav

vaibhav
  • 3,929
  • 8
  • 45
  • 81
  • are those table have index? if so maybe try to remove those which not using in delete process and create it back after the delete process. also trigger and etc. – T. Peter Feb 18 '21 at 03:03
  • also [this](https://stackoverflow.com/questions/8290900/best-way-to-delete-millions-of-rows-by-id/8290958#8290958) might helps. – T. Peter Feb 18 '21 at 03:06
  • What is `m` in your query? – Laurenz Albe Feb 18 '21 at 03:24
  • thats u actually since i wanted to emulate the query and not put the actual query here the alias missed – vaibhav Feb 18 '21 at 03:26
  • A couple things come to mind: is there an index on `lame_table.is_active`? Also is this deletion a one-off or a recurring problem? – Marcello Romani Feb 18 '21 at 04:03
  • there is no index on is_active, type of this column is boolean, will adding a index on this column help, yes this is a one -off problem we have rectified the source of this lame data but now want to clean the accumulated mess... – vaibhav Feb 18 '21 at 04:21
  • 1
    How about just creating a new table with the data that is not lame? – jjanes Feb 18 '21 at 15:40
  • Please show the EXPLAIN for the unlimited query, and the EXPLAIN (ANALYZE, BUFFERS) for the limited query. – jjanes Feb 18 '21 at 15:41
  • 1) USE EXISTS() instaed of IN() 2) You dont need lame_table in the subquery, and you can hoist the condition to the main query. – wildplasser Feb 18 '21 at 15:49

0 Answers0