I have a table of about 2.5 TB, almost entirely made up of unused data that has accumulated over time. We are planning to migrate to new hosts, and I would like to clean this data up first rather than carry the burden over. The caveat is that whether a row is useful or not is determined by an association with another table, something like this:
select l.id from lame_table l join user u on u.id = l.user_id where l.is_active = false and u.company = 'xyz';
where lame_table is the table holding the redundant data. Even though we are on a huge instance, the query above runs practically forever unless it is accompanied by a limit.
So my delete query looks like this:
delete from lame_table where id in (select l.id from lame_table l join user u on u.id = l.user_id where l.is_active=false and u.company='xyz' limit 100000);
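To keep each transaction short, I repeat that delete until it removes no more rows. A sketch of the loop I am effectively running (assuming PostgreSQL 11 or later, where COMMIT is allowed inside a DO block executed outside an explicit transaction; "user" is quoted here because it is a reserved word):

```sql
DO $$
DECLARE
  rows_deleted bigint;
BEGIN
  LOOP
    -- delete one batch of lame rows
    DELETE FROM lame_table
    WHERE id IN (
      SELECT l.id
      FROM lame_table l
      JOIN "user" u ON u.id = l.user_id
      WHERE l.is_active = false
        AND u.company = 'xyz'
      LIMIT 100000
    );
    GET DIAGNOSTICS rows_deleted = ROW_COUNT;
    EXIT WHEN rows_deleted = 0;
    COMMIT;  -- release locks and let vacuum reclaim space between batches
  END LOOP;
END $$;
```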
Unfortunately this is also a very slow query, especially against 2.5 TB worth of data in which close to 90% is lame. Is there any other suggestion or out-of-the-box solution for this kind of problem?
-Vaibhav