
I need help deleting a large number of specific rows by ID (around 10k IDs) from a big table partitioned by year.

I ran the following query to analyze the execution plan, with only 2 IDs:

explain analyze delete from scheme.users_daily
where user_id in (  -- user_id is not PK, can be duplicated
5791001,
7779001 
)

In reality there are around 10k IDs that I want to delete; I guess it's around 1-3% of all the data in the table.

I received the following result:

"Delete on users_daily  (cost=0.00..8573700.00 rows=3627 width=6) (actual time=10271.578..10271.578 rows=0 loops=1)"
"  ->  Seq Scan on users_daily  (cost=0.00..0.00 rows=1 width=6) (actual time=0.001..0.001 rows=0 loops=1)"
"        Filter: (user_id= ANY ('{5791001,7779001}'::bigint[]))"
"  ->  Index Scan using users_daily_y2013_unq on users_daily_y2013  (cost=0.57..2351216.55 rows=1556 width=6) (actual time=2792.469..2792.469 rows=0 loops=1)"
"        Index Cond: (user_id= ANY ('{5791001,7779001}'::bigint[]))"
"  ->  Index Scan using users_daily_y2014_unq on users_daily_y2014  (cost=0.56..1745517.74 rows=694 width=6) (actual time=2158.849..2158.849 rows=0 loops=1)"
"        Index Cond: (user_id = ANY ('{5791001,7779001}'::bigint[]))"
"  ->  Index Scan using users_daily_y2015hy1_unq on users_daily_y2015hy1  (cost=0.56..1039034.28 rows=349 width=6) (actual time=1224.623..1224.623 rows=0 loops=1)"
"        Index Cond: (user_id= ANY ('{5791001,7779001}'::bigint[]))"
"  ->  Index Scan using users_daily_y2015hy2_unq on users_daily_y2015hy2  (cost=0.56..1159715.43 rows=375 width=6) (actual time=1380.513..1380.513 rows=0 loops=1)"
"        Index Cond: (user_id= ANY ('{5791001,7779001}'::bigint[]))"
"  ->  Index Scan using users_daily_y2016_unq on users_daily_y2016  (cost=0.57..2278216.00 rows=652 width=6) (actual time=2715.106..2715.106 rows=0 loops=1)"
"        Index Cond: (user_id= ANY ('{5791001,7779001}'::bigint[]))"
"Planning time: 0.364 ms"

Then I checked how many rows are in the table and its partitions:

explain analyze select count(id)
from scheme.users_daily

Response:

"Aggregate  (cost=9735664.99..9735665.00 rows=1 width=8) (actual time=1005691.857..1005691.858 rows=1 loops=1)"
"  ->  Append  (cost=0.00..9032279.19 rows=281354320 width=8) (actual time=1.967..797587.379 rows=268084881 loops=1)"
"        ->  Seq Scan on users_daily  (cost=0.00..0.00 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=1)"
"        ->  Seq Scan on users_daily_y2013  (cost=0.00..2489084.32 rows=77247432 width=8) (actual time=1.957..134821.451 rows=71962946 loops=1)"
"        ->  Seq Scan on users_daily_y2014  (cost=0.00..1848731.32 rows=57374432 width=8) (actual time=1.256..72915.919 rows=54835860 loops=1)"
"        ->  Seq Scan on users_daily_y2015hy1  (cost=0.00..1094813.48 rows=34156048 width=8) (actual time=0.735..40167.938 rows=32796038 loops=1)"
"        ->  Seq Scan on users_daily_y2015hy2  (cost=0.00..1219911.76 rows=38124076 width=8) (actual time=0.801..43994.913 rows=36669043 loops=1)"
"        ->  Seq Scan on users_daily_y2016  (cost=0.00..2379738.31 rows=74452331 width=8) (actual time=0.009..109012.004 rows=71820994 loops=1)"
"Planning time: 0.688 ms"
"Execution time: 1005691.977 ms"

How can I solve this performance issue when deleting a lot of rows from this big table?

Viktor M.
  • So where exactly is the performance problem? Are the 10 seconds to delete the rows too much? How fast do you need it to be? – a_horse_with_no_name Nov 23 '16 at 09:49
  • Could you add the index definition to your question? – Nick Barnes Nov 23 '16 at 10:31
  • @a_horse_with_no_name: 2 IDs = ~3600 rows = 10 seconds, and I need to delete about 10,000 IDs, so it might take about 833 hours to delete all of them. So I'm interested in any solution that executes faster, if that's possible of course. – Viktor M. Nov 23 '16 at 10:40
  • @Nick Barnes: no indexes are used. Adding any indexes would take too much time in this situation, if I understand your question correctly. – Viktor M. Nov 23 '16 at 10:41
  • Have you looked at http://stackoverflow.com/questions/8290900/best-way-to-delete-millions-of-rows-by-id ? Try the `DELETE ... FROM ... USING` form – Asoub Nov 23 '16 at 10:47
  • @user1376885: if you need to delete 10K ids, don't use the `IN (id1, id2, id3, ..., id10000)` construct. Instead: put them into a (temp) table and index this table on id, and do a `delete ... where exists(...)` (see the sketch after these comments). – wildplasser Nov 23 '16 at 12:48
  • It won't take 833 hours to delete 10000 ids. Those 10 seconds include also the various index scans of all the partitions. The delete operation took ~0ms. – pietrop Nov 25 '16 at 06:38
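
A minimal sketch of the approach suggested in the comments by Asoub and wildplasser, assuming the table and column names from the question; the staging table name `ids_to_delete` is illustrative, not from the original post:

-- Stage the ~10k IDs once instead of repeating a huge IN list.
create temp table ids_to_delete (user_id bigint primary key);

-- Load the IDs (e.g. via \copy or batched inserts); the two IDs from the
-- question serve as placeholders here.
insert into ids_to_delete (user_id) values (5791001), (7779001);

-- Give the planner accurate statistics on the small table.
analyze ids_to_delete;

-- Join-style delete: each partition is joined once against the small
-- indexed ID table instead of being probed with a 10k-element IN list.
delete from scheme.users_daily u
using ids_to_delete d
where u.user_id = d.user_id;

-- Equivalent EXISTS form, per wildplasser's comment:
-- delete from scheme.users_daily u
-- where exists (select 1 from ids_to_delete d where d.user_id = u.user_id);

On inheritance-style partitioning the delete will still visit every partition, but a single join per partition should be far cheaper than thousands of separate ID probes. Checking the plan with `explain` (without `analyze`, which would actually run the delete) is a cheap sanity check before executing it for real.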

0 Answers