
Aurora PostgreSQL 12.6. Purpose: schedule a job that rebuilds all indexes. What I did was create a function that loops over all table names and runs REINDEX CONCURRENTLY on each, then put the function in pg_cron, but it fails with the error "SQL Error [25001]: ERROR: REINDEX CONCURRENTLY cannot be executed from a function". How can I achieve this?
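For reference, a minimal sketch of the kind of function that triggers this error (all names are illustrative, not my exact code):

```sql
-- Illustrative sketch: loop over all ordinary tables in the public
-- schema and reindex each one concurrently.
CREATE OR REPLACE FUNCTION reindex_all_tables() RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    t regclass;
BEGIN
    FOR t IN
        SELECT c.oid::regclass
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'r' AND n.nspname = 'public'
    LOOP
        -- This is the line that fails: a function body always runs
        -- inside a transaction, and REINDEX CONCURRENTLY cannot run
        -- in a transaction block, so it is rejected with error 25001.
        EXECUTE format('REINDEX TABLE CONCURRENTLY %s', t);
    END LOOP;
END;
$$;

-- Scheduling the function with pg_cron; the job then fails at run
-- time with the error quoted above.
SELECT cron.schedule('0 3 * * 0', 'SELECT reindex_all_tables()');
```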

Thanks

user3706888
  • In Postgres you don't need to rebuild all indexes; an index keeps getting updated as data is inserted into and deleted from the table. – Uday Yadav Aug 27 '21 at 08:03

1 Answer


Don't do it. There is almost never a need to rebuild indexes.

You can test the indexes regularly using pgstatindex from the pgstattuple extension if you are worried.
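For example, a query along these lines reports the leaf density of every B-tree index (a sketch; the schema filter is an assumption, adjust it to your own schemas):

```sql
-- Requires the pgstattuple extension (shipped with standard
-- PostgreSQL and available on Aurora PostgreSQL).
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Leaf density for all B-tree indexes in the public schema.
-- A freshly built B-tree has avg_leaf_density near its fillfactor
-- (90 by default); a much lower value indicates bloat.
-- pgstatindex() only supports B-tree indexes, hence the pg_am filter.
SELECT c.oid::regclass      AS index_name,
       s.avg_leaf_density,
       s.leaf_fragmentation
FROM pg_class AS c
JOIN pg_am AS a ON a.oid = c.relam
CROSS JOIN LATERAL pgstatindex(c.oid::regclass) AS s
WHERE c.relkind = 'i'
  AND a.amname = 'btree'
  AND c.relnamespace = 'public'::regnamespace
ORDER BY s.avg_leaf_density;
```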

Laurenz Albe
  • I am from an Oracle / SQL Server background, where the rule of thumb is to keep an eye on index health and rebuild/reorganize based on the fragmentation status. What is the situation in PostgreSQL? Does it keep indexes healthy on its own? Please enlighten me. – user3706888 Aug 27 '21 at 09:55
  • Yes, under normal circumstances, indexes remain healthy. Check for bloat with `pgstatindex` if you are nervous. – Laurenz Albe Aug 27 '21 at 09:58
  • If you shouldn't do it, then why does a REINDEX CONCURRENTLY on a 270 GB table save about 60 GB of space? I've got a load of tables and I've probably saved hundreds of gigabytes doing this without any drawback... so why not shove it in a scheduled job that runs every few months? – creamcheese Oct 26 '22 at 11:57
  • @Intellix That is a pointless exercise. You expend a lot of I/O bandwidth to reduce the size a little, but after a few days or weeks of normal operation, the index will be back to its healthy, normal level of bloat. – Laurenz Albe Oct 26 '22 at 12:21
  • The official PostgreSQL [docs state](https://www.postgresql.org/docs/current/routine-reindex.html) "The potential for bloat in non-B-tree indexes has not been well researched". Personally, I've seen performance grind to a halt on heavily updated tables over a long period of time. One way to reproduce it: create a GIN index, store 100k rows, update the 100k rows a few times, and compare the before and after index size (see the sketch after this thread). To see the performance impact, update rows 1M times and compare. A GIN index with full-text search is a good example because it tends to be rather large. – Bufke May 05 '23 at 17:14
  • @Bufke Sure, you can get an index to bloat. Is what you tried similar to your real workload? Lots of mass updates without time for autovacuum to clean up in between? – Laurenz Albe May 07 '23 at 19:16
  • The project where this happened to me is open source: https://gitlab.com/glitchtip/glitchtip-backend It stores a tsvector with a GIN index on what is essentially a searchable group. The use case involves millions of inserts a day, and the group tsvector must get updated at least somewhat frequently. Thus we get a large indexed column that is updated frequently. A scheduled reindex is vital: without it, updates get slower until they halt entirely after months. The database used was DigitalOcean managed PostgreSQL with autovacuum enabled. I'm aware there are better search tools, but it works. – Bufke May 08 '23 at 16:18
  • Perhaps use a scheduler outside the database. Then you won't have the problem of trying to run the command inside a transaction, which a database function call always is (see the crontab sketch after this thread). – Laurenz Albe May 08 '23 at 20:17
  • @DimitriKopriwa That is unrelated to the question. – Laurenz Albe Aug 30 '23 at 11:23
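
For reference, a rough sketch of the reproduction Bufke describes above (table, column, and index names are made up):

```sql
-- Hypothetical setup: a table with a tsvector column and a GIN index.
CREATE TABLE doc (id bigint PRIMARY KEY, body tsvector);
CREATE INDEX doc_body_gin ON doc USING gin (body);

-- Store 100k rows.
INSERT INTO doc
SELECT g, to_tsvector('english', 'sample text ' || g)
FROM generate_series(1, 100000) AS g;

SELECT pg_size_pretty(pg_relation_size('doc_body_gin'));  -- size before

-- Update every row a few times; each update adds new index entries.
UPDATE doc SET body = to_tsvector('english', 'updated text '  || id);
UPDATE doc SET body = to_tsvector('english', 'updated again ' || id);
UPDATE doc SET body = to_tsvector('english', 'and once more ' || id);

SELECT pg_size_pretty(pg_relation_size('doc_body_gin'));  -- size after
```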
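And a sketch of the external-scheduler approach from the last comment: run the REINDEX as a top-level statement via psql from the operating system's cron, so it never executes inside a function or transaction block. The connection details, table name, and schedule below are placeholders (authentication would go through ~/.pgpass or similar):

```
# Hypothetical crontab entry: every Sunday at 03:00, reindex one table
# concurrently as a top-level statement, outside any transaction block.
0 3 * * 0  psql "host=mycluster.example.com dbname=mydb user=admin" -c "REINDEX TABLE CONCURRENTLY my_big_table"
```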