0

The context is we are migrating a large amount of data from a single tenant to a multi-tenant, so the need for querying.

We want to fetch the data in batches so we can pass it to the destination server. The batch will be configurable by 1 or 2 lc. The query being executed is given below:

select *
from schema.table_name
order by value->>'datetime_param_in_stringformat' ASC
LIMIT 100000 OFFSET 0
  • time taken (20min)

Now this is being called from a golang app, which is running for 300mil records.

datetime_param_in_stringformat - the mentioned parameter doesn't have an index on it.

Edit 1: The main reason to use order by is to keep track of the 'datetime_param' so that in any failures in transferring the data, we can retrigger the job and be able to transfer only delta one's. This is in relation to shortening the downtime of the app as well. And to get the data in chronological order, we can only refer to time.

I tried restoring the snapshot with a higher plan and more storage, but no luck. As I can see from pg_activity, the wait event is bufFileRead. Tried to get the batch size to 50k but same issue. Or else sometimes it takes more that 15min, which is not feasible.

  • Unrelated to your problem, but: Postgres 9.3 is [no longer supported](https://www.postgresql.org/support/versioning/) you should plan an upgrade as soon as possible. –  Sep 15 '22 at 07:54
  • If your table already has a primary or unique key, order by that. And experiment with both larger and smaller batch sizes to see whether something other than 100k works best. If you [edit]ed your question to show us your table definition with indexes, we might be able to give you more specific advice – O. Jones Sep 15 '22 at 11:15
  • And, with judicious use of the COPY statement and enough drive space, you may be able to avoid the batching all together. See this. https://stackoverflow.com/questions/1517635/save-pl-pgsql-output-from-postgresql-to-a-csv-file – O. Jones Sep 15 '22 at 11:37
  • If you migrate the whole table, why do you have to use `oder by` at all? – Markus Sep 15 '22 at 14:40
  • How does your query help achieve your goal? – jjanes Sep 15 '22 at 14:47
  • Have edited the question, please do check it – OptimusPrime Sep 20 '22 at 06:11

1 Answers1

2

Your only chance is an index like

CREATE INDEX ON tab ((value->>'datetime_param_in_stringformat'));

Don't use SELECT *, select only the columns you really need. The index will be particularly helpful for a query with ORDER BY and LIMIT, but it will also speed up ORDER BY without LIMIT.

To allow concurrent data modifications while the index is being built, use CREATE INDEX CONCURRENTLY.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Since it's production DB, will there be any performance issue/downtime with this? And will the indexing be helpful when we are processing all the data? – OptimusPrime Sep 15 '22 at 09:03
  • Try it on your test database before you run it on your productive database. – Laurenz Albe Sep 15 '22 at 09:39
  • Indexing of the param is out of the question, as it has to go through lot of approvals and that's a pickle. Should I change the Impl? – OptimusPrime Sep 20 '22 at 06:14