2

Is there a way to know which partition will be used for a new row in a table partitioned by a column with text values?

Use case is: I have a big table with many partitions according to a text column. I need to insert-update millions of new rows to this table. If I know which partitions will be used for each row, I can rearrange my bulk inserts-updates to only work on 1 partition at a time.

-> hashtext function is tested and it is not giving the correct partition value.

Mertcan Çüçen
  • 466
  • 3
  • 13
  • 1
    The only reason for hash partitioning is to distribute I/O, so I don't see why you want to avoid that. – Laurenz Albe Apr 20 '21 at 16:15
  • 1
    The linked post doesn't really answer the question. 'b' goes to partition with modulus 8 remainder 6, but how does `hashtext('b')=-68,821,088` explain that? – jjanes Apr 21 '21 at 00:51
  • @LaurenzAlbe Surely you might want to distribute IO sometimes (like normal operations), but not others (like bulk loading, where keeping the partition indexes cached in RAM can be very important). If all you wanted to do is distribute IO unconditionally and uncontrollably, why partition at all? Just use RAID/JBOD. – jjanes Apr 21 '21 at 00:56
  • @jjanes True; `hashtext` is not the answer. Can there be a good reason for hash partitioning except spreading I/O? – Laurenz Albe Apr 21 '21 at 03:40
  • Thank you @jjanes. Whoever made this question get closed deserves a punishment and you deserve thanks. People are so eager to tag questions and close them as if I didnt see that related question. Quite strange behaviors. – Mertcan Çüçen Apr 21 '21 at 23:25
  • @LaurenzAlbe if I try to insert(many times upsert) to all partitions at the same time, all partitions suffer from locks because I read from that table a lot during the time. Plus, autovacuum is hard on unpartitioned version. However, I thought If I can run parallel inserts only on some of the partitions, I can make insertion process faster and reads from the table doesnt suffer that much. Is it a feasible solution? – Mertcan Çüçen Apr 21 '21 at 23:29
  • 1
    The locks taken by `INSERT` and `SELECT` don't conflict, so I see no problem here. – Laurenz Albe Apr 22 '21 at 02:19
  • 4b rows are available and daily 150m rows get updated, 30m rows get inserted. Updates are problematic. When it is partitioned, I can run parallel updates-inserts to partitions separately as much as my cpu cores allow. – Mertcan Çüçen Apr 22 '21 at 15:07

1 Answers1

6

Reverse engineering the code, you can get the partition number with the following statement:

SELECT (hashtextextended('value', 8816678312871386365)::numeric + 5305509591434766563) % 8;

Replace 8 with the number of partitions and 'value' with the string in question.

You can test the partition number with satisfies_hash_partition. To test if 'value' would end up in partition 6 of 8 in table tab, you can run

SELECT satisfies_hash_partition('tab'::regclass, 8, 6, 'value'::text);
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263