1

This document has a good description how to prepare range-partitioned data for insertion into a target shard. If I knew the exact hash function, I could similarly prepare data for insertion into hash-distributed tables.

Such a function is hinted at here, but I could not find it where I expected it in the source.

Where does Citus determine the hash function to use during shard pruning?

Community
  • 1
  • 1
Eugen Konkov
  • 22,193
  • 17
  • 108
  • 158

2 Answers2

2

We recently released Citus 5.1. It has COPY support on hash-partitioned tables and COPY is at least one order of magnitude faster than copy_to_distributed_table (now deprecated). We will update our documentation shortly to clarify our COPY support.

You can install Citus 5.1 from either the Red Hat or Debian PGDG repositories.

jasonmp85
  • 6,749
  • 2
  • 25
  • 41
metdos
  • 13,411
  • 17
  • 77
  • 120
  • Are you saying that the function will be in the updated docs? – BillRobertson42 May 18 '16 at 11:41
  • 1
    @Bill, No. I mean, now you will not need that function because now COPY does the what the OP wants in an efficient way. – metdos May 18 '16 at 11:49
  • Yes, I am interesting to try new release, but this is not the answer for question. – Eugen Konkov May 18 '16 at 20:14
  • 2
    @EugenKonkov, we use the hash function of PostgreSQL and it changes according to the data type. From here, you can see how we get that https://github.com/citusdata/citus/blob/master/src/backend/distributed/utils/metadata_cache.c#L345 – metdos May 19 '16 at 10:30
1

The answer from metdos helps with the underlying problem (slow data migrations), but it looks like you still want a definitive answer to the original question of "Does Citus expose the hash function it uses?"

The answer to this question is "No, not directly, but it does expose the cached information about each distributed table and you can use that to discover the hash function, which you'd just need to call". What follows is a sketch of how to do that…

The function DistributedTableCacheEntry takes a table's identifier as its input and returns a struct populated with the hash function which would be used for that table.

It's a public function, and exposed by the headers installed by Citus, so you should be able to link against it to write a C-level PostgreSQL function to hash a partition value given the table it belongs in. See FastShardPruning for how to use it.

The signature would probably look like: CREATE FUNCTION citus_hash(distrel regclass, anyelement partitionval) RETURNS integer. Pseudocode:

  1. Call DistributedTableCacheEntry with distrel as argument
  2. Ensure the table is hash-partitioned
  3. Get the hash function from the cache entry
  4. Ensure partitionval is of the expected type
  5. Call the hash function on partitionval and return the result

See PostgreSQL's own documentation to learn about writing such a function.

jasonmp85
  • 6,749
  • 2
  • 25
  • 41
  • yes, I am looking for `citus_hash`. And now I see the answer: No, it is not available from SQL. Thanks for pseudocode – Eugen Konkov May 26 '16 at 06:19