Azure Data Explorer: How do Partitioning Policy and Merge Policy work?

Question

In our ADX cluster, a table has no partitioning policy and no merge policy, yet ADX still creates extents. I am confused about how this works and what the default settings are. Does anyone know?

Further, how does a combination of partition keys work? For example, I have:

{
  "PartitionKeys": [
    {
      "ColumnName": "tenant_id",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 128,
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    },
    {
      "ColumnName": "timestamp",
      "Kind": "UniformRange",
      "Properties": {
        "Reference": "2021-01-01T00:00:00",
        "RangeSize": "7.00:00:00",
        "OverrideCreationTime": false
      }
    }
  ]
}

Will this create a partition for every new tenant_id within the next 7 days? But then the limit is 128? How should I read this?

And what is the benefit of building these small extents based on the partitioning policy when there is a merge policy that merges the small extents into bigger ones? Why not build a bigger one right away?

Thanks

What I tried: searching the docs and googling.

score 1 · Accepted Answer · answered May 04 '23 at 17:56

In our ADX cluster there is no partitioning policy and no merge policy on a table, but ADX still creates extents

If you ingest data, extents will be created (either immediately, if you use batch ingestion, or eventually, if you use streaming ingestion).

A partitioning policy (null by default, and rarely necessary to define) changes how extents are partitioned, and a merge policy (defined by default, and rarely necessary to change) governs how extents are merged.

How does a combination of partition keys work? Will this create a partition for every new tenant_id within the next 7 days? But then the limit is 128? How should I read this?

Given the policy you included, extents in the table will be partitioned as follows:

All records for which hash_xxhash64(tenant_id, 128) returns the same value (a value between 0 and 127), and for which bin_at(timestamp, 7d, datetime(2021-01-01T00:00:00)) returns the same value, will be included in the same set of extents and will have the same partition metadata.
Afterwards, extents that have the same partition metadata (for both partition keys) may get merged together until they reach the optimal size (managed by the system). Extents that have different partition metadata (for either partition key) can't be merged.
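To make the combined-key behavior concrete, here is a minimal Python sketch (not ADX's implementation) that computes the partition metadata of a record under the policy above. The helper names `hash_bucket`, `time_bin`, `partition_key`, `group_records` and `can_merge` are hypothetical. One loud assumption: XxHash64 is not in Python's standard library, so `hash_bucket` uses BLAKE2b as a stand-in. The bucket numbers will therefore not match what ADX computes, but the shape of the logic (a stable hash reduced modulo `MaxPartitionCount`) is the same. `time_bin` mirrors `bin_at(timestamp, 7d, datetime(2021-01-01))`.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta

MAX_PARTITION_COUNT = 128           # the policy's MaxPartitionCount
REFERENCE = datetime(2021, 1, 1)    # the policy's Reference
RANGE_SIZE = timedelta(days=7)      # the policy's RangeSize ("7.00:00:00")


def hash_bucket(tenant_id: str) -> int:
    """Stand-in for hash_xxhash64(tenant_id, 128).

    ASSUMPTION: BLAKE2b replaces XxHash64 only because XxHash64 is not in the
    standard library; values differ from ADX's, but the result is likewise a
    stable bucket in the range 0..127.
    """
    digest = hashlib.blake2b(tenant_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % MAX_PARTITION_COUNT


def time_bin(ts: datetime) -> datetime:
    """Floor ts to the start of its 7-day range, counted from REFERENCE."""
    return REFERENCE + ((ts - REFERENCE) // RANGE_SIZE) * RANGE_SIZE


def partition_key(tenant_id: str, ts: datetime) -> tuple[int, datetime]:
    """Both keys together form a record's partition metadata."""
    return (hash_bucket(tenant_id), time_bin(ts))


def group_records(records):
    """Group (tenant_id, timestamp) records by their combined partition key."""
    groups = defaultdict(list)
    for tenant_id, ts in records:
        groups[partition_key(tenant_id, ts)].append((tenant_id, ts))
    return dict(groups)


def can_merge(key_a, key_b) -> bool:
    """Extents are merge candidates only if BOTH partition values match."""
    return key_a == key_b
```

For example, `time_bin(datetime(2023, 5, 1, 8))` and `time_bin(datetime(2023, 5, 3, 20))` both return `datetime(2023, 4, 28)`, so two records from the same tenant on those dates share partition metadata, while a record from May 10 falls into the range starting `datetime(2023, 5, 5)`. This also covers the "limit is 128" part of the question: the hash key never yields more than 128 distinct values per 7-day range, so with many tenants several tenant_ids share a bucket instead of each getting its own partition.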

What is the benefit of building these small extents based on the partitioning policy when there is a merge policy that merges the small extents into bigger ones? Why not build a bigger one right away?

I would recommend you go over the following posts/documents:

Thank you. What happens when I set the partition key to a 7d bin size, but the merge policy's MaxRangeInHours is only 96h or so? Why does it not work? - FakieKickflip, May 05 '23 at 06:00
MaxRangeInHours prevents data shards whose creation times are 'too distant' from each other from being merged. If your RangeSize (7d = 168h) is larger than MaxRangeInHours (96h), data shards in the same datetime partition may not be merged together. - Yoni L., May 05 '23 at 14:51
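A hypothetical Python sketch of why the two settings clash. It models only MaxRangeInHours for extents that already share the same partition metadata: extents are sorted by creation time and grouped so that no group spans more than the limit. The real merge process also applies size and row-count bounds and other policy properties, so treat this as an illustration under that simplifying assumption, not the engine's algorithm. `merge_groups` and `span_hours` are made-up names.

```python
from datetime import datetime, timedelta


def span_hours(delta: timedelta) -> float:
    """Convert a timedelta to hours (7 days -> 168.0)."""
    return delta / timedelta(hours=1)


def merge_groups(creation_times: list[datetime], max_range_hours: float) -> list[list[datetime]]:
    """Hypothetical grouping of extents that share one partition.

    Only MaxRangeInHours is modelled: an extent joins the current group while
    the group's creation-time span stays within max_range_hours; otherwise a
    new group starts. Size and row-count limits are deliberately ignored.
    """
    groups: list[list[datetime]] = []
    for created in sorted(creation_times):
        if groups and span_hours(created - groups[-1][0]) <= max_range_hours:
            groups[-1].append(created)
        else:
            groups.append([created])
    return groups


# One extent created per day, all inside the same 7-day range (2023-05-05 .. 2023-05-11).
week_start = datetime(2023, 5, 5)
daily_extents = [week_start + timedelta(days=d) for d in range(7)]

print(span_hours(timedelta(days=7)))          # 168.0
print(len(merge_groups(daily_extents, 96)))   # 2
print(len(merge_groups(daily_extents, 168)))  # 1
```

With one extent per day, a 96h limit leaves the 7-day partition split into two groups (days 0 to 4 and days 5 to 6), while a 168h limit lets all seven extents merge, subject to the other merge-policy bounds that this sketch leaves out.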
