
I have a huge amount of data flowing from Event Hub into Azure Data Explorer. We have not modified the batching policy, so batches are sealed every 5 minutes by default. We need to reduce this to a lower value so that the end-to-end lag is reduced.

How can I calculate the ideal batching time for this setup? Is there any calculation based on the CPU of the ADX cluster and the ingestion rate from Event Hub, so that I can figure out an ideal value without affecting the CPU usage of ADX?

Justin Mathew

2 Answers


There is no tool or other functionality that allows you to calculate this today; you will need to try the desired setting for "MaximumBatchingTimeSpan" and observe the impact on CPU usage.
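As an illustration, the time component can be lowered per table with a control command like the following. The table name `MyTable` and the 2-minute value are placeholders; the item-count and size values shown are the documented defaults (500 items / 1024 MB), kept unchanged here:

```kusto
// Show the current ingestion batching policy for the table
// (null means the database/cluster defaults apply).
.show table MyTable policy ingestionbatching

// Lower only the time trigger to 2 minutes; a batch is still sealed
// earlier if it reaches 500 items or 1024 MB of raw data.
.alter table MyTable policy ingestionbatching
'{"MaximumBatchingTimeSpan": "00:02:00", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
```

The same policy can also be set at the database level if many tables should share it.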

Avnera
  • Can't do this experiment in prod. Thinking of adding another route from Event Hub to a second ADX cluster and load testing there; if all goes well, I'll make the change on the primary DB. – Justin Mathew Jun 09 '21 at 03:40
  • As you can see from Vladik's answer below, first ensure that the batching policy is sealing batches based on time (and not size or number of items) by looking at the metrics. If that is the case, you can change the ingestion batching gradually. If you do not have many tables (more than 100), the impact should be low. – Avnera Jun 09 '21 at 07:55

Essentially, if you are ingesting huge volumes of data (per table), you are probably not exhausting the 5-minute batching window, or you can decrease it significantly without detrimental impact. Have a look at the latency and batching metrics for your cluster (https://learn.microsoft.com/en-us/azure/data-explorer/using-metrics#ingestion-metrics) and check: a) whether your actual latency is already below 5 minutes, which would indicate that batching is not driven by time; and b) which "Batching type" your cluster most often seals batches by - time, size, or number of items. Based on these numbers you can tune down the time component of your ingestion batching policy.
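As a rough complement to the portal metrics, latency can also be estimated directly from the data, assuming the Event Hub enqueue time has been mapped into a column (the column name `EventEnqueuedUtcTime` below is an assumption specific to your ingestion mapping, not something ADX provides automatically):

```kusto
// Estimate end-to-end lag (enqueue -> queryable) over the last day.
// ingestion_time() is available when the IngestionTime policy is enabled
// (it is on by default); EventEnqueuedUtcTime is an assumed column mapped
// from the Event Hub x-opt-enqueued-time system property.
MyTable
| where ingestion_time() > ago(1d)
| extend Latency = ingestion_time() - EventEnqueuedUtcTime
| summarize percentile(Latency, 95) by bin(ingestion_time(), 10m)
```

If the 95th-percentile latency is already well below 5 minutes, batches are being sealed by size or item count, and shrinking the time window will have little effect.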

Vladik Branevich