Am using a Kusto query to create a timechart within Azure AppInsights, to visualize when our webservice is within its SLO (and when it isn't) using one of Google's examples of measuring if a webservice is within its error budget:
SLI = The proportion of sufficiently fast requests, as measured from the load balancer metrics. “Sufficiently fast” is defined as < 400 ms.
SLO = 90% of requests < 400 ms
Measured as:
count of http_requests with a duration less than or equal to "0.4" seconds
divided by count of all http_requests
Assuming 10-minute inspection intervals over a 7-day window, here is my code:
let fastResponseTimeMaxMs = 400.0;
let errorBudgetThresholdForFastResponseTime = 90.0;
//
let startTime = ago(7days);
let endTime = now();
let timeStep = 10m;
//
let timeRange = range InspectionTime from startTime to endTime step timeStep;
timeRange
| extend RespTimeMax_ms = fastResponseTimeMaxMs
| extend ActualCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| where duration <= fastResponseTimeMaxMs
| count
)
| extend TotalCount = toscalar
(
requests
| where timestamp > InspectionTime - timeStep
| where timestamp <= InspectionTime
| where success == "True"
| count
)
| extend Percentage = round(todecimal(ActualCount * 100) / todecimal(TotalCount), 2)
| extend ErrorBudgetMinPercent = errorBudgetThresholdForFastResponseTime
| extend InBudget = case(Percentage >= ErrorBudgetMinPercent, 1, 0)
Sample query output of what I wish to achieve:
InspectionTime [UTC] RespTimeMax_ms ActualCount TotalCount Percentage ErrorBudgetMinPercent InBudget
2019-05-23T21:53:17.894 400 8,098 8,138 99.51 90 1
2019-05-23T22:03:17.894 400 8,197 9,184 89.14 90 0
2019-05-23T22:13:17.894 400 8,002 8,555 93.54 90 1
The error I'm getting is:
'where' operator: Failed to resolve scalar expression named 'InspectionTime'
I've tried todatetime(InspectionTime)
, fails with same error.
Replacing InspectionTime
with other objects of type datetime
gets this code to execute OK, but not with the datetime values that I want. By example, using this snippet executes OK, when used within my code sample above:
| extend ActualCount = toscalar
(
requests
| where timestamp > startTime // instead of 'InspectionTime - timeStep'
| where timestamp <= endTime // instead of 'InspectionTime'
| where duration <= fastResponseTimeMaxMs
| count
)
To me it seems that using InspectionTime
within toscalar(...)
is the crux of this problem, since I'm able to use InspectionTime
within similar queries using range(...)
that don't nest it within toscalar(...)
.
Note: I don't want a timechart chart of request.duration
, since that doesn't tell me if the count of requests above my threshold (400ms) exceed our error budget according to the formula defined above.