
My logs go to Sumo Logic. Each log line is JSON and contains the response time of the request under the key "response_time". Each request is identified by a unique ID under the key "request_id" and has a URL under the key "url". I need to alert a Slack channel based on the following conditions.

1) In a 10-minute window, if there are 100 requests and more than 5% of them have a response time above 100 ms, then alert with the "url", "request_id" and "response_time" of all those requests.
2) If 5% or fewer of the requests have a response time above 100 ms, then don't alert at all.

I wrote a query like this:

_sourceName=<my_source_name> 
| json field=_raw "response_time" as response_time 
| json field=_raw "request_id" as request_id 
| if (num(response_time) > 100, 1, 0) as higher 
| if (num(response_time) <= 100, 1, 0) as lower 
| count as total_requests, sum(higher) as response_time_greater_than_100, 
  sum(lower) as response_time_less_than_100 
| (response_time_greater_than_100/total_requests) as failure_ratio 
| where (failure_ratio > 0.05)

The query above returns results only when more than 5% of requests have a response_time above 100 ms, and returns nothing otherwise. But when it does return results, it gives me all requests regardless of their response time.

In addition, I want to narrow the results down to the requests with "response_time" > 100 ms. Whenever there are results, I get two tabs, "Messages" and "Aggregates", and I want to send the fields from the "Messages" tab to a Slack channel. How can I achieve this?

user9920500

1 Answer


Tabs - Aggregates vs. Messages

First, let's clarify these two tabs. The first one (Messages) contains the original log lines that contributed to the result. The second one (Aggregates) is the result of your actual query with grouping. Notice you are using | count, which is a grouping operator (similar to GROUP BY in SQL).

Any outgoing interaction, such as an alert, is always based on the actual result of the query (Aggregates). The raw lines are only available for inspection in the user interface (and through the API).

Actual query

If you just wanted to fetch all requests with a response time above 100, a query like this would be enough:

_sourceName=<my_source_name> 
| json field=_raw "response_time" as response_time 
| json field=_raw "request_id" as request_id 
| where response_time > 100

As I understand it, though, you want something different: get all requests with a response time above 100, but only if those requests make up more than 5% of the total; otherwise return an empty result set.

_sourceName=<my_source_name> 
| 1 as expected_failure_ratio_violation
| where [subquery:
  _sourceName=<my_source_name> 
  | json field=_raw "response_time" as response_time 
  | json field=_raw "request_id" as request_id
  | if (num(response_time) > 100, 1, 0) as higher 
  | if (num(response_time) <= 100, 1, 0) as lower 
  | count as total_requests, sum(higher) as response_time_greater_than_100, 
    sum(lower) as response_time_less_than_100 
  | (response_time_greater_than_100/total_requests) as failure_ratio 
  | where (failure_ratio > 0.05)
  | count as expected_failure_ratio_violation 
  | compose expected_failure_ratio_violation        
]
| json field=_raw "response_time" as response_time 
| json field=_raw "request_id" as request_id
| where response_time > 100

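It uses a trick: the outer query sets a constant 1 and matches it against the count of violations returned by the subquery (expected_failure_ratio_violation). When the ratio is not exceeded, the subquery yields no matching value, so the outer query returns an empty result.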
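To check the intended behaviour outside Sumo Logic, here is a minimal Python sketch of the same two-stage logic. It is not Sumo Logic code. The function and constant names are made up for illustration, and it assumes each request is a dict with a numeric "response_time" key.

```python
from typing import Dict, List

THRESHOLD_MS = 100      # a response time above this counts as slow
MAX_SLOW_RATIO = 0.05   # alert only when more than 5% of requests are slow


def slow_requests_if_violated(requests: List[Dict]) -> List[Dict]:
    """Return the slow requests, but only if their share exceeds MAX_SLOW_RATIO."""
    if not requests:
        return []
    slow = [r for r in requests if r["response_time"] > THRESHOLD_MS]
    # Stage 1 (the role of the subquery): is this window in violation at all?
    if len(slow) / len(requests) <= MAX_SLOW_RATIO:
        return []
    # Stage 2 (the role of the outer query): keep only the slow requests.
    return slow


def make_window(slow_count: int, total: int = 100) -> List[Dict]:
    """Build a hypothetical window of `total` requests, `slow_count` of them slow."""
    return [
        {
            "request_id": f"req-{i}",
            "url": "/example",
            "response_time": 250 if i < slow_count else 40,
        }
        for i in range(total)
    ]


if __name__ == "__main__":
    print(len(slow_requests_if_violated(make_window(6))))  # 6 of 100 is above 5%
    print(len(slow_requests_if_violated(make_window(5))))  # exactly 5% is not a violation
```

With 6 slow requests out of 100 the function returns those 6. With exactly 5 it returns an empty list, matching the "less than or equal to 5%" rule in the question.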

Also, as a hint: you are not using | timeslice here, which in my experience is what people typically use in scenarios like this. You might want to take a look at it.
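For intuition about what time slicing gives you, here is a rough Python sketch (not Sumo Logic code) that buckets requests into fixed 10-minute windows and computes the slow-request ratio per bucket. The "timestamp_ms" field and the function name are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, List

WINDOW_MS = 10 * 60 * 1000  # 10-minute buckets


def slow_ratio_per_window(requests: List[Dict], threshold_ms: int = 100) -> Dict[int, float]:
    """Map each bucket start (epoch ms) to the share of requests slower than threshold_ms."""
    totals: Dict[int, int] = defaultdict(int)
    slow: Dict[int, int] = defaultdict(int)
    for r in requests:
        # Floor the timestamp to the start of its 10-minute bucket.
        bucket = r["timestamp_ms"] - r["timestamp_ms"] % WINDOW_MS
        totals[bucket] += 1
        if r["response_time"] > threshold_ms:
            slow[bucket] += 1
    return {bucket: slow[bucket] / totals[bucket] for bucket in totals}
```

Each bucket can then be checked against the 5% limit on its own, so one bad window is not averaged away by the good windows around it.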

Disclaimer: I am currently employed by Sumo Logic

Grzegorz Oledzki
  • This query uses a subquery. I want to send its results to a Slack alert in real time, but real-time alerts are not possible with subqueries. Is there a way to avoid the subquery and still achieve the same result? – user9920500 Jun 06 '19 at 07:18
  • Also, if a real-time alert is not possible, I want to run this as a scheduled search. The lowest `Run Frequency` is 15 minutes. Is there a way to reduce it to 10 minutes? – user9920500 Jun 06 '19 at 07:24
  • The next frequency after Real Time Alerts is 15 minutes. – Grzegorz Oledzki Jun 06 '19 at 09:25
  • `if (num(response_time) > 100, 1, 0)` with `sum(higher)` is a great trick. If it is not in the Sumo Logic documentation examples, consider adding it. – Michael Freidgeim Jan 30 '23 at 04:46