I'm building a CloudWatch alarm that sends an email when a Lambda function has not been invoked for 5 minutes:

    CloudWatchAlarm:
     Type: AWS::CloudWatch::Alarm
     Properties:
      AlarmActions:
        - !Ref SNSTopic
      AlarmDescription: Send email if lambda function was not called within 5 minutes
      Dimensions:
        -
          Name: "FunctionName"
          Value: "my-lambda"
      ComparisonOperator:  LessThanThreshold
      EvaluationPeriods: 1
      MetricName: Invocations
      Namespace: AWS/Lambda
      Period: 300
      Statistic: Sum
      Threshold: 1
      TreatMissingData: breaching
      DatapointsToAlarm: 1

When the function is invoked, the Invocations metric goes to 1 and the alarm enters the OK state. But when 5 minutes pass without the function being called, the alarm does not return to the ALARM state. It actually takes about 15 minutes to go into ALARM.

If I set a shorter period, it takes less time to return to the ALARM state. I don't understand how the period really works.

Does anybody know whether this kind of configuration is really possible with a CloudWatch alarm? How should I set the period and evaluation periods to get the email after exactly 5 minutes?

Thauany Moedano
  • Have you checked if your data is non-breaching? This can prevent the alarm from going into an ALARM state. – Teodorico Levoff Feb 23 '21 at 21:02
  • @TeodoricoLevoff Yes, there will be no datapoints, and missing datapoints are treated as breaching. But the state change does not happen when I expect. An example: (1) I invoke my function at 10:00. (2) The alarm goes into the OK state because there is 1 datapoint with a sum of at least 1 within 5 minutes. (3) I don't call my function anymore. (4) There are no more datapoints (missing data is breaching). (5) The alarm goes into the ALARM state after 15 minutes instead of 5 minutes. I don't understand why it takes longer than the period to change state. – Thauany Moedano Feb 23 '21 at 22:27

1 Answer

This probably happens because the alarm state is not evaluated over the Period alone, but over a so-called evaluation range, which can be much longer than the period. What's more, you do not control the evaluation range.

Similar issues of CW delays were discussed in, for example:

From the link:

In this case, for the time when alarm did not transition to OK state, it was using the previous data points in the evaluation range to evaluate its state, as expected.

So it seems that in your case the evaluation range reaches 15 minutes back.
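
To make the effect concrete, here is a rough Python sketch of how such an evaluation could play out. It is a simplified model, not CloudWatch's actual implementation: the function names are hypothetical, and the evaluation range of 3 periods is only an assumption inferred from the 15 minutes you observed. The idea it illustrates is that real datapoints inside the evaluation range are used first, and missing data is treated as breaching only when there are not enough real datapoints.

```python
def alarm_state(window, threshold=1.0, evaluation_periods=1, datapoints_to_alarm=1):
    """Evaluate one window (oldest first); None marks a missing datapoint."""
    # Real datapoints anywhere in the evaluation range are preferred.
    real = [v for v in window if v is not None]
    recent = real[-evaluation_periods:]
    # LessThanThreshold: a datapoint breaches when it is below the threshold.
    breaching = sum(1 for v in recent if v < threshold)
    # TreatMissingData: breaching, applied only to fill the remaining slots.
    breaching += evaluation_periods - len(recent)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


def simulate(series, range_periods=3):
    """Return the alarm state at the end of each period in `series`.

    range_periods is an ASSUMED evaluation range length (in periods).
    """
    states = []
    for i in range(len(series)):
        window = series[max(0, i - range_periods + 1): i + 1]
        states.append(alarm_state(window))
    return states


# One invocation in the first 5-minute period, then nothing.
series = [1, None, None, None, None]
print(simulate(series))                    # assumed 3-period evaluation range
print(simulate(series, range_periods=1))   # what you expected: range == period
```

Under this model the single real datapoint keeps the alarm in OK for as long as it stays inside the evaluation range, and the alarm flips to ALARM only once the whole range is empty. With a shorter Period the same number of periods covers less wall-clock time, which would match what you saw.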

Marcin