
I have a Java application that runs in AWS Elastic Container Service. The application polls a queue periodically. Sometimes there is no response from the queue and the application hangs forever. I have enclosed the methods in try-catch blocks and log the exceptions, but after that point there are no logs in CloudWatch at all: no exceptions, no errors. Is there a way I can identify this situation (no logs in CloudWatch), for example by filtering on a log pattern, so that I can restart the service? Any trick or solution would be appreciated.

public void handleProcess() {
    try {
        while (true) {
            Response response = QueueUitils.pollQueue(); // poll the queue
            QueueUitils.processMessage(response);
            TimeUnit.SECONDS.sleep(WAIT_TIME); // WAIT_TIME = 20
        }
    } catch (Exception e) {
        LOGGER.error("Data Queue operation failed", e);
        throw new RuntimeException(e); // rethrow as unchecked so the caller still sees the failure
    }
}

3 Answers


You can do this with CloudWatch Alarms. I've set up a test Lambda function for this which runs every minute and logs to CloudWatch.

  1. Go to CloudWatch and click Alarms in the left-hand menu.
  2. Click the orange Create Alarm button.
  3. Click Select Metric.
  4. Choose Logs, then Log Group Metrics, and pick the IncomingLogEvents metric for the relevant log group (the log group your application is logging to). In my case it's /aws/lambda/test-log-silence.
  5. Click Select Metric.
  6. Now you can specify how you want to measure the metric. I've chosen the average log entries over 5 minutes, so after 5 minutes with no log entries that value will be zero.
  7. Scroll down and set the condition to "Lower Than or Equal To" zero. This will trigger the alarm when there are no log entries for 5 minutes (or whatever period you decide to set).
  8. Now click Next, and you can specify an SNS topic to push the notification to. You can set up an SNS topic to notify you via email, SMS, AWS Lambda, and others. (A programmatic equivalent of these console steps is sketched below.)
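
If you would rather create the same alarm programmatically than click through the console, a minimal sketch with the AWS SDK for JavaScript v3 could look like the following. The region, alarm name, log group name, and SNS topic ARN are placeholders, not values from the question:

import { CloudWatchClient, PutMetricAlarmCommand } from "@aws-sdk/client-cloudwatch";

const client = new CloudWatchClient({ region: "us-east-1" }); // placeholder region

// Alarm when the log group receives no events for 5 minutes,
// mirroring the console configuration described above.
async function createSilenceAlarm() {
  await client.send(new PutMetricAlarmCommand({
    AlarmName: "no-incoming-logs",                   // placeholder alarm name
    Namespace: "AWS/Logs",
    MetricName: "IncomingLogEvents",
    Dimensions: [{ Name: "LogGroupName", Value: "/aws/lambda/test-log-silence" }],
    Statistic: "Average",
    Period: 300,                                     // 5 minutes
    EvaluationPeriods: 1,
    Threshold: 0,
    ComparisonOperator: "LessThanOrEqualToThreshold",
    TreatMissingData: "breaching",                   // count "no data at all" as a breach
    AlarmActions: ["arn:aws:sns:us-east-1:123456789012:notify-me"], // placeholder SNS topic ARN
  }));
}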
brads3290
  • Sorry to hijack this question. I have a similar requirement where I have to trigger the alarm state if there is no log message for a 6-hour period. I have implemented the same configuration as above, but the behavior is not consistent: it does not go into the alarm state even after 6 hours. It seems it may be going to INSUFFICIENT_DATA when no logs are coming. Need some input here. – Swapnil Mhaske Mar 18 '21 at 14:30
    @SwapnilMhaske When you are creating the alarm, there is an Additional configuration section at the bottom of the page. Choose to treat missing data as good or bad, as your requirement dictates. The default setting is to treat missing data as missing, which is why the alarm goes to the INSUFFICIENT_DATA state. – Ashan Tharindu Mar 23 '21 at 09:54

With reference to brads3290's answer, if you are using the AWS CDK:

import * as cdk from '@aws-cdk/core';
import * as cloudwatch from '@aws-cdk/aws-cloudwatch';
// ...
const metric = new cloudwatch.Metric({
  namespace: 'AWS/Logs',
  metricName: 'IncomingLogEvents',
  dimensions: { LogGroupName: '/aws/lambda/test-log-silence' },
  statistic: 'Average',
  period: cdk.Duration.minutes(5),
});

const alarm = new cloudwatch.Alarm(this, 'Alarm', {
  metric,
  threshold: 0,
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_OR_EQUAL_TO_THRESHOLD,
  evaluationPeriods: 1,
  datapointsToAlarm: 1,
  treatMissingData: cloudwatch.TreatMissingData.BREACHING,
});

This should also solve the problem of ignoring missing data.
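
To actually get notified (step 8 in brads3290's answer), the alarm still needs an action attached. A minimal sketch, assuming the CDK v1 packages used above and a placeholder email address:

import * as sns from '@aws-cdk/aws-sns';
import * as subs from '@aws-cdk/aws-sns-subscriptions';
import * as cw_actions from '@aws-cdk/aws-cloudwatch-actions';

// SNS topic that receives the notification when the alarm fires
const topic = new sns.Topic(this, 'AlarmTopic');
topic.addSubscription(new subs.EmailSubscription('ops@example.com')); // placeholder address

// Publish to the topic whenever the alarm enters the ALARM state
alarm.addAlarmAction(new cw_actions.SnsAction(topic));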

sompnd

In my case, I needed to use dimensionsMap: {} instead of dimensions: {} (newer versions of the CDK deprecate dimensions in favor of dimensionsMap).

const metric = new cloudwatch.Metric({
  namespace: 'AWS/Logs',
  metricName: 'IncomingLogEvents',
  dimensionsMap: {
    "LogGroupName": "logGroupNamehere.."
  },
  statistic: "Sum",
  period: cdk.Duration.days(1),
});

And the Alarm looks like:

new cloudwatch.Alarm(this, 'no-incoming-logs-alarm', {
  metric,
  alarmName: `incoming-logs-alarm-${props?.stage}`,
  threshold: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.LESS_THAN_THRESHOLD,
  evaluationPeriods: 1,
  datapointsToAlarm: 1,
  treatMissingData: cloudwatch.TreatMissingData.MISSING,
  alarmDescription: 'Some meaningful description',
});
Abu