I've recently deployed my Django API backend to AWS Elastic Beanstalk (EB) on the Amazon Linux 2 platform (the exact platform name is Python 3.7 running on 64bit Amazon Linux 2).

Almost everything is working as expected, but my application health status is Severe, and after hours of debugging I have no idea why.

The application's health check is served by the following endpoint (from the django-health-check module):

url(r'^ht/', include('health_check.urls'))

100% of the requests have a status code of 200 but my overall health status is the following:

|--------------------|----------------|---------------------------------------------------|
|   instance-id      |   status       |   cause                                           |
|--------------------|----------------|---------------------------------------------------|
|   Overall          |   Degraded     |   Impaired services on all instances.             |
|   i-0eb89f...      |   Severe       |   Following services are not running: release.    |
|--------------------|----------------|---------------------------------------------------|

The strangest thing is that searching the internet for the message "Following services are not running: release." returns no results, so it seems no one has reported this problem before.

The other odd thing is the content of my /var/log/healthd/daemon.log file, which consists of lines similar to

W, [2020-07-21T09:00:01.209091 #3467]  WARN -- : log file "/var/log/nginx/healthd/application.log.2020-07-21-09" does not exist

where the time changes.

The last thing that may be relevant is the content of the single file inside my .ebextensions directory:

option_settings:
  "aws:elasticbeanstalk:application:environment":
    DJANGO_SETTINGS_MODULE: "app.settings"
    "PYTHONPATH": "/var/app/current:$PYTHONPATH"
  "aws:elasticbeanstalk:container:python":
    WSGIPath: app.wsgi:application
    NumProcesses: 3
    NumThreads: 20
  aws:elasticbeanstalk:environment:proxy:staticfiles:
    /static: static
    /static_files: static_files
container_commands:
  01_migrate:
    command: "source /var/app/venv/staging-LQM1lest/bin/activate && python manage.py migrate --noinput"
    leader_only: true
packages:
  yum:
    git: []
    postgresql-devel: []

Does anyone have any idea how this can be resolved? The ultimate goal is to get the green OK health status.


EDIT: In the end I switched to the Basic health system and the problems went away immediately. However, I am still interested in solving the original problem, as the Enhanced health system provides some benefits.

Philip Fabianek
  • Is it a load-balanced environment? – Marcin Jul 21 '20 at 09:57
  • @Marcin Yes it is, I also forgot to mention that I am making use of `Enhanced health reporting and monitoring` – Philip Fabianek Jul 21 '20 at 10:11
  • The application works exactly as expected? Logs such as /var/log/cloud-init-cmd don't show any errors? – Marcin Jul 21 '20 at 10:13
  • @Marcin I meant that endpoints as well as database are all working just fine. I downloaded full logs from EB console and went through them. All I found were 2 errors from eb-engine.log that didn't seem relevant (`[ERROR] nginx: the configuration file /var/proxy/staging/nginx/nginx.conf syntax is ok nginx: configuration file /var/proxy/staging/nginx/nginx.conf test is successful` and `[ERROR] Created symlink from /etc/systemd/system/multi-user.target.wants/worker.service to /etc/systemd/system/worker.service.`). A file called `cloud-init-cmd` isn't even part of the logs. – Philip Fabianek Jul 21 '20 at 10:58
  • @Marcin I also switched to `Basic` health system (instead of `Enhanced`) and the problems disappeared. I'm however still interested in solving the original problem. – Philip Fabianek Jul 21 '20 at 11:00
  • The file should be `/var/log/cfn-init-cmd.log`, sorry. – Marcin Jul 21 '20 at 11:05
  • @Marcin No, the file doesn't contain a single error – Philip Fabianek Jul 21 '20 at 11:13
  • Don't know why you got errors. Basic health reporting is based only on LB and EC2 health checks. Enhanced also monitors your logs and other metrics from inside the instances. So maybe it was finding some errors there and reporting them as health issues. – Marcin Jul 21 '20 at 11:42

2 Answers

I believe the problem may be due to the ALLOWED_HOSTS setting in your settings.py file.

EB sends an HTTP request to your application to check whether it's working, but Django rejects any request whose Host header is not listed in that setting. The catch is that EB addresses the health check request to the private IP of the EC2 instance, so the Host header contains that IP rather than your domain name.
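To make the rejection concrete, here is a minimal, simplified sketch of how host matching against ALLOWED_HOSTS behaves. It is not Django's actual code: `host_is_allowed` is a hypothetical helper, the IP `172.31.5.20` is an invented example of a private address, and details such as port stripping and the DEBUG-mode defaults are left out.

```python
from typing import Sequence


def is_same_domain(host: str, pattern: str) -> bool:
    """Exact match, or subdomain match when the pattern starts with a dot."""
    if not pattern:
        return False
    pattern = pattern.lower()
    return (
        pattern[0] == "." and (host.endswith(pattern) or host == pattern[1:])
    ) or pattern == host


def host_is_allowed(host: str, allowed_hosts: Sequence[str]) -> bool:
    """Hypothetical helper: True if the Host header value passes the check."""
    host = host.lower()
    return any(p == "*" or is_same_domain(host, p) for p in allowed_hosts)


allowed = ["127.0.0.1", "mywebsite.com"]
print(host_is_allowed("mywebsite.com", allowed))  # request via the domain name
print(host_is_allowed("172.31.5.20", allowed))    # request via a private IP (assumed value)
print(host_is_allowed("172.31.5.20", ["*"]))      # wildcard accepts any host
```

Under these assumptions, a request addressed to the instance's private IP fails the check unless that IP (or a wildcard) is listed.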

The easiest way to solve this is to allow all hosts in your settings.py file:

ALLOWED_HOSTS=['*']

This is the fastest fix, but it turns off Django's Host header validation and may therefore lead to security issues. A safer option is to add the private IP dynamically, because EC2 instances can be spun up at any time and the private IP changes from instance to instance.

To do that, retrieve the private IP when the application starts, i.e. when settings.py is loaded.

At the top of your settings.py, place the following functions:

import os
import requests
# Other imports ...


def is_ec2_linux():
    """Detect if we are running on an EC2 Linux instance.

    See http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/identify_ec2_instances.html
    """
    # Note: this file is present on Xen-based instances; Nitro-based
    # instances may need a different check (see the linked docs).
    if os.path.isfile("/sys/hypervisor/uuid"):
        with open("/sys/hypervisor/uuid") as f:
            uuid = f.read()
            return uuid.startswith("ec2")
    return False


def get_token():
    """Request a metadata session token that lives for 6 hours (the maximum)."""
    headers = {
        'X-aws-ec2-metadata-token-ttl-seconds': '21600',
    }
    # A short timeout keeps settings.py from hanging if the service is unreachable.
    response = requests.put('http://169.254.169.254/latest/api/token', headers=headers, timeout=2)
    response.raise_for_status()
    return response.text


def get_linux_ec2_private_ip():
    """Get the private IP address of the machine if running on an EC2 Linux server.

    See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
    """
    if not is_ec2_linux():
        return None
    try:
        token = get_token()
        headers = {
            'X-aws-ec2-metadata-token': token,
        }
        response = requests.get('http://169.254.169.254/latest/meta-data/local-ipv4', headers=headers, timeout=2)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        # Metadata service unavailable or returned an error: add no extra host.
        return None
# Other settings

The most important functions are get_token() and get_linux_ec2_private_ip(). The first requests a session token from the instance metadata service; the second uses that token to fetch the private IP of the current EC2 instance.

Once you have retrieved the IP, add it to your ALLOWED_HOSTS:

ALLOWED_HOSTS = ['127.0.0.1', 'mywebsite.com']
private_ip = get_linux_ec2_private_ip()
if private_ip:
    ALLOWED_HOSTS.append(private_ip)

After that, commit your changes and redeploy with eb deploy (if you have set up the EB CLI).

  • What's that hardcoded ip `169.254.169.254` ? – DataGreed Feb 23 '22 at 22:34
  • It seems to be a special reserved IP address. In AWS EC2 it is used to distribute metadata between instances. You can take a look at it here https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html – Sebastian Escalante Feb 26 '22 at 22:42
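
As a small illustration of the "special reserved" point, Python's standard ipaddress module classifies 169.254.169.254 as a link-local address, i.e. one that is only reachable from the machine's own network segment and is not routed over the internet. This sketch only inspects the address; it makes no request to the metadata service.

```python
import ipaddress

metadata_ip = ipaddress.ip_address("169.254.169.254")
link_local_block = ipaddress.ip_network("169.254.0.0/16")

# Both checks express the same fact: the address sits in the link-local range.
print(metadata_ip.is_link_local)
print(metadata_ip in link_local_block)
```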

There is a package, django-ebhealthcheck, designed to solve this problem: it gets the local IP of your EC2 instance and adds it to ALLOWED_HOSTS. It's very simple to use; you just have to add 'ebhealthcheck.apps.EBHealthCheckConfig' to INSTALLED_APPS.

GitHub page: https://github.com/sjkingo/django-ebhealthcheck
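
As a rough sketch, the relevant part of settings.py might look like the excerpt below. The other entries in the list are assumptions about a typical project (health_check is the django-health-check app from the question); only the last line comes from the package's instructions. The package would presumably need to be installed first, e.g. from PyPI under the name django-ebhealthcheck.

```python
# settings.py (excerpt); entries other than the last one are illustrative assumptions
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "health_check",  # django-health-check, as used in the question
    # Adds the instance's local IP to ALLOWED_HOSTS at startup
    "ebhealthcheck.apps.EBHealthCheckConfig",
]
```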

DataGreed