
I have a microservice with a job that should run only if a different server is up. For a few weeks it worked great: if the server was down, the microservice slept for a bit without doing the job (as it should), and if the server was up, the job was done. The server is never down for more than a few minutes (for sure! the server is highly monitored), so the job is skipped 2-3 times at most.

Today I entered my Docker container and noticed in the logs that the job hasn't even tried to run for a few weeks now (bad choice not to monitor, I know), which I assume indicates that some kind of deadlock happened. I also assume that the problem is with my exception handling. I could use some advice, as I work alone.

def is_server_healthy():
    url = "url" #correct url for health check path
    try:
        res = requests.get(url)
    except Exception as ex:
        LOGGER.error(f"Can't health check!{ex}")
    finally:
        pass

    return res

def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME*ONE_MINUTE)

        res = is_server_healthy()

        if res.status_code == 200:
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info(f"Server is down... {res.status_code}")

(The names of the variables were changed to simplify the question)

The health check is simple enough: it returns "up" if the server is up. Anything else is considered down, so unless status 200 and "up" come back, I consider the server to be down.

lolu
  • shouldn't you indent the `return res` more? – alex Mar 26 '20 at 13:12
  • @alex yes, I didn't copy it well. Please ignore, I'll edit. – lolu Mar 26 '20 at 13:13
  • Shouldn't you cron-tab a single execution every minute of your check script without the `while True:` loop? Or try some of the other methods in [what-is-the-best-way-to-repeatedly-execute-a-function](https://stackoverflow.com/questions/474528/what-is-the-best-way-to-repeatedly-execute-a-function-every-x-seconds) – Patrick Artner Mar 26 '20 at 13:13
  • The only line that can lead to an unhandled exception is `my_api.DoJob()`. All the other lines won't cause an exception and the lines that could are handled by a try except. – Tin Nguyen Mar 26 '20 at 13:18
  • @TinNguyes - there are. See answer below. – Patrick Artner Mar 26 '20 at 13:42

1 Answer

When your server is down, you get an uncaught error:

NameError: name 'res' is not defined

Why? See:

def is_server_healthy():
    url = "don't care"
    try:
        raise Exception()  # simulate fail
    except Exception as ex:
        print(f"Can't health check!{ex}")
    finally:
        pass

    return res   ## name is not known ;o)

res = is_server_healthy()
if res.status_code == 200:   # here, next exception bound to happen
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")

Even if you declared the name beforehand, the calling code would try to access an attribute that's not there:

if res.status_code == 200:   # here - object has no attribute 'status_code'   
    my_api.DoJob()
    LOGGER.info(f"Server is: {res.text}")
else:
    LOGGER.info(f"Server is down... {res.status_code}")

This accesses a member that simply doesn't exist, which raises another uncaught exception, and the process is gone.
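
A minimal sketch of one way to avoid both failures, assuming the health endpoint answers with the body "up" as described in the question: return a boolean instead of the response object, so the caller never touches `res` when the request failed. `HEALTH_URL` and the 10-second timeout are hypothetical placeholders. The timeout is included because `requests.get` does not time out by default, so a stalled connection could block the loop indefinitely.

```python
import logging

import requests

LOGGER = logging.getLogger(__name__)

HEALTH_URL = "http://localhost:8080/health"  # hypothetical placeholder URL
REQUEST_TIMEOUT = 10  # seconds; assumed value, tune for your server


def is_server_healthy(url=HEALTH_URL, timeout=REQUEST_TIMEOUT):
    """Return True only if the server answers 200 with the body "up"."""
    try:
        res = requests.get(url, timeout=timeout)
    except requests.RequestException as ex:
        # Connection errors and timeouts end up here; report "down"
        # instead of falling through to an undefined `res`.
        LOGGER.error(f"Can't health check! {ex}")
        return False
    return res.status_code == 200 and res.text.strip() == "up"
```

With this shape the loop body reduces to `if is_server_healthy(): my_api.DoJob()`, with no attribute access on a response that may not exist.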


You are probably better off using some system-specific way to call your script once every minute (cron jobs, Task Scheduler) than idling in a `while True:` with sleep.
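
If you go the scheduler route, the script only needs to run one cycle and exit. The following is a rough sketch of that idea, not a drop-in replacement: `run_once` is a hypothetical helper, and the health check and the job are passed in as plain callables so nothing here depends on how they are implemented.

```python
import logging

LOGGER = logging.getLogger(__name__)


def run_once(is_healthy, do_job):
    """Run a single check-then-job cycle and return a process exit code.

    `is_healthy` is any callable returning True/False and `do_job` is the
    job itself; both names are placeholders for your own functions.
    """
    try:
        if not is_healthy():
            LOGGER.info("Server is down, skipping this run")
            return 0
        do_job()
        LOGGER.info("Job done")
        return 0
    except Exception:
        # Log the full traceback so a failed run is visible in the logs,
        # then signal failure to the scheduler with a non-zero exit code.
        LOGGER.exception("Run failed")
        return 1
```

In the real script you would end with something like `sys.exit(run_once(is_server_healthy, my_api.DoJob))`. Because every run is a fresh process, one crashed or failed run cannot stop the following ones.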

Patrick Artner
  • I think I just realised what you said - I changed that line to be: `if hasattr(res, "status_code") and res.status_code == 200:` – lolu Mar 26 '20 at 13:23