I have a microservice with a job that needs to run only if a different server is up. For a few weeks it worked great: if the server was down, the microservice slept a bit without doing the job (as it should), and if the server was up, the job was done. The server is never down for more than a few minutes (for sure! it is closely monitored), so the job is skipped 2-3 times at most.
Today I went into my Docker container and noticed in the logs that the job hasn't even tried to run for a few weeks now (bad call not to monitor it, I know), which suggests, I assume, that some kind of deadlock happened. I also suspect the problem is in my exception handling. I work alone, so I could use some advice.
def is_server_healthy():
    url = "url"  # correct url for health check path
    try:
        res = requests.get(url)
    except Exception as ex:
        LOGGER.error(f"Can't health check! {ex}")
    finally:
        pass
    return res


def init():
    while True:
        LOGGER.info(f"Sleeping for {SLEEP_TIME} Minutes")
        time.sleep(SLEEP_TIME * ONE_MINUTE)
        res = is_server_healthy()
        if res.status_code == 200:
            my_api.DoJob()
            LOGGER.info(f"Server is: {res.text}")
        else:
            LOGGER.info(f"Server is down... {res.status_code}")
(The names of the variables were changed to simplify the question)
The health check is simple enough: the endpoint returns "up" if the server is up, and anything else is considered down. So unless I get status 200 with "up" in the body, I treat the server as down.
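In other words, the decision I want is roughly this (a minimal sketch, not my real code; the server_is_up name, the 5-second timeout, and the case-insensitive compare are just for illustration):

import requests

HEALTH_URL = "url"  # placeholder for the real health-check path

def server_is_up() -> bool:
    # "Up" means HTTP 200 *and* a body of "up"; anything else counts as down.
    try:
        res = requests.get(HEALTH_URL, timeout=5)  # timeout value is an assumption
    except requests.RequestException:
        return False
    return res.status_code == 200 and res.text.strip().lower() == "up"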