I have a problem with my OpenCensus logging setup in Python and FastAPI. I want to log incoming requests to Application Insights in Azure, so I added a FastAPI middleware to my code, following the Microsoft docs and this GitHub post:
propagator = TraceContextPropagator()

@app.middleware('http')
async def middleware_opencensus(request: Request, call_next):
    tracer = Tracer(
        span_context=propagator.from_headers(request.headers),
        exporter=AzureExporter(connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']),
        sampler=AlwaysOnSampler(),
        propagator=propagator)
    with tracer.span('main') as span:
        span.span_kind = SpanKind.SERVER
        tracer.add_attribute_to_current_span(HTTP_HOST, request.url.hostname)
        tracer.add_attribute_to_current_span(HTTP_METHOD, request.method)
        tracer.add_attribute_to_current_span(HTTP_PATH, request.url.path)
        tracer.add_attribute_to_current_span(HTTP_ROUTE, request.url.path)
        tracer.add_attribute_to_current_span(HTTP_URL, str(request.url))
        response = await call_next(request)
        tracer.add_attribute_to_current_span(HTTP_STATUS_CODE, response.status_code)
    return response
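For completeness, the imports and HTTP_* attribute constants used above are defined at the top of the file roughly like this (the constants come from opencensus' common attributes, as in the example I followed; app is just our FastAPI instance):

import os

from fastapi import FastAPI, Request
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace.attributes_helper import COMMON_ATTRIBUTES
from opencensus.trace.propagation.trace_context_http_header_format import TraceContextPropagator
from opencensus.trace.samplers import AlwaysOnSampler
from opencensus.trace.span import SpanKind
from opencensus.trace.tracer import Tracer

app = FastAPI()

# Standard opencensus attribute names used in the middleware above
HTTP_HOST = COMMON_ATTRIBUTES['HTTP_HOST']
HTTP_METHOD = COMMON_ATTRIBUTES['HTTP_METHOD']
HTTP_PATH = COMMON_ATTRIBUTES['HTTP_PATH']
HTTP_ROUTE = COMMON_ATTRIBUTES['HTTP_ROUTE']
HTTP_URL = COMMON_ATTRIBUTES['HTTP_URL']
HTTP_STATUS_CODE = COMMON_ATTRIBUTES['HTTP_STATUS_CODE']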
This works great when running locally, and all incoming requests to the API are logged to Application Insights. However, since implementing OpenCensus, when deployed in a Container Instance on Azure, after a couple of days (approximately 3) an issue arises that looks like some recursive logging problem (30,000+ logs per second!), stating among other things Queue is full. Dropping telemetry, before the container finally crashes after a few hours of mad logging.
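To illustrate the kind of feedback loop I suspect (a toy sketch only, not our actual code): if a handler's emit path itself produces log records that end up back at the same handler, a single record can snowball into a flood.

import logging

demo_logger = logging.getLogger('feedback_demo')

class SelfLoggingHandler(logging.Handler):
    """Toy handler: emitting a record triggers another record on the same logger."""
    def emit(self, record):
        # e.g. an exporter logging its own transport warnings through a logger
        # that this very handler is attached to
        demo_logger.warning('export failed, will retry')

# demo_logger.addHandler(SelfLoggingHandler())
# demo_logger.info('one harmless message')  # would recurse/flood if uncommented

I don't know whether that is really what is happening here, but the volume pattern looks like it.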
Our logger.py file, where we define our logging handlers, is as follows:
import logging.config
import os
import tqdm
from pathlib import Path
from opencensus.ext.azure.log_exporter import AzureLogHandler


class TqdmLoggingHandler(logging.Handler):
    """
    Class for enabling logging during a process with a tqdm progress bar.

    Using this handler, logs will be put above the progress bar, pushing the
    progress bar down instead of replacing it.
    """
    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.formatter = logging.Formatter(fmt='%(asctime)s <%(name)s> %(levelname)s: %(message)s',
                                           datefmt='%d-%m-%Y %H:%M:%S')

    def emit(self, record):
        try:
            msg = self.format(record)
            tqdm.tqdm.write(msg)
            self.flush()
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)


logging_conf_path = Path(__file__).parent
logging.config.fileConfig(logging_conf_path / 'logging.conf')

logger = logging.getLogger(__name__)
logger.addHandler(TqdmLoggingHandler(logging.DEBUG))  # Add tqdm handler to this logger to replace the stream handler

if os.getenv('APPLICATION_INSIGHTS_CONNECTION_STRING'):
    logger.addHandler(AzureLogHandler(connection_string=os.environ['APPLICATION_INSIGHTS_CONNECTION_STRING']))

warning_level_loggers = ['urllib3', 'requests']
for lgr in warning_level_loggers:
    logging.getLogger(lgr).setLevel(logging.WARNING)
Does anyone have any idea what could be causing this issue, or has anyone encountered something similar? I can't tell what the 'first' error log is because of the sheer volume of logging.
Please let me know if additional information is required.
Thanks in advance!