I have a Django web app served from Apache2 with mod_wsgi in Docker containers running on a Kubernetes cluster in Google Cloud Platform, protected by Identity-Aware Proxy. Everything is working great, but I want to send GCP Stackdriver traces for all requests without writing one for each view in my project. I found middleware to handle this, using OpenCensus. I went through this documentation, and was able to manually generate traces that exported to Stackdriver Trace in my project by specifying the StackdriverExporter and passing the project_id parameter as the Google Cloud Platform Project Number for my project.
Now to make this automatic for ALL requests, I followed the instructions to set up the middleware. In settings.py, I added the module to INSTALLED_APPS and MIDDLEWARE, and set up the OPENCENSUS_TRACE dictionary of options. I also added the OPENCENSUS_TRACE_PARAMS. This works great with the default exporter 'opencensus.trace.exporters.print_exporter.PrintExporter', as I can see the Trace and Span information, including Trace ID and all details in my Apache2 web server logs. However, I want to send these to my Stackdriver Trace processor for analysis.
I tried setting the EXPORTER parameter to opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter, which works when run manually from the shell, as long as you supply the project number.
When it is set up to use StackdriverExporter, the web page will not load, the health check starts to fail, and ultimately the web page comes back with a 502 error, stating I should try again in 30 seconds. I believe the Identity-Aware Proxy generates this error once it detects the failed health check. The server itself generates no errors, and there is nothing in the Apache2 access or error logs.
There is another dictionary in settings.py named OPENCENSUS_TRACE_PARAMS, which I presume is needed to determine which project number the exporter should be using. The example has GCP_EXPORTER_PROJECT set as None, and SERVICE_NAME set as 'my_service'.
What options do I need to set to get the exporter to send back to Stackdriver instead of printing to logs? Do you have any idea about how I can set this up?
settings.py
MIDDLEWARE = (
    ...
    'opencensus.trace.ext.django.middleware.OpencensusMiddleware',
)

INSTALLED_APPS = (
    ...
    'opencensus.trace.ext.django',
)

OPENCENSUS_TRACE = {
    'SAMPLER': 'opencensus.trace.samplers.probability.ProbabilitySampler',
    'EXPORTER': 'opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter',  # This one just makes the server hang with no response or error and kills the health check.
    'PROPAGATOR': 'opencensus.trace.propagation.google_cloud_format.GoogleCloudFormatPropagator',
    # 'EXPORTER': 'opencensus.trace.exporters.print_exporter.PrintExporter',  # This one works to print the Trace and Span with IDs and details in the logs.
}

OPENCENSUS_TRACE_PARAMS = {
    'BLACKLIST_PATHS': ['/health'],
    'GCP_EXPORTER_PROJECT': 'my_project_number',  # Should this be None like the example, or Project ID, or Project Number?
    'SAMPLING_RATE': 0.5,
    'SERVICE_NAME': 'my_service',  # Not sure if this is my app name or some other service name.
    'ZIPKIN_EXPORTER_HOST_NAME': 'localhost',  # Are the following even necessary, or are they causing a failure that is not detected by Apache2?
    'ZIPKIN_EXPORTER_PORT': 9411,
    'ZIPKIN_EXPORTER_PROTOCOL': 'http',
    'JAEGER_EXPORTER_HOST_NAME': None,
    'JAEGER_EXPORTER_PORT': None,
    'JAEGER_EXPORTER_AGENT_HOST_NAME': 'localhost',
    'JAEGER_EXPORTER_AGENT_PORT': 6831
}
Here's an example (I prettified the format for readability) of the Apache2 log when it is set to use the PrintExporter:
[Fri Feb 08 09:00:32.427575 2019]
[wsgi:error]
[pid 1097:tid 139801302882048]
[client 10.48.0.1:43988]
[SpanData(
    name='services.views.my_view',
    context=SpanContext(
        trace_id=e882f23e49e34fc09df621867d753532,
        span_id=None,
        trace_options=TraceOptions(enabled=True),
        tracestate=None
    ),
    span_id='bcbe7b96906a482a',
    parent_span_id=None,
    attributes={
        'http.status_code': '200',
        'http.method': 'GET',
        'http.url': '/',
        'django.user.name': ''
    },
    start_time='2019-02-08T17:00:29.845733Z',
    end_time='2019-02-08T17:00:32.427455Z',
    child_span_count=0,
    stack_trace=None,
    time_events=[],
    links=[],
    status=None,
    same_process_as_parent_span=None,
    span_kind=1
)]
Thanks in advance for any tips, assistance, or troubleshooting advice!
Edit 2019-02-08 6:56 PM UTC:
I found this in the middleware:
# Initialize the exporter
transport = convert_to_import(settings.params.get(TRANSPORT))

if self._exporter.__name__ == 'GoogleCloudExporter':
    _project_id = settings.params.get(GCP_EXPORTER_PROJECT, None)
    self.exporter = self._exporter(
        project_id=_project_id,
        transport=transport)
elif self._exporter.__name__ == 'ZipkinExporter':
    _service_name = self._get_service_name(settings.params)
    _zipkin_host_name = settings.params.get(
        ZIPKIN_EXPORTER_HOST_NAME, 'localhost')
    _zipkin_port = settings.params.get(
        ZIPKIN_EXPORTER_PORT, 9411)
    _zipkin_protocol = settings.params.get(
        ZIPKIN_EXPORTER_PROTOCOL, 'http')
    self.exporter = self._exporter(
        service_name=_service_name,
        host_name=_zipkin_host_name,
        port=_zipkin_port,
        protocol=_zipkin_protocol,
        transport=transport)
elif self._exporter.__name__ == 'TraceExporter':
    _service_name = self._get_service_name(settings.params)
    _endpoint = settings.params.get(
        OCAGENT_TRACE_EXPORTER_ENDPOINT, None)
    self.exporter = self._exporter(
        service_name=_service_name,
        endpoint=_endpoint,
        transport=transport)
elif self._exporter.__name__ == 'JaegerExporter':
    _service_name = self._get_service_name(settings.params)
    self.exporter = self._exporter(
        service_name=_service_name,
        transport=transport)
else:
    self.exporter = self._exporter(transport=transport)
The exporter is now named StackdriverExporter instead of GoogleCloudExporter. I set up a class in my app named GoogleCloudExporter that inherits from StackdriverExporter, and updated my settings.py to use GoogleCloudExporter, but it didn't seem to work. I wonder if there is other code referencing these old naming schemes, possibly for the transport. I'm searching the source code for clues... This at least tells me I can get rid of the ZIPKIN and JAEGER param options, as they are only read when the EXPORTER param selects the matching exporter.
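To make the naming issue concrete, here is a minimal sketch of the name-based branch selection from the middleware excerpt above. The two classes are stand-ins I wrote for illustration, not the real OpenCensus exporters, and build_exporter is a hypothetical helper that keeps only the GoogleCloudExporter branch and the final else. Under those assumptions, it shows that a class named StackdriverExporter falls through to the else branch and never receives project_id, while a subclass carrying the old name does.

```python
# Stand-in classes: these are NOT the real opencensus exporters, only
# placeholders so the name-based dispatch can be run in isolation.
class StackdriverExporter:
    def __init__(self, project_id=None, transport=None):
        self.project_id = project_id
        self.transport = transport


class GoogleCloudExporter(StackdriverExporter):
    """Subclass whose only purpose is to carry the old class name."""


def build_exporter(exporter_cls, params):
    """Hypothetical, simplified copy of the middleware's check on __name__."""
    transport = params.get('TRANSPORT')
    if exporter_cls.__name__ == 'GoogleCloudExporter':
        # Only this branch reads GCP_EXPORTER_PROJECT.
        return exporter_cls(
            project_id=params.get('GCP_EXPORTER_PROJECT', None),
            transport=transport)
    # Any other name (including 'StackdriverExporter') lands here,
    # so project_id is never passed to the exporter.
    return exporter_cls(transport=transport)


params = {'GCP_EXPORTER_PROJECT': 'my_project_number'}

renamed = build_exporter(StackdriverExporter, params)
legacy = build_exporter(GoogleCloudExporter, params)

print(renamed.project_id)  # None
print(legacy.project_id)   # my_project_number
```

If this reading of the excerpt is right, GCP_EXPORTER_PROJECT has no effect when EXPORTER points at StackdriverExporter directly, and the real exporter would have to determine the project some other way.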
Edit 2019-02-08 11:58 PM UTC:
I scrapped Apache2 to isolate the problem and just set my Docker image to use Django's built-in web server, CMD ["python", "/path/to/manage.py", "runserver", "0.0.0.0:80"], and it works! When I go to the site, it writes traces to Stackdriver Trace for each request, and the Span name is the module and method being executed.
Somehow Apache2 is not being allowed to send these, but I can do so from the shell when running as root. I'm adding Apache2 and mod_wsgi tags to the question, because I have a funny feeling this has to do with forking child processes in Apache2 and mod_wsgi. Could it be that the child process cannot be created because Apache2's child processes are sandboxed, or is this a permissions issue? It seems strange, because it is just calling Python modules, with no external OS binaries that I am aware of. Any other ideas would be greatly appreciated!