We are working on incorporating Grafana, Jaeger, Prometheus and other tools into our production environments.
As part of this, we have set up graphs for various services and operations, and we would like to go directly from a spike in a graph to the corresponding trace, either in Grafana Explore or in Jaeger itself. However, the two do not quite match up.
We only got data links to work after first transforming the data fields (renaming them with "Organize fields"); otherwise, we could not get the link templates to work. As part of that, the Trace ID field was renamed to TraceId.
We then add a data link to the graph, currently pointing at a local Jaeger instance, with the following template:
http://localhost:16686/trace/${__data.fields['TraceId']}
To summarize, given a really simple graph:
- Type: Time Series
- Data Source: Jaeger
- Query: Service=<Our Service>, Operation=All, Tags="category=REQUEST"
- Limit: 2500
Transform:
- Organize Fields: Trace ID -> TraceId
Data Link:
- Jaeger: http://localhost:16686/trace/${__data.fields['TraceId']}
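To make clear what we expect the data link template above to do, here is a minimal Python sketch of the substitution. It is not Grafana's implementation; `expand_link` and the `row` dictionary are hypothetical names used only for illustration. The idea is that the TraceId in the URL should come from the same row as the clicked point.

```python
import re

# The data link template from the panel configuration.
TEMPLATE = "http://localhost:16686/trace/${__data.fields['TraceId']}"

# Matches ${__data.fields['FieldName']} and captures FieldName.
FIELD_PATTERN = re.compile(r"\$\{__data\.fields\['([^']+)'\]\}")


def expand_link(template: str, row: dict) -> str:
    """Hypothetical helper: fill each field placeholder from the given row."""
    return FIELD_PATTERN.sub(lambda m: str(row[m.group(1)]), template)


# One row of the (renamed) data frame: the point we would click in the graph.
row = {
    "Time": "2022-08-04T08:36:57.402",
    "Duration": 3.91,
    "TraceId": "ea801f08b37f64cd374fbc28f52e38f6",
}

link = expand_link(TEMPLATE, row)
print(link)
# http://localhost:16686/trace/ea801f08b37f64cd374fbc28f52e38f6
```

In other words, we expect the time, the duration, and the TraceId in the link to all be taken from the same row.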
Now, this appears to work at first. However, our problem is that the IDs do not match up correctly. For example, when we click a 10-second spike in the graph, open the context menu, and follow the link above, we land on a completely different trace.
I zoomed in and confirmed that when the data links appear in the tooltip, the duration shown there matches the trace I want to see.
I then filtered the data down to only 4 points (traces) in the graph, and discovered that the links appear to be in reverse order compared to the graph.
With 4 data points in the graph as follows (Id = the ID that appears to be the correct match in Jaeger, judging by time and duration):
2022-08-04T08:36:57.402, 3.91 ms, ( Id=ea801f08b37f64cd374fbc28f52e38f6 )
2022-08-04T08:36:57.403, 2.90 ms, ( Id=eecf656be29f783d5c0ecae1113827d8 )
2022-08-04T08:36:57.409, 4.43 ms, ( Id=06da4ae77a124e6eb4eca3a63a65e115 )
2022-08-04T08:36:57.416, 3.06 ms, ( Id=3ddb2e5b37309fec68a163d35fc929e8 )
Yet the links are assigned like this instead:
2022-08-04T08:36:57.402, 3.91 ms, Link=http://localhost:16686/trace/3ddb2e5b37309fec68a163d35fc929e8
2022-08-04T08:36:57.403, 2.90 ms, Link=http://localhost:16686/trace/06da4ae77a124e6eb4eca3a63a65e115
2022-08-04T08:36:57.409, 4.43 ms, Link=http://localhost:16686/trace/eecf656be29f783d5c0ecae1113827d8
2022-08-04T08:36:57.416, 3.06 ms, Link=http://localhost:16686/trace/ea801f08b37f64cd374fbc28f52e38f6
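The reversal can be checked mechanically against the four points listed above. The following is a small Python sketch using only the values from this post; the variable names are just for illustration.

```python
# (time, duration in ms, trace ID that matches in Jaeger), in graph order.
points = [
    ("2022-08-04T08:36:57.402", 3.91, "ea801f08b37f64cd374fbc28f52e38f6"),
    ("2022-08-04T08:36:57.403", 2.90, "eecf656be29f783d5c0ecae1113827d8"),
    ("2022-08-04T08:36:57.409", 4.43, "06da4ae77a124e6eb4eca3a63a65e115"),
    ("2022-08-04T08:36:57.416", 3.06, "3ddb2e5b37309fec68a163d35fc929e8"),
]

# Trace IDs that the data links actually point to, in the same graph order.
linked_ids = [
    "3ddb2e5b37309fec68a163d35fc929e8",
    "06da4ae77a124e6eb4eca3a63a65e115",
    "eecf656be29f783d5c0ecae1113827d8",
    "ea801f08b37f64cd374fbc28f52e38f6",
]

expected_ids = [trace_id for _, _, trace_id in points]

# The links are an exact mirror image of the expected order.
is_reversed = linked_ids == expected_ids[::-1]
print(is_reversed)  # True

# Point i receives the ID that belongs to point (n - 1 - i).
n = len(points)
mirror_index = [expected_ids.index(trace_id) for trace_id in linked_ids]
print(mirror_index)  # [3, 2, 1, 0]
```

So point i gets the link of point n - 1 - i, which suggests that the TraceId values are read in the opposite order from the time and duration values.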
This behavior is consistent, so does anyone have any clue as to what we are doing wrong? Feel free to ask for specific details; I am not sure what else to share.