
My current setup involves a Node.js web application using Express.js.
I am using Datadog's dd-tracer to measure the time Node.js spends in particular method invocations as part of my APM solution.

I would like to know if it is possible to measure the portion of time an incoming HTTP request is busy sending data back to the client as HTTP response body.

Are there any pitfalls or inaccuracies involved when trying to do this kind of instrumentation?
Does anybody know why this is not measured by APM client libraries by default?

Martin Löper

4 Answers


I would like to know if it is possible to measure the portion of time an incoming HTTP request is busy sending data back to the client as HTTP response body.

You could wrap calls to res.write manually to create additional spans in the request trace. I would only recommend this if there are not many calls to the method within a request; otherwise I would recommend capturing just a metric instead.

Alternatively, profiling might be an option that would give you a lot more information about exactly what is taking time within the res.write calls.

I look for a "global" solution which can be integrated into a Nest.js application without instrumenting each call to res.write manually.

As described above, you can simply wrap res.write directly at the start of every request. Using the tracer, this can be achieved like this:

res.write = tracer.wrap('http.write', res.write)

This should be done before any other middleware has the chance to write data.

Example middleware:

app.use((req, res, next) => {
  res.write = tracer.wrap('http.write', res.write)
  next()
})

Are there any pitfalls or inaccuracies involved when trying to do this kind of instrumentation?

Nothing major that I can think of.

Does anybody know why this is not measured by APM client libraries by default?

The main issue for doing this out of the box is that creating a span for every call to res.write may be expensive if there are too many calls. If you think it would make sense to have an option to do this out of the box, we can definitely consider adding that.
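If per-call spans are too expensive, a cheaper alternative is to accumulate the time spent across all res.write calls into a single number per request and report only that one value. This is just a sketch, not a dd-trace feature; measureWrites and totalWriteMs are made-up names:

```javascript
// Accumulate the synchronous portion of every res.write call into a single
// per-response total, instead of creating one span per call.
// Note: this only measures the time write() takes to buffer the chunk in
// Node.js, not the time the OS needs to flush it to the client.
function measureWrites(res) {
  const originalWrite = res.write;
  res.totalWriteMs = 0;
  res.write = function (...args) {
    const start = process.hrtime.bigint();
    const ok = originalWrite.apply(this, args);
    res.totalWriteMs += Number(process.hrtime.bigint() - start) / 1e6;
    return ok; // preserve the backpressure signal
  };
  return res;
}
```

The accumulated res.totalWriteMs could then be attached to the request trace once, e.g. from a 'finish' listener, instead of emitting a span per write.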

Hope this helps!

rochdev
  • Although I am not sure if this approach will work well in production since you mentioned that it is expensive to create a bunch of spans and @Daniele Ricci pointed out some sources of possible inaccuracies, I guess this solution fits my use case the most. I will try it out and post an update here if it worked or not. – Martin Löper Jun 06 '20 at 03:34

It depends on whether you want the response time for each individual call or aggregated statistics about response times.

For the first case, to get the response time in the header of each response, you can use the response-time package: https://github.com/expressjs/response-time

This will add a value to the response header (by default X-Response-Time) containing the elapsed time from when a request enters the middleware to when the headers are written out.

var express = require('express')
var responseTime = require('response-time')

var app = express()

app.use(responseTime())

app.get('/', function (req, res) {
  res.send('hello, world!')
})
If you want a more complete solution and want to gather statistics that include the response time, you can use the express-node-metrics package: https://www.npmjs.com/package/express-node-metrics

var metricsMiddleware = require('express-node-metrics').middleware;
app.use(metricsMiddleware);

app.get('/users', function(req, res, next) {
    //Do Something
})
app.listen(3000);

You can expose and access these statistics like this:

'use strict'
var express = require("express");
var router = express.Router();
var metrics = require('express-node-metrics').metrics;

router.get('/', function (req, res) {
    res.send(metrics.getAll(req.query.reset));
});
router.get('/process', function (req, res) {
    res.send(metrics.processMetrics(req.query.reset));
});
router.get('/internal', function (req, res) {
    res.send(metrics.internalMetrics(req.query.reset));
});
router.get('/api', function (req, res) {
    res.send(metrics.apiMetrics(req.query.reset));
});
  • If you need only the metrics, this is definitely a good approach. If you want to expose the metrics similar to the above, I would probably recommend OpenMetrics instead (Prometheus format which is supported by Datadog) or using DogStatsD with a library like hot-shots. – rochdev Jun 05 '20 at 20:22
  • Thanks for mentioning the response-time approach! That might certainly work for most people. However, I am looking for a solution that measures the portion of time an Express request is busy writing response data (i.e. headers and body) out to the client. That is not covered by any of the aforementioned packages, if I understand them correctly. – Martin Löper Jun 06 '20 at 03:29

First of all, I should say that I don't know dd-tracer, but I can try to provide a way to get the requested time; it is then up to the developer to use it as needed.

The main inaccuracy that comes to mind is that every OS has its own TCP stack, and writing to a TCP socket is a buffered operation: for response bodies smaller than the OS TCP buffer, we will probably measure a time close to 0. The result is also influenced by the Node.js event loop load, although the larger the response body becomes, the more negligible that load-related time is. So if we measure the write time for all requests but restrict our analysis to long-running ones, I think the measurement will be quite accurate.

Another possible source of inaccuracy is how the request handlers write their output: if a request handler writes part of the body, then performs a long time operation to compute last part of the body, then writes missing part of the body, the measured time is influenced by the long time computing operation; we should take care that all request handlers write headers and body all at once.
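One way to avoid that skew is to compute the entire body first and hand it to the response in a single call. A minimal sketch; respondAtOnce and buildBody are illustrative names, not an existing API:

```javascript
// Do all the slow work up front, then send headers and body together in a
// single res.end call, so the measured socket write time covers only the write.
function respondAtOnce(res, buildBody) {
  const body = buildBody(); // long computation happens before any write
  res.setHeader('Content-Length', Buffer.byteLength(body));
  res.end(body);
}
```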

My solution proposal (which works only if the server does not implement keep-alive) is to add a middleware like this.

app.use((req, res, next) => {
    let start;
    const { write } = res.socket;

    // Wrap only first write call
    // Do not use arrow function to get access to arguments
    res.socket.write = function() {
        // Immediately restore write property to not wrap next calls
        res.socket.write = write;

        // Take the start time
        start = new Date().getTime();

        // Actually call first write, preserving its return value
        // (the backpressure signal)
        return write.apply(res.socket, arguments);
    };

    res.socket.on("close", () => {
        // Take the elapsed time in result
        const result = new Date().getTime() - start;

        // Handle the result as needed
        console.log("elapsed", result);
    });

    next();
});

Hope this helps.

Daniele Ricci
  • These are some good considerations regarding accuracy. Thanks a lot! I guess the solution outlined in the snippet could pretty much work for me. However, it will take some time for me to implement this and measure in production. – Martin Löper Jun 06 '20 at 03:24

You can start a timer before calling res.end, and since any code after res.end runs once it has returned, you can stop the timer right after the res.end call. Don't quote me on that, though.
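A variation on this idea is to start the timer when res.end is called and stop it on the response's 'finish' event, which fires once all data has been flushed from Node.js. A sketch only; timeEnd is a made-up helper name:

```javascript
// Wrap res.end so the timer starts at the final write and stops when the
// 'finish' event signals the response has been fully handed off by Node.js.
function timeEnd(res, onDone) {
  const originalEnd = res.end;
  res.end = function (...args) {
    const start = Date.now();
    res.once('finish', () => onDone(Date.now() - start));
    return originalEnd.apply(this, args);
  };
}
```

Note this still measures only up to Node.js handing the data to the OS, not delivery to the client.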

Hackermon
  • The main issue with `res.end` is that only the last write would be captured. I would recommend instead wrapping `res.write` itself. Then every call to `res.write` would go through the wrapper. I'll provide a snippet later today. – rochdev Jun 03 '20 at 19:10
  • I've added an example as an edit to my original answer. – rochdev Jun 04 '20 at 13:36