
I've inherited a nodejs application of reasonable complexity. It interacts with 5 other applications through various network interfaces, mostly HTTP REST APIs. Every now and then I run into an issue where an error is thrown, and while it's handled to prevent the application from crashing, I can't tell where the error is coming from. Here's the typical amount of information I get from this error:

And that's it. That's the entire message and stacktrace (message is actually duplicated in errno=ECONNREFUSED and syscall=connect, not helping). I'm using nodejs 0.12.2 and linked the stacktrace lines to the source accordingly. I've read the source but didn't get anywhere.

I've also looked through the many questions here on SO related to ECONNREFUSED, but those always come with a code example. If I knew which part of my application is responsible for the network request, I could fix it.

So my question is, how can I instrument a node application to find out where failed requests like this come from?

PS: I've also looked at the recommendations for debugging nodejs applications, but didn't find anything answering my question.

Jörn Zaefferer
  • From the [POSIX `connect` system call reference](http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html) for `ECONNREFUSED`: "The target address was not listening for connections or refused the connection request." Most likely reason? There was nobody listening for connections on the address/port you tried to connect to. If there *is* someone listening, then have you checked your firewalls? – Some programmer dude Jun 29 '15 at 16:19
  • Before looking at the code, why don't you begin with tcpdump (or Wireshark) and Fiddler in order to determine which server and service is refusing the connection? With tcpdump (or Wireshark) you can find out the IP and port that is refusing it. With Fiddler, if it is a request that fails once in a while, you can get an idea of which one is failing (by looking at previous requests) and you can see requests over SSL. That will give you a first idea. – rodolk Jun 29 '15 at 19:24
  • @rodolk thanks, that's a good idea, but can also be impractical if the issue only happens on a system where I can't run these tools. – Jörn Zaefferer Jul 07 '15 at 12:10

2 Answers


The following snippet logs errors for all http connections, independent of their call site:

// install error handler on all sockets to provide context for ECONNREFUSED and similar errors
// this is not an official API and may break at some point
// it will also likely log this error multiple times, which the socketErrorId helps identify
var http = require('http');
var net = require('net');
var errorCounter = 0;
http.globalAgent.createConnection = function (options) {
  var socket = net.createConnection(options);
  socket.on('error', function (error) {
    errorCounter += 1;
    error.socketErrorId = errorCounter;
    console.log('socket error, while connecting to ', options.href, error);
  });
  return socket;
};

I've had this running in production for a week now and it helped identify connection issues that were otherwise impossible to pin down.

The options object has more properties than just href, though that's the one that was most useful for me. Here's a full list from a test I did: domain, _events, _maxListeners, callback, uri, headers, method, readable, writable, explicitMethod, _qs, _auth, _oauth, _multipart, _redirect, _tunnel, setHeader, hasHeader, getHeader, removeHeader, localAddress, pool, dests, __isRequestRequest, _callback, proxy, tunnel, setHost, originalCookieHeader, _disableCookies, _jar, port, host, path, httpModule, agentClass, agent, _started, href, servername, encoding

The snippet above is based on a gist someone wrote for me after seeing my lamentations on Twitter.
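If the target URL alone isn't enough to find the caller, a variation on the same idea is to record a stack trace at the moment a request is created and attach it to whatever error that request later emits. The following is only a rough sketch and not part of the gist mentioned above: `tagErrorsWithCallSite` and the `callSiteStack` property are names made up for illustration, and it assumes a Node.js release recent enough to provide `events.errorMonitor` (much newer than 0.12). It also only sees requests made through the `http` module's `request` and `get` properties after the patch is installed; `https` would need the same treatment.

```javascript
const http = require('http');
const { errorMonitor } = require('events');

// Hypothetical helper: remember where `emitter` was created and attach that
// stack to any error it later emits. errorMonitor listeners only observe the
// error; they do not count as an 'error' handler, so existing error handling
// keeps working as before.
function tagErrorsWithCallSite(emitter, label) {
  const callSite = new Error(label || 'emitter created here');
  emitter.on(errorMonitor, (error) => {
    if (error && typeof error === 'object') {
      error.callSiteStack = callSite.stack;
    }
  });
  return emitter;
}

// Wrap http.request and http.get so every request created through them is tagged.
for (const name of ['request', 'get']) {
  const original = http[name];
  http[name] = function (...args) {
    const req = original.apply(this, args);
    return tagErrorsWithCallSite(req, 'http.' + name + ' called here');
  };
}
```

Wherever the error is finally handled, logging `error.callSiteStack` next to the usual stack should then show the frames of the code that created the request.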

Jörn Zaefferer

The error means the kernel reported that a TCP connection was refused by the remote host. You need to add error handling and retry attempts to avoid those problems. Try requesting the URL with curl to make sure the problem isn't with your own machine.

You can use the domain module to encapsulate code and make sure to preserve the context in which the error took place. In addition, you should always make named functions for callbacks and the like because that will at least point you in the right direction.
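As a minimal sketch of the domain idea (not a definitive implementation): emitters created while a domain is active are bound to it, so an `'error'` event that nobody listens for ends up in the domain's own `'error'` handler, where a label can be attached. `runWithContext`, the `context` property and the `'billing-service lookup'` label are hypothetical names, and a plain `EventEmitter` stands in for a real request here. Note that `domain` is a deprecated core module.

```javascript
const domain = require('domain'); // deprecated core module, still shipped with Node.js
const { EventEmitter } = require('events');

// Hypothetical helper: run `work` inside a fresh domain labelled `context`.
// Unhandled 'error' events from emitters created inside `work` are routed to
// the domain, where the label is attached before `onError` is called.
function runWithContext(context, work, onError) {
  const d = domain.create();
  d.on('error', (error) => {
    error.context = context;
    onError(error);
  });
  d.run(work);
  return d;
}

// Stand-in for a network call: an emitter that fails with no 'error' listener.
const failures = [];
runWithContext('billing-service lookup', () => {
  const fakeRequest = new EventEmitter();
  const error = new Error('connect ECONNREFUSED');
  error.code = 'ECONNREFUSED';
  fakeRequest.emit('error', error);
}, (error) => failures.push(error));

console.log(failures.map((e) => e.context + ': ' + e.message));
```

In a real application the body passed to `runWithContext` would be the code that talks to one particular external service, so the label identifies which integration the failed request belongs to.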

There are also libraries out there that can attempt to retry HTTP requests and the like if that's what you need.

theWanderer4865
  • I could do all these things if I knew what code in my app to look at. But my question is about finding that code. Naming callbacks wouldn't help since my callback is never invoked, as you can see in the stacktrace I put into the question. – Jörn Zaefferer Jun 29 '15 at 16:26
  • Adding the longjohn module just causes my app to crash immediately, it looks like it turns a handled exception into an unhandled exception. – Jörn Zaefferer Jun 29 '15 at 16:35
  • If you use node-inspector you can place breakpoints at the locations of calls to external resources (which is what this looks like) and see which ones are giving you trouble. You could also put `console.log` calls around the areas where those calls are made. Those are really the only things that I would know to do in this situation, other than running unit tests to look for points of failure in coverage (using something like istanbul for coverage). – theWanderer4865 Jun 29 '15 at 19:48