
I've got a node.js server that is freezing in production, and it appears to be caused by an infinite loop inside JSONStream. Here's a stack trace captured from a core dump of a frozen server:

1: toString [buffer.js:~392] (this=0x1e28fb6d25c9 <a Buffer>#1#,encoding=0x266ee104121 <undefined>,start=0x266ee104121 <undefined>,end=0x266ee104121 <undefined>)
2: arguments adaptor frame: 0->3
3: write [/home/deploy/node_modules/JSONStream/node_modules/jsonparse/jsonparse.js:136] (this=0x32cc8dd5a999 <a Parser>#2#,buffer=0x32cc8dd5aa49 <a Buffer>#3#)
4: /* anonymous */ [/home/deploy/node_modules/JSONStream/index.js:~17] (this=0x32cc8dd5ab11 <a Stream>#4#,chunk=0x32cc8dd5aa49 <a Buffer>#3#)
5: write [/home/deploy/node_modules/JSONStream/node_modules/through/index.js:~24] (this=0x32cc8dd5ab11 <a Stream>#4#,data=0x32cc8dd5aa49 <a Buffer>#3#)
6: write [_stream_readable.js:~582] (this=0x266ee106c91 <JS Global Object>#5#,dest=0x32cc8dd5ab11 <a Stream>#4#,i=0,list=0x266ee104101 <null>)
7: flow [_stream_readable.js:592] (this=0x266ee106c91 <JS Global Object>#5#,src=0x32cc8dd5ac69 <an IncomingMessage>#6#)
8: /* anonymous */ [_stream_readable.js:560] (this=0x266ee106c91 <JS Global Object>#5#)
9: _tickCallback [node.js:415] (this=0x29e7331bb2a1 <a process>#7#)

How can I find the source of this infinite loop?

Unfortunately the servers are running in production and are processing many thousands of requests, so it's difficult to give any additional context. The basic function of the servers is to make outbound HTTP requests for other services.

It's worth noting that I don't believe this is caused by a memory leak. The server's memory usage remains constant (and low) during these freeze events, while the CPU spikes to 99%.

Another piece of evidence towards the conclusion of an infinite loop is that the event loop itself seems to have stopped. When I put a console.log inside of a setInterval, the server will stop outputting as soon as it freezes.
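In case it's useful, the heartbeat check is nothing more than this (a minimal sketch; the interval and message are arbitrary):

```js
// Heartbeat: if the event loop is blocked by a long-running synchronous
// operation (e.g. an infinite loop), this interval stops firing entirely.
setInterval(function () {
  console.log('event loop alive:', new Date().toISOString());
}, 1000);
```

As soon as a server freezes, this output stops, which is what makes me think we're stuck in a single synchronous loop rather than waiting on I/O.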

We have verified that the problem is not caused by expired/corrupted socket connections by setting the maximum number of connections to Infinity (which disables their reuse in node.js).

We're using JSONStream 0.7.1 (which depends on jsonparse 0.0.5 by default). We found this issue on the JSONStream repo, and tried a fork that only updated jsonparse to the newest version. It did NOT fix the issue.
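For completeness, that experiment amounted to pointing our app at a fork of JSONStream whose only change was bumping the jsonparse dependency, roughly like this in our package.json (the org and branch names here are placeholders, not our real repo):

```json
{
  "dependencies": {
    "JSONStream": "git://github.com/our-org/JSONStream.git#bump-jsonparse"
  }
}
```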

ZECTBynmo
  • The first message says `toString [buffer.js:~392]`, the problem is at buffer.js line 392 – Jonathan Solorzano Oct 01 '15 at 02:29
  • What could cause an infinite loop though @JonathanS? That source looks pretty innocuous – ZECTBynmo Oct 06 '15 at 22:10
  • Do you have the JSON string itself that's getting serialized to the stream? – Vinay Oct 06 '15 at 23:04
  • Unfortunately no. The servers are running in production, and they have thousands of events per second going through them, and we haven't found any pattern to the failures, so there's no additional information I can give. – ZECTBynmo Oct 06 '15 at 23:07
  • I would say that it might be a memory leak; check this blog: http://www.toptal.com/nodejs/debugging-memory-leaks-node-js-applications – fmodos Oct 06 '15 at 23:20
  • As a tip in the right direction for future debugging purposes: our servers are very configuration-oriented. We listen to changes on the configuration file and the server re-digests it. This allows us to have a config field that makes a debug log function turn from a no-op to actually logging. That way we can have a bunch of `logger.debug` calls in our production code. If you don't care about the "liveness" of this debugging approach, you can have a precompilation step that strips out the debug calls normally unless you're deploying a release candidate to help you debug an issue like this one. – Vinay Oct 06 '15 at 23:22
  • @fmodos Thanks for the idea! I should have mentioned that it does not appear to be a memory leak, as the memory usage remains constant (and low) during the infinite loop. CPU spikes to 99%. I'll edit the OP to reflect this – ZECTBynmo Oct 07 '15 at 00:13
  • @Vinay we do have extensive logging setup, but unfortunately there's no identifiable pattern that we've noticed yet - request errors are nothing abnormal for us, so it's kind of a needle in a haystack – ZECTBynmo Oct 07 '15 at 00:14
  • Are there too many clients writing to the stream? Do you have any sort of load balancing in place to spawn new stream handlers as they are needed? I'm just throwing the idea out there; you could test this by checking whether the problem is replicated in a test environment with a low frequency of requests, and then simulate more requests to see if the problem is reproduced. I'd suspect some sort of memory leak or blocking is happening here. – Halfpint Oct 07 '15 at 00:27
  • @Alex thanks for the idea, but I'm fairly certain that the server is in some kind of infinite loop. I'm not sure what you mean by "many clients writing to the stream". – ZECTBynmo Oct 07 '15 at 02:32
  • We have verified that the problem is not caused by expired/corrupted socket connections, by setting the number of max connections to Infinity (which disables their reuse in node.js) @Alex – ZECTBynmo Oct 07 '15 at 02:33
  • Corrupted in which way? How did the corrupted data leak from the socket connection to where you stream this JSON object? At the very least, have you put in additional logging that will make this obvious in the future if it arises again? Distributed systems nerd here. :-) – Vinay Oct 07 '15 at 02:45
  • The problem is NOT corrupted sockets. I'm just trying to eliminate causes, and I think that's what @Alex was getting at. That was actually one of the first things we considered, even before getting the core dump info. – ZECTBynmo Oct 07 '15 at 02:46
  • I'm glad you ruled that out as an issue, I hope you find the root of the problem soon. – Halfpint Oct 07 '15 at 12:28
  • I initially thought that this might be an issue where you have a circular reference in your JSON structure, but it would appear that JSON.stringify is smart enough to `throw` its toys out of the pram if you do this. Is there any chance you could at least share a section of the code? Otherwise it's going to be much hmming and ahhing. In particular, it would be good to see the flow of what objects are being piped into the stream. I know you mention that it's in production and you can't stop it, that's fine, but being able to see how we get to the output stream might help. – Dan Oct 07 '15 at 15:08
  • Also, I doubt this is a memory leak; this seems to be an issue with some kind of recursive structure, which would explain why all of your CPU gets consumed. – Dan Oct 07 '15 at 15:10
  • It would GREATLY help to know the versions of the packages you are using as well. Until that point there's really not much anyone can do to help. `jsonstream` depends on `jsonparse` and the error occurs at `jsonparse.js:136`, but the current HEAD of `jsonparse.js:136` is a line that couldn't throw an error, so versioning is really important to diagnose this. – Dan Oct 07 '15 at 15:12
  • However, ultimately, there just isn't enough information in this error to diagnose your problem - which is probably why no one has responded. We need to at least have your `package.json` `dependencies`. – Dan Oct 07 '15 at 15:16
  • @ARedHerring Thanks for the contribution. I'm also suspecting some kind of recursive structure, but it does appear that the crash is happening during toString, rather than Stringify, so I'm not sure how it would be possible. I'll update the OP with some information about package versions – ZECTBynmo Oct 07 '15 at 15:17
  • @ARedHerring Unfortunately there's no easy way to paste a code snippet - the streaming behavior of the servers is rather complicated, and spread across many files. – ZECTBynmo Oct 07 '15 at 15:21
  • That's fine. Your issue appears to be caused [here](https://github.com/creationix/jsonparse/blob/0.0.5/jsonparse.js#L136). – Dan Oct 07 '15 at 15:22
  • @ARedHerring Interesting!!! What leads you to that conclusion? – ZECTBynmo Oct 07 '15 at 15:22
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91625/discussion-between-a-red-herring-and-zectbynmo). – Dan Oct 07 '15 at 15:23
  • @ZECTBynmo did you consider the point that you are processing too many simultaneous requests, and that this might be the real cause of the CPU usage? You have to remember that Node.js creates a new process when you are forwarding the request. – fmodos Oct 07 '15 at 19:04
  • @fmodos this is confirmed to be due to an infinite loop in jsonparse's code - nothing to do with too many simultaneous requests. The problem here is that the actual event loop is stopping, which is indicative of CPU-bound issues, not I/O-bound ones. – Dan Oct 07 '15 at 19:27

1 Answer


It would appear that your issue is somehow being caused by [this line](https://github.com/creationix/jsonparse/blob/0.0.5/jsonparse.js#L136) of jsonparse@0.0.5. While I can't conclusively fix the issue for you without a dump of what is going through the server, the stack trace would appear to indicate that your buffer is extremely large.

This would also explain why your servers are locking up (as you mentioned in chat), why the event loop doesn't proceed, and why your memory doesn't shoot up while your CPU does; what's probably happening here is that you're attempting to toString() an incredibly large number of bytes, which your hardware simply can't keep up with, so the process grinds to a halt.

By all means report back on any further investigation from this lead. I feel it's unavoidable that you're going to have to repro the issue on a dev box. It could be as simple as adding some sanity checks to your buffer and making sure it doesn't go above a certain size.
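As a sketch of the kind of sanity check I mean (the limit and helper name are made up; adjust for your traffic), you could drop a small pass-through stream in front of JSONStream that fails loudly when a chunk looks suspiciously large, instead of letting it reach the parser:

```js
var through = require('through'); // already a dependency of JSONStream

var MAX_CHUNK_BYTES = 1024 * 1024; // 1 MB, arbitrary threshold

// chunkGuard is a hypothetical helper: it forwards chunks untouched,
// but emits an error if one is larger than you ever expect to see.
function chunkGuard() {
  return through(function (chunk) {
    if (Buffer.isBuffer(chunk) && chunk.length > MAX_CHUNK_BYTES) {
      this.emit('error', new Error('suspiciously large chunk: ' + chunk.length + ' bytes'));
      return;
    }
    this.queue(chunk);
  });
}

// usage (sketch): response.pipe(chunkGuard()).pipe(JSONStream.parse('*'))
```

That won't fix anything by itself, but it turns a silent hang into an error you can log.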

The above section of code is only hit if n >= 128, so if your data contains multi-byte characters (such as non-ASCII Unicode text) or raw bytes above 127, you might actually encounter this issue in normal operation... might be something to consider!
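To make the failure mode concrete, here's a heavily simplified sketch of the byte-scanning pattern involved. This is not the actual jsonparse source (and I've added a throw so the demo can't hang), but it shows how a lead byte that doesn't match any of the recognised UTF-8 ranges leaves `bytes_in_sequence` at 0, which in the old code rewound the loop index and spun forever:

```js
// Simplified sketch of the problematic pattern (NOT the real jsonparse code).
function scan(buffer) {
  for (var i = 0; i < buffer.length; i++) {
    var n = buffer[i];
    if (n < 128) continue; // plain ASCII, nothing special to do

    var bytes_in_sequence = 0;
    if (n >= 194 && n <= 223) bytes_in_sequence = 2;
    if (n >= 224 && n <= 239) bytes_in_sequence = 3;
    if (n >= 240 && n <= 244) bytes_in_sequence = 4;

    if (bytes_in_sequence === 0) {
      // Bytes 128..193 and >244 fall through every range above. The old
      // parser then did `i = i + bytes_in_sequence - 1`, i.e. i--, so the
      // loop never advanced. Here we just report it instead of spinning.
      throw new Error('invalid UTF-8 lead byte ' + n + ' at index ' + i);
    }
    i += bytes_in_sequence - 1; // skip the continuation bytes of this char
  }
}

// e.g. scan(new Buffer([0x22, 0xfe, 0x22])) throws on the 0xfe byte
```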

Dan
  • Thanks again! I've managed to verify that we are indeed looping at that line. The direct reason for the infinite loop is that this.bytes_in_sequence is 0, so we never increment the counter i, so we're stuck. I haven't figured out the root cause of that 0, but...progress!! – ZECTBynmo Oct 07 '15 at 16:25
  • That seems like a strange bug. The section I linked is only reached if `bytes_in_sequence <= 0` and it will actually decrease `i` by 1 - that is what causes the infinite loop. You *may* want to check if this is logged as an issue on jsonparse's github. Are you *positive* that you tried a newer version of jsonparse? I mean, this version of jsonparse is 2 years old. – Dan Oct 07 '15 at 16:56
  • In the end, there were two separate but related issues. First, the version of JSONStream we were using included jsonparse@0.0.5, which was before the fix mentioned in the OP. Once we applied that fix, the server was still freezing, because the fix handled byte values between 128 and 194, but not values above 244. tl;dr: a specific non-ASCII character was causing an infinite loop inside of jsonparse – ZECTBynmo Oct 08 '15 at 04:30
  • What a very intriguing bug :P – Dan Oct 08 '15 at 05:18
  • @ZECTBynmo seems [we're not the first ones](https://github.com/creationix/jsonparse/issues/17) to have this issue. Just no one opened a PR. – Dan Oct 08 '15 at 06:22
  • Yeah, but I believe we are the first ones to see the issue for characters above 244. I wonder what caused our system to encounter these weird characters? My guess is that some third party system (we talk to a lot of them) must have changed their API to include those characters as unescaped content. – ZECTBynmo Oct 08 '15 at 14:09
  • Full resolution to the issue thanks in large part to @ARedHerring - thanks again! – ZECTBynmo Oct 10 '15 at 01:33