6

I have a large, messy JS codebase. Sometimes, when the app is being used, a variable is set to NaN. Because x = 2 + NaN results in x being set to NaN, the NaN spreads virally. At some point, after it has spread pretty far, the user notices that there are NaNs all over the place and shit generally doesn't work anymore. From this state, it is very difficult for me to backtrack and identify the source of the NaN (and there could very well be multiple sources).

The NaN bug is also not easily reproducible. Despite hundreds of people observing it and reporting it to me, nobody can tell me a set of steps that lead to the appearance of NaNs. Maybe it is a rare race condition or something. But it's definitely rare and of uncertain origins.

How can I fix this bug? Any ideas?

Two stupid ideas I've thought of, which may not be feasible:

  1. Write some kind of pre-processor that inserts isNaN checks before every time any variable is used and logs the first occurrence of NaN. I don't think this has been done before and I don't know how hard it would be. Any advice would be appreciated.

  2. Run my code in a JS engine that has the ability to set a breakpoint any time any variable is set to NaN. I don't think anything does this out of the box, but how hard would it be to add it to Firefox or Chrome?

I feel like I must not be the first person to have this type of problem, but I can't find anyone else talking about it.

dumbmatter
  • How's the code organized? Is it a big modular JS program? or structured OOP? or any in between? – carlodurso Oct 25 '14 at 14:31
  • It's organized fairly well into modules using RequireJS. – dumbmatter Oct 25 '14 at 14:43
  • Chrome's debugger allows you to set conditional breakpoints. – Cheery Oct 27 '14 at 22:01
  • if you do, what you described in 1), you could also put `debugger;` in front of it. Or `if (isNaN(x)) {debugger;}` - which would work at least in chrome, possibly other browsers as well. This is essentially a conditional breakpoint. – amenthes Oct 27 '14 at 22:44
  • @Cheery does it let me do something like "break when any variable is set to NaN"? – dumbmatter Oct 27 '14 at 22:52
  • @dumbmatter any - no, but in the most called locations you can setup such a condition. – Cheery Oct 27 '14 at 22:53
  • @Cheery That does not satisfy my curiosity and my hope to more elegantly solve this problem :) – dumbmatter Oct 27 '14 at 23:00
  • Maybe this can help you: http://stackoverflow.com/a/2762091/3225104 – Hristo Nov 01 '14 at 22:08
  • 1
    Since you are in Javascript land: "... the expression (x != x) is a more reliable way to test whether variable x is NaN or not ..." ~ [MDN on isNaN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/isNaN). – musically_ut Nov 03 '14 at 14:35

7 Answers

2

There is probably no solution for your problem aka: break, whenever any variable is set to NaN. Instead, you could try to observe your variables like this:

  • As stated earlier, the Chrome debugger offers conditional breakpoints. It also supports watch expressions, which show the current value of an expression whenever execution is paused. To pause when a variable is set to a specific value, put that condition on a breakpoint at the line that assigns the variable.

  • Object.observe is a method that observes changes on an object. You can listen to all changes on the object and invoke the debugger when any property is set to NaN. For example, you could observe all changes on the window object and invoke the debugger whenever one of its properties is set to NaN. Please note that Object.observe is quite cutting edge and not supported by all browsers (check out the polyfill in this case).

  • Take this opportunity to write a test case for every function in your code. Perform random testing and find the line of code that can create NaN values.
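The random-testing idea in the last bullet could look roughly like this. It is only a hand-rolled sketch, not JSCheck or any other library: `makeRng`, `findNaNInputs` and `ratio` are hypothetical names invented for the illustration, and the argument range of -5 to 5 is an arbitrary assumption.

```javascript
// Tiny deterministic pseudo-random generator (an LCG) so that a failing
// run can be replayed with the same seed. Hypothetical helper.
function makeRng(seed) {
  var state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Calls fn with random integer arguments in [-5, 5] and collects every
// argument list for which the result is NaN.
function findNaNInputs(fn, arity, runs, rng) {
  var failures = [];
  for (var i = 0; i < runs; i++) {
    var args = [];
    for (var j = 0; j < arity; j++) {
      args.push(Math.floor(rng() * 11) - 5);
    }
    var result = fn.apply(null, args);
    if (result !== result) { // only NaN is unequal to itself
      failures.push(args);
    }
  }
  return failures;
}

// Example function under test: it yields NaN for 0 / 0.
function ratio(a, b) {
  return a / b;
}

var failures = findNaNInputs(ratio, 2, 5000, makeRng(42));
console.log(failures.length > 0 ? failures[0] : 'no NaN found');
```

Passing the generator in as a parameter keeps the search repeatable: once some input produces a NaN, the same seed will produce it again.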

Another problem of yours is probably how to reproduce this error. Reloading your webpage over and over doesn't make much sense. You could check out a so-called headless browser: it starts an instance of a browser without displaying it. It can be used to run automated tests on the website, click some buttons, do some stuff. Maybe you can script it in such a way that it eventually reproduces your error. This has the advantage that you don't have to reload your webpage hundreds of times by hand. There are several implementations of headless browsers. PhantomJS is really nice, in my opinion. You can also start a Chrome Debug Console with it (you need a plugin for that: remote debugger).

Furthermore, please notice that NaN is never equal to NaN. It would be a pity if you finally are able to reproduce the error, but your breakpoints don't work.
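To make that concrete: a breakpoint condition written as `x === NaN` can never fire. A short illustration of the comparisons that do and don't work (`x` is just an example variable):

```javascript
var x = 0 / 0; // NaN

console.log(x === NaN);           // false: NaN equals nothing, itself included
console.log(x !== x);             // true: the self-inequality test
console.log(isNaN('abc'));        // true: the global isNaN coerces its argument first
console.log(Number.isNaN('abc')); // false: Number.isNaN (ES2015) does not coerce
console.log(Number.isNaN(x));     // true
```

So a usable breakpoint condition is `x !== x`, or `Number.isNaN(x)` where it is available.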

dmonad
  • I hadn't seen JSCheck before, thanks for the link. Would that be sophisticated enough to detect when a NaN could be caused by a race condition? As an example, maybe my bug is caused by something like... "A user initiates an action which writes two values in two transactions to two different IndexedDB object stores. After the first finishes but before the second finishes, he loads the same page in a new window. The new instance sees the first value and expects to find the second, but instead it comes back undefined and turns into NaN when it's added to another variable." – dumbmatter Oct 29 '14 at 13:55
  • "There is probably no solution for your problem aka: break, whenever any variable is set to NaN." - why not? I'm imagining a parser that identifies every single variable assignment and adds an `isNaN` line immediately after. If it's true, log the error. Maybe it would slow things down a little to have `isNaN`s all over the place, but if I put it live for like a week I might be able to find the root cause of the problem and fix it once and for all... – dumbmatter Oct 29 '14 at 13:57
  • Ok, I should rephrase: There is probably no existing solution for this. Your solution would probably work, but writing such a program could be really hard to implement correctly. Though, I agree, it is possible. When it comes to JSCheck, it is something like a sophisticated way to create random parameters for functions. It is not _that_ powerful. – dmonad Oct 29 '14 at 15:26
  • 7
    In my version of Chrome, you can't use `Object.observe` on the `window` object, it gives a `TypeError` saying "observe cannot be called on the global proxy object" – Jake Cobb Oct 30 '14 at 20:26
1

Does your code communicate with a server, or is it client-side only? You mention that the problem is rare, so it may happen only in some browsers (or browser versions), or in some situation that is hard to reproduce. If we assume that any appearance of NaN is a problem, and that users only notice the bug once it has spread ("there are NaNs all over the place"), then display a popup with an error as soon as a NaN appears instead. The error should identify the first occurrence of NaN, so that users can include it when they report the bug ("Despite hundreds of people observing it and reporting it to me"). Or don't show it, but send it to a server. To do that, write a simple function that takes a single variable as its argument and checks whether it is NaN. Call it in the sensitive places of your code (on sensitive variables). These reports may help isolate the problematic code. I know that this is very dirty, but it can help.
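A minimal sketch of such a checking function, assuming you only care about the first NaN. The names `checkNaN` and `firstNaNReport` and the labels are made up for the example; what you do with the report (popup, storage, upload) is up to you.

```javascript
// Hypothetical helper: returns the value unchanged, so it can be wrapped
// around any expression, and remembers only the first NaN it sees.
var firstNaNReport = null;

function checkNaN(value, label) {
  if (value !== value && firstNaNReport === null) {
    firstNaNReport = {
      label: label,
      time: new Date().toISOString(),
      stack: new Error('NaN detected at ' + label).stack
    };
    // This is the place to show the report to the user or store it.
    console.log('First NaN seen at', label);
  }
  return value;
}

// Usage at two "sensitive" places (labels are illustrative only):
var rating = checkNaN(10 * undefined, 'player.rating'); // NaN, recorded
var total = checkNaN(rating + 5, 'team.total');         // NaN, but not the first
```

Because the function returns its argument, adding it to an existing assignment does not change what the code computes.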

Krzysztof Sztompka
  • It's a purely client-side app. The bug has been observed in Firefox and Chrome, the only two browsers with any significant userbase. "Put it in your code in sensitive places" is the tricky part, I was hoping for some systematic way to do that. If that's not possible, I might have to do what you suggest, but it would be a slow, painful process. Especially since there could very well be multiple places NaNs are introduced, I want to be able to confidently say I fixed all of them. – dumbmatter Oct 27 '14 at 22:49
  • Yes, it is a painful and dirty way. But the other solutions are based on reproducing the bug in your tool (if you find this bug in the Chrome debugger, it does not mean that your clients' browsers don't have more or other bugs). How many times have you reproduced this error? The problem may not be the code but the user data, and your debugging tool may not find the problem when you use it. What does your app do? Does it communicate with the server on start, or in many places? Do users fill in forms, and does your app work on the data from them? – Krzysztof Sztompka Oct 28 '14 at 07:39
  • Me personally, I've seen it only a few times. Very rare. I have thousands of users and many of them have seen it, but only rarely. There is hardly any user data input in my app (it's a video game, almost all data is randomly generated, there is no server side component). I suspect the ultimate source of the problem is a race condition, due to how rare and intermittent the error is. I don't want to manually check every variable in 20 different files - but if there was an automated way to parse my JS and insert some checking code in front of all variable assignment, that would be cool. – dumbmatter Oct 28 '14 at 16:50
1

One of your math functions is failing. I have used Number(variable) to correct this problem before. Here is an example:

test3 = Number(test2+test1) even if test1 and test2 appear to be numbers

chran
  • That wouldn't help. `Number('a' + 4)` is `NaN`. I don't want to have `NaN` where I should have a number, that's the problem I have now. Besides, my goal isn't to cover up the error, it's to identify the source of it. – dumbmatter Oct 28 '14 at 16:48
1

Yeah man race conditions can be a pain, sounds like what it may be.

Debugging to the source is definitely going to be the way to go with this.

My suggestion would be to set up some functional testing with a focus on where these have been reproduced, set some test conditions with varied timeouts or such, and just rerun it until it catches it. Set up some logging process so you can see the backtrace if possible.

What does your stack look like? I can't give too much analysis without looking at your code, but since it's JavaScript you should be able to make use of the browser's dev tools, I assume?
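As a rough sketch of the "rerun it until it catches it" idea: the code below reruns a scenario and scans the resulting state for the first NaN. Everything in it is an assumption for illustration. `findNaNPath`, `rerunUntilNaN` and `flakyScenario` are hypothetical names, the scenario is a synchronous stand-in for a real test with varied timeouts, and the scan assumes plain objects and arrays without cycles.

```javascript
// Walks a plain object/array tree and returns the path of the first NaN
// found, or null if there is none.
function findNaNPath(value, path) {
  path = path || 'state';
  if (typeof value === 'number') {
    return value !== value ? path : null;
  }
  if (value === null || typeof value !== 'object') {
    return null;
  }
  var keys = Object.keys(value);
  for (var i = 0; i < keys.length; i++) {
    var key = keys[i];
    var childPath = Array.isArray(value) ? path + '[' + key + ']' : path + '.' + key;
    var found = findNaNPath(value[key], childPath);
    if (found !== null) {
      return found;
    }
  }
  return null;
}

// Reruns scenario(run) until the state it returns contains a NaN.
function rerunUntilNaN(scenario, maxRuns) {
  for (var run = 1; run <= maxRuns; run++) {
    var path = findNaNPath(scenario(run));
    if (path !== null) {
      return { run: run, path: path };
    }
  }
  return null;
}

// Stand-in scenario: a value is missing on every 7th run, mimicking a
// load that finished in the wrong order.
function flakyScenario(run) {
  var bonus = run % 7 === 0 ? undefined : 2;
  return { team: { players: [{ rating: 50 }, { rating: 40 + bonus }] } };
}

var caught = rerunUntilNaN(flakyScenario, 100);
console.log(caught);
```

Logging the path together with the run number gives you a place to start the backtrace from.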

mattLummus
1

If you're doing a good job keeping things off of the global namespace and nesting things in objects, this might be of help. And I will preface this by saying this is by no means a fully complete solution, but at the very least, this should help you on your search.

function deepNaNWatch(objectToWatch) {
  'use strict';

  // Setting this to true will check object literals for NaN
  // For example: obj.example = { myVar : NaN };
  // This will, however, cost even more performance
  var configCheckObjectLiterals = true;

  var observeAllChildren = function observeAllChildren(parentObject) {

    for (var key in parentObject) {
      if (parentObject.hasOwnProperty(key)) {
        var childObject = parentObject[key];

        examineObject(childObject);
      }
    }
  };

  var examineObject = function examineObject(obj) {
    var objectType = typeof obj;

    // typeof null is 'object', but Object.observe throws on null
    if (obj !== null && (objectType === 'object' || objectType === 'function')) {
      Object.observe(obj, recursiveWatcher);
      if (configCheckObjectLiterals) {
        observeAllChildren(obj);
      }
    } else if (objectType === 'number' && isNaN(obj)) {
      console.log('A wild NaN appears!');
    }
  };

  var recursiveWatcher = function recursiveWatcher(changes) {
    // One callback can deliver several change records, so check each of them
    changes.forEach(function (changeInfo) {
      var changedObject = changeInfo.object[changeInfo.name];

      examineObject(changedObject);
    });
  };

  Object.observe(objectToWatch, recursiveWatcher);
}

Call deepNaNWatch(parentObject) for every top level object/function you're using to nest things under as soon as they are created. Any time an object or function is created within a watched object/function, it itself will become watched as well. Any time a number is created or changed under a watched object--remember that typeof NaN == 'number'--it will check if it's NaN, and if so will run the code at console.log('A wild NaN appears!');. Be sure to change that to whatever sort of debugging output you feel will help.

This function would be more helpful if someone could find a way to force it onto the global object, but every attempt I made to do so simply told me I should sit in time out and think about what I've done.

Oh, and if it's not obvious from the above, on a large scale project, this function is bound to make pesky features like "speed" and "efficiency" a thing of the past.
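Object.observe was later withdrawn from standardization and removed from browsers, so the function above only runs in engines from that period. In an engine with ES2015 Proxy, a comparable watcher can be sketched as below. This is a swapped-in technique, not the code above: `watchForNaN` is a hypothetical name, and it only sees writes that go through the returned proxy, so the rest of the code has to use the proxy in place of the original object.

```javascript
// Wraps target in a Proxy that reports any NaN written to it. Nested
// objects are wrapped when they are read, so deep writes are covered too.
function watchForNaN(target, onNaN, path) {
  path = path || 'root';
  return new Proxy(target, {
    get: function (obj, key) {
      var value = obj[key];
      if (value !== null && typeof value === 'object' && typeof key === 'string') {
        return watchForNaN(value, onNaN, path + '.' + key);
      }
      return value;
    },
    set: function (obj, key, value) {
      if (typeof value === 'number' && value !== value) {
        // The stack trace points at the assignment that wrote the NaN.
        onNaN(path + '.' + String(key), new Error('NaN assigned').stack);
      }
      obj[key] = value;
      return true;
    }
  });
}

var hits = [];
var game = watchForNaN({ player: { score: 1 } }, function (where) {
  hits.push(where);
});

game.player.score = game.player.score * undefined; // NaN written deep inside
game.level = 3;                                     // fine, not reported
console.log(hits);
```

As with the version above, expect this to cost performance, and a NaN that arrives inside a freshly assigned object literal is still not reported at the moment of assignment.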

  • 1
    This would catch a lot but it can still miss things, e.g. in object literals: `var o = {}; deepNaNWatch(o); o.o = {n: NaN};` won't be caught, but a later `o.o.n = NaN` will be. – Jake Cobb Oct 30 '14 at 20:23
  • @JakeCobb True; I've revised the code to account for this. It should now be able to catch instances where an object is created with an object literal. It's inevitable that this will cost even more performance, unfortunately, but such is the cost of attempting to create an omniscient function. – KoratDragonDen Oct 30 '14 at 21:22
1

If you know locations where the NaNs propagate to, you could try to use program slicing to narrow down the other program statements that influence that value (through control and data dependences). These tools are usually non-trivial to set up, however, so I would try the Object.observe-style answers others are giving first.

You might try WALA from IBM. It's written in Java, but has a JavaScript front end. You can find information on the slicer on the wiki.

Basically, if the tool is working you will give it a program point (statement) and it will give you a set of statements that the starting point is (transitively) control- and/or data-dependent on. If you know multiple "infected" points and suspect a single source, you could use the intersection of their slices to narrow down the list (the slice of a program point can often be a very large set of statements).
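To illustrate the intersection idea, here is a toy sketch. It does not use WALA or its API: the dependency graph `deps` is written by hand for an imaginary six-statement program, and `backwardSlice` and `intersect` are hypothetical helper names. A real slicer would derive the graph from the source code.

```javascript
// Toy dependency graph: each statement lists the statements it directly
// depends on (data or control). Hand-written for illustration only.
var deps = {
  s1: [],           // a = load()
  s2: [],           // b = 2
  s3: ['s1'],       // c = a * 3
  s4: ['s2'],       // d = b + 1
  s5: ['s3'],       // e = c - 1   (infected point)
  s6: ['s3', 's4']  // f = c + d   (infected point)
};

// Backward slice: the start statement plus everything it transitively
// depends on.
function backwardSlice(graph, start) {
  var seen = {};
  var stack = [start];
  while (stack.length > 0) {
    var node = stack.pop();
    if (seen[node]) {
      continue;
    }
    seen[node] = true;
    (graph[node] || []).forEach(function (dep) {
      stack.push(dep);
    });
  }
  return Object.keys(seen).sort();
}

// Statements common to both slices are the candidates for a single source.
function intersect(a, b) {
  return a.filter(function (s) {
    return b.indexOf(s) !== -1;
  });
}

var sliceE = backwardSlice(deps, 's5');
var sliceF = backwardSlice(deps, 's6');
console.log(intersect(sliceE, sliceF));
```

Here the two infected points share only s1 and s3, which is a much shorter list to inspect than either slice on its own.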

Jake Cobb
1

(was too long for a comment)

While testing you could overwrite ALL Math functions to check if an NaN is being produced.

This will not catch

a = 'string' + 1;

but will catch things like

a = Math.cos('string');
a = Math.cos(Infinity);
a = Math.sqrt(-1);
a = Math.max(NaN, 1);
...

Example:

Object.getOwnPropertyNames(Math).forEach(function (n) {
    if (typeof Math[n] === 'function') Math[n] = wrap(Math[n]);
});
function wrap(fn){
    return function(){
        var res = fn.apply(this, arguments);
        if (isNaN(res)) throw new Error('NaN found!')/*or debugger*/;
        return res;
    };
}

I haven't tested this; maybe an explicit list of the methods to wrap is better.

BTW, you should not put this into production code.

Prusse