10

When I write something like

    double a = 0.0;
    double b = 0.0;
    double c = a/b;

The result is Double.NaN, but when I try the same for integers, it produces an ArithmeticException. So, why isn't there an Integer.NaN?
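For concreteness, here is a minimal program showing both behaviours side by side (the class name is just for illustration):

    public class NanDemo {
        public static void main(String[] args) {
            double d = 0.0 / 0.0;                 // no exception for doubles
            System.out.println(d);                // prints NaN
            System.out.println(Double.isNaN(d));  // true (NaN != NaN, so use isNaN)

            int zero = 0;
            int i = 0 / zero;                     // throws java.lang.ArithmeticException: / by zero
        }
    }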

Lion
JavaNewbie_M107
  • 5
Because the floating point specification specifies a `NaN` value, and a conforming implementation must handle it correctly. The integer representation does not, and there is no way to explicitly say 'invalid value'. [related](http://en.wikipedia.org/wiki/NaN#Integer_NaN), [related2](http://stackoverflow.com/questions/3949457/can-an-integer-be-nan-in-c) – Benjamin Gruenbaum Jul 12 '13 at 13:10
  • 3
    possible duplicate of [Why does division by zero with floating point (or double precision) numbers not throw java.lang.ArithmeticException: / by zero in Java](http://stackoverflow.com/questions/12954193/why-does-division-by-zero-with-floating-point-or-double-precision-numbers-not) – Lion Jul 12 '13 at 13:11
  • 1
    Looking into your question history, I have tagged your question with Java as a programming language. If you're talking about another language, then please tag as such. – Lion Jul 12 '13 at 13:26

3 Answers

9

The answer has very little to do with Java. Infinity and undefined values are not part of the set of integers, so they are excluded from Integer. Floating point types, by contrast, model the extended real number line, which includes positive and negative infinity, and IEEE 754 additionally defines NaN to represent undefined results (such as 0.0/0.0), so NaN has been included with the floating point types.
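By way of illustration, all of these double expressions are well defined under IEEE 754, whereas the corresponding int divisions could only throw an ArithmeticException (a minimal sketch using only standard java.lang methods):

    System.out.println( 1.0 / 0.0);       // Infinity
    System.out.println(-1.0 / 0.0);       // -Infinity
    System.out.println( 0.0 / 0.0);       // NaN
    System.out.println(Math.sqrt(-1.0));  // NaN (undefined over the reals)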

6

For the same reason that there is no integer NaN in any other language.

Modern computers use 2's complement binary representation for integers, and that representation doesn't have a NaN value. (All values in the domain of the representation type represent definite integers.)

It follows that computer integer arithmetic hardware does not recognize any NaN representation.
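You can observe this directly: since every 32-bit pattern denotes a definite integer, arithmetic silently wraps around rather than producing a NaN (a small sketch):

    System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);  // true: overflow wraps
    System.out.println(Math.abs(Integer.MIN_VALUE));                 // -2147483648: there is no NaN to return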

In theory, someone could invent an alternative representation for integers that includes NaN (or INF, or some other exotic value). However, arithmetic using such a representation would not be supported by the hardware. While it would be possible to implement it in software, it would be prohibitively expensive¹, and including this support in the Java language would be undesirable in other respects too.

¹ - It is of course relative, but I'd anticipate that a software implementation of integer NaNs would be (at least) an order of magnitude slower than hardware arithmetic. If you actually, really needed this, then that would be acceptable. But the vast majority of integer arithmetic code doesn't need it. In most cases throwing an exception for "divide by zero" is just fine, and an order-of-magnitude slowdown in all integer arithmetic operations is ... not acceptable.
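To make the trade-off concrete, here is a hypothetical sketch of what such a software representation might look like; NanInt and its choice of Integer.MIN_VALUE as the reserved sentinel are inventions for illustration, not anything in Java. Note that every single operation pays for extra checks, which is exactly the overhead described above:

    /** Hypothetical NaN-aware integer, reserving Integer.MIN_VALUE as NaN. */
    final class NanInt {
        static final int NAN = Integer.MIN_VALUE;  // the one reserved bit pattern

        static boolean isNaN(int x) {
            return x == NAN;
        }

        static int add(int a, int b) {
            if (isNaN(a) || isNaN(b)) {
                return NAN;                        // NaN propagates through arithmetic
            }
            long wide = (long) a + b;              // widen to detect overflow
            if (wide > Integer.MAX_VALUE || wide <= Integer.MIN_VALUE) {
                return NAN;                        // overflow, or collision with the sentinel
            }
            return (int) wide;
        }

        static int divide(int a, int b) {
            if (isNaN(a) || isNaN(b) || b == 0) {
                return NAN;                        // 0/0 etc. become NaN instead of throwing
            }
            return a / b;                          // a != MIN_VALUE, so this cannot overflow
        }
    }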


By contrast:

  • the "unused" values in the representation space already exist
  • NaN and INF values are part of the IEEE 754 floating point standard, and
  • they are (typically) implemented natively by the floating point hardware
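The last two points are easy to check from Java itself; the reserved IEEE 754 bit patterns are already present in every double (a small sketch):

    System.out.println(Long.toHexString(Double.doubleToRawLongBits(Double.NaN)));                // 7ff8000000000000
    System.out.println(Long.toHexString(Double.doubleToRawLongBits(Double.POSITIVE_INFINITY)));  // 7ff0000000000000
    // There is no analogous spare pattern among the 2^32 int values.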
Stephen C
  • 1
I disagree about "prohibitively expensive" - that's a relative term; it may be cheaper than the cost of coding around the issue. And undesirable - dunno, maybe. I would say that if the concept exists for double, it could be useful for ints. It's only because the underlying implementation of int doesn't provide for it that it isn't done, which practically implies that since double's internal representation and range *does* allow for it that it's a good idea that needs a different implementation to exist. – Bohemian Jul 12 '13 at 23:45
  • 1
@Bohemian - we are talking about it being prohibitively expensive to make NaN integers part of "the language". Obviously, if an application really needs NaNs, it is not prohibitively expensive for them. But most don't. – Stephen C Jul 13 '13 at 00:12
Yes, I know what you meant. But that "expense" (apart from the effort to change the language, which doesn't affect us programmers) would ultimately translate into higher CPU usage when dealing with ints, and it is *that* cost (which may be converted to dollars) that I am comparing with the "cost" of the extra functionality gained (again convertible to dollars). Whether it is "prohibitive" or not depends on those dollar costs. I actually doubt very much whether it would be "prohibitive" - we are talking about changing, or rather adding to, the underlying implementation, which should be left "hidden", – Bohemian Jul 13 '13 at 03:16
  • @Bohemian - That's only one of the costs. Others include increased space to represent integers (or reduction in the size of the effective value space), changes to code, more fodder for the "Java slow" crew, ... And besides, a 10% slowdown in all Java programs does amount to a very large dollar cost ... in buying more kit, bigger data centre power bills etc. We could be talking billions of dollars, world-wide. – Stephen C Jul 13 '13 at 06:07
  • But isn't penny pinching memory what caused the massive year 2000 bug? Surely you just say "if we need it, we need it". Anyway, I think it's fine the way it is - the implementation has bubbled up into the language, but I'm fine with that. I would be very interested to know what the global cost would be if (for example) an `int` required *5* bytes of memory instead of *4*. There could be a PhD thesis in that. – Bohemian Jul 13 '13 at 08:58
  • @Bohemian - An `int` type that required 5 bytes instead of 4 would either need to be padded to 8 byte (for word alignment) or your program would take a significant performance hit for fetching and storing non-aligned words. I don't think it is worth a PhD thesis to investigate this ... – Stephen C Jul 13 '13 at 12:44
  • The PhD was for the cost calculation of using more bytes for an int. 5 bytes. 8 bytes. whatever. The interesting part would be what knock on effects that would have. No need to get bogged down in details. – Bohemian Jul 13 '13 at 13:24
  • 1
    This answer overstates its case. The use of two’s complement for integer representations does not preclude setting aside one of the values for use as a NaN. Taking the extreme negative value for this use would restore a nice symmetry to the integer types (they would be closed under negation). The reason non-NaN integer formats prevailed over others is due more to desirable properties such as convenient modulo/wrapping than to implementation costs (which could be moved into hardware rather than software, in spite of this answer’s prejudice otherwise). – Eric Postpischil Jul 13 '13 at 21:53
  • @Bohemian Very off-topic, but the 'massive year 2000 bug' wasn't *a* 'bug' at all, but a series of limitations that had been put into multiple pieces of software over the years, dictated mostly by disk space, and whose possible harmful effects were irresponsibly exaggerated into a world-wide panic/tax dodge, whereby IT companies and contractors like myself made large amounts of money remedying problems that didn't exist, and future-proofing systems to many decades beyond their actual planned life. The perfectly reasonable 'penny pinching' you refer to led to massive pound foolishness. – user207421 Jul 14 '13 at 04:03
  • @EJP: I don't think it was just disk space. Visual space on paper and to a lesser extent terminal screens was probably just as relevant. Additionally, using two-digit years rather than four could also save on keystrokes. – supercat Sep 04 '13 at 07:40
  • @supercat - The main issue wasn't with software written in the 1990's and 1980's. It was with code written in the 1960's / 1970's when memory, disk and (!) tape space were at a premium. Bear in mind that languages like COBOL actually encouraged users to (over-)think these issues, by making it easy to specify date fields as 2-digit numbers. (Remember `PIC` clauses?) – Stephen C Mar 20 '14 at 15:09
  • *"The use of two’s complement for integer representations does not preclude setting aside one of the values for use as a NaN."* - But if you do that, then every time you do arithmetic you need to check that you haven't either generated the NaN pattern when you shouldn't have (due to overflow) ... or not generated when you should have. That would add to the cost of arithmetic ... whether you do the checks in the hardware itself, or in the native code sequences emitted by the compiler. – Stephen C Mar 20 '14 at 15:15
  • There is a parallel to this. In the implementations of the CLU language, they used a mark bit to distinguish scalars from pointers in the GC. But since they were targeting conventional 32-bit hardware (VAX and 68k) they had to steal a bit from the integer types. That meant that the result of any integer arithmetic operation had to be masked-and-anded with the sign bit ... or something like that ... before you saved the result to a heap object. That was a significant overhead, and it is why we don't use tagging in modern GCs. – Stephen C Mar 20 '14 at 15:22
  • @StephenC: A two-digit year field is perfectly adequate for storing a year if one will always be comparing dates that are within 50 years of each other. If one performs comparisons on six-digit years by subtracting mod 1,000,000 and checks whether the difference exceeds 500,000 the results will be correct. Ironically, from a space-efficiency perspective, 999999 is horribly wasteful, using three bytes when two bytes could easily handle 128 years worth of dates (using fixed bitfields). – supercat Mar 20 '14 at 15:33
  • @StephenC: If INT_MIN is defined as -INT_MAX, and an arithmetic calculation which didn't overflow happened to generate -INT_MAX-1, then NaN is the proper result. What would make integer NaN expensive without hardware support would be ensuring that arithmetic involving a NaN will always yield a NaN result. There's a chicken-and-egg problem with adding widespread hardware support for such a feature, but having hardware and language support for such a feature would make it easier to write robust software (if a language has integer types which are not instantly-checked but overflow to NaN,... – supercat Mar 20 '14 at 15:42
  • ...that would facilitate code which should "smoothly" take action when integers overflow. To be sure, it would probably be more helpful to get languages in the habit of trapping overflows which occur in places where they're not explicitly expected, but integer NaNs could be a useful aspect of robust coding. Digital signal processors often have such concepts (having audio signal values wrap sounds *really* bad), but mainstream processors generally don't. – supercat Mar 20 '14 at 15:49
  • @supercat - I don't buy that. 1) An overflow would generate an INF, not a NaN. 2) Either way, overflow has to generate INF or NaN in a mathematically consistent way to be an acceptable answer. Otherwise you end up with a worse problem than the (small) one you were trying to solve. 3) Using NaN deliberately to represent missing data is a hack. Use the primitive wrapper types instead ... and test for `null`. NaN does not mean "missing data". It means "a result of one kind of mathematically invalid operation" – Stephen C Mar 20 '14 at 16:06
  • @StephenC: The system where I saw an integer NaN used it as a common value for all invalid integer operations. If one did an integer calculation and didn't get a NaN, one could be assured that nothing in the calculation overflowed. Otherwise, the NaN would indicate something overflowed, but wouldn't indicate what (not terribly helpful, but in some cases being able to show a "could not compute value" indicator may be better than simply crashing). – supercat Mar 20 '14 at 19:33
  • @supercat - Please be more specific. Which programming language or hardware platform are you talking about that supports an >>integer<< NaN? Yea, sure, if you are running on a system that has hardware support for integers with NaN (or INF) then it makes sense to use them ... if you need them. But (AFAIK) no modern instruction sets support this, and doing it (properly) in hardware is expensive. – Stephen C Mar 21 '14 at 00:24
@StephenC: I'm afraid I don't remember for certain where it was I saw it; I think it was probably the 64-bit integer type of the Standard Apple Numerical Environment for Macintosh (which can do either hardware or software floating-point), but it's decades since I've looked at that. From a hardware perspective, variations of ADD and SUB which regard the bit pattern 10…0 (i.e. INT_MIN) as a propagating NaN would likely be cheaper and more efficient than variations which trap on overflow [overflow trapping and speculative execution don't mix very well]; an alternative would be a "sticky" overflow flag, though... – supercat Mar 21 '14 at 15:15
  • ...for that to work well there would need to be separate "add and set overflow flag" and "add without affecting overflow flag" instructions, since otherwise multi-precision arithmetic would be difficult. I think my real point is that integer overflow trapping should be used a lot more than it is, and the next time someone comes up with a new architecture, including an integer NaN would likely allow such trapping to be handled more efficiently than would existing methods. – supercat Mar 21 '14 at 15:18
1

As noted in other comments, it's largely because NaN is a standard value for floating point numbers. You can read about the reasons NaN would be returned on Wikipedia here:

http://en.wikipedia.org/wiki/NaN

Notice that only one of these reasons applies to integers (division by zero). Floating point numbers also have positive and negative infinity values, which integers lack; infinity is closely linked to NaN in the floating point specification.
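That link can be seen directly: arithmetic on infinities that has no meaningful answer yields NaN (a minimal sketch):

    double inf = 1.0 / 0.0;          // Infinity
    System.out.println(-1.0 / 0.0);  // -Infinity
    System.out.println(inf - inf);   // NaN
    System.out.println(inf / inf);   // NaN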

John Zeringue