
I have noticed some rather peculiar behaviour in numpy regarding how it distinguishes -0.0 from 0.0. Here are some examples:

```python
# plain Python doesn't distinguish between 0 and -0:
>>> -0
0
>>> -0==0
True

# numpy also sometimes changes -0 to 0:
>>> import numpy as np
>>> np.array([-0])
array([0])

# HERE IS THE SURPRISE - numpy does seem to have separate 0 and -0:
>>> np.round(0.1)
0.0
>>> np.round(-0.1)
-0.0

# yet numpy is of course aware that -0 and 0 are equal:
>>> np.round(-0.1) == np.round(0.1)
True

# Python's round() function doesn't behave like this:
>>> round(0.1)
0
>>> round(-0.1)
0
```

Background - why do I care about this? I have a list of numpy arrays, and I want to remove any array that is equal, to 2 d.p., to another array in the list. To do this, I turned the list of arrays into a dict of arrays, where the key of each item is the array rounded to 2 d.p. A numpy array can't be used as a dict key (it isn't hashable), so after rounding I call `.tobytes()`, and the byte representation of the rounded array becomes the item's key. The item's value is the unrounded array, as I want to keep the precision.
Imagine my surprise when I noticed this didn't get rid of identical arrays, simply because one had a -0 and the other a 0...
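A minimal reproduction of the dedup approach described above, assuming two arrays that agree to 2 d.p. and differ only in the sign of a value that rounds to zero (the sample values are made up for illustration):

```python
import numpy as np

# Hypothetical sample data: equal to 2 d.p., but the second value rounds to -0.0 in b
a = np.array([1.234, 0.001])
b = np.array([1.231, -0.001])

# Keys built as described: round to 2 d.p., then take the raw bytes
key_a = np.round(a, 2).tobytes()
key_b = np.round(b, 2).tobytes()

# Element-wise comparison treats -0.0 and 0.0 as equal...
print(np.array_equal(np.round(a, 2), np.round(b, 2)))  # True
# ...but the byte keys differ because the sign bit of the zero differs
print(key_a == key_b)                                    # False
```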

```python
>>> np.round(0.1).tobytes()           # ends with x00
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> np.round(-0.1).tobytes()          # ends with x80
b'\x00\x00\x00\x00\x00\x00\x00\x80'

>>> np.round(0.1).tobytes() == np.round(-0.1).tobytes()  # well obviously this will be False
False
```

Why does numpy store -0 and 0 differently, and why only sometimes? Are there other examples of this behaviour? The above is all I've come up with so far. Why is numpy different from Python here? Are there any cases where Python also has separate 0 and -0? How can I get my code to treat -0 and 0 in the same position of an array as identical? Also, if you have -0.0, adding 0 changes it to 0.0, but subtracting 0 leaves it at -0.0. Why is this?

Braiam
gnoodle
  • Does this answer your question? [How to eliminate the extra minus sign when rounding negative numbers towards zero in numpy?](https://stackoverflow.com/questions/26782038/how-to-eliminate-the-extra-minus-sign-when-rounding-negative-numbers-towards-zer) – oskros May 06 '21 at 14:25
  • For the first question, this is due to the IEEE-754 standard. Numbers need to be normalized to avoid that. For the second point, I guess oskros' link provides the answer. – Jérôme Richard May 06 '21 at 14:30
  • Cool - the link from oskros suggests normalizing by adding 0.0 to the array. – jkr May 06 '21 at 14:38
  • @oskros thanks - I hadn't managed to find that post, but had already discovered myself that adding zero helped. The main purpose of my question was regarding the theoretical side of _why_ it all works like this, rather than how to get around this. In fact even the answer I wrote (add zero) made it clear that while this works, I am missing the underlying understanding and logic behind all this – gnoodle May 06 '21 at 14:51
  • @JérômeRichard thanks for the info re IEEE-754. What is the explanation for this behaviour though? Why was the standard set up to have separate 0 and -0 which when testing for equality returns True, when mathematically (I assume) there is no separate 0 and -0? – gnoodle May 06 '21 at 14:56
  • the reason Python's `round()` behaves differently from `np.round()` is _not_ that Python has only 0 and not -0 (as I assumed in the question). Python also has both 0 and -0. The difference is that Python's `round()` returns an integer when called without an `ndigits` argument, whereas `np.round()` leaves the result as a float. The proof of this is that you can get a -0 from Python's `round()` as well: `>>> round(-0.001,1)` returns `-0.0`. The question of _why_ Python and numpy are set up to have a separate -0 remains. – gnoodle May 06 '21 at 15:15

2 Answers

I think you missed a simple but crucial point:

```python
>>> print(0, -0, 0.0, -0.0)
0 0 0.0 -0.0
```

Your initial statements, up to and including `np.array([-0])`, all create integers, because numbers without a decimal point or `e` are integer literals in Python. Python doesn't use plain two's complement for its arbitrary-precision integers, but there is still only one way to represent zero, and no way to represent NaN.

The result of `/`, on the other hand, or a literal containing a decimal point or `e`, is a 64-bit IEEE-754 float. In that representation there are two ways to represent zero (+0.0 and -0.0, which differ only in the sign bit) and many ways to represent NaN.

So if you want negative zero, don't use integer literals: use floats directly. The other operations you do, like rounding, are just complicated ways to convert your numbers into floats, which have a way to represent negative zero.
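The same distinction is visible inside numpy: compare an array built from the integer literal `-0` with one built from the float literal `-0.0`. This sketch uses `np.signbit`, which reports the sign bit of each element; the comparison against `np.array([0])` is just a convenient reference:

```python
import numpy as np

int_arr = np.array([-0])      # integer literal: -0 is evaluated to plain 0 before numpy sees it
float_arr = np.array([-0.0])  # float literal: the negative zero survives

# The integer array is bit-for-bit identical to an array holding 0
print(int_arr.tobytes() == np.array([0]).tobytes())  # True
# The float array keeps its sign bit set
print(np.signbit(float_arr))                          # [ True]
# ...yet it still compares equal to positive zero
print(float_arr == 0.0)                               # [ True]
```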

By extension, the results of the round functions are totally expected as well. `np.round` returns a float, while Python's `round` returns an int when called without `ndigits`. You can see this in the repr of the results printed on the command line: float zero looks like `0.0` or `0.`, while int zero is just `0`.
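The return types can be checked directly. This sketch also includes the `round(-0.001, 1)` case from the comments on the question, which shows that Python's own floats keep a negative zero too:

```python
import numpy as np

py_int = round(-0.1)         # no ndigits: Python returns an int
py_float = round(-0.001, 1)  # with ndigits: Python returns a float
np_float = np.round(-0.1)    # numpy keeps the floating-point type

print(type(py_int).__name__, py_int)      # int 0
print(type(py_float).__name__, py_float)  # float -0.0
print(type(np_float).__name__, np_float)  # float64 -0.0
```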

It all boils down to the fact that common integer representations can't differentiate between zeros while floats can.
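The bit patterns can be inspected with the standard-library `struct` module. This sketch packs floats as big-endian doubles (`'>d'`), so the sign bit comes first; the question's `.tobytes()` output uses the machine's native byte order, which is little-endian on typical hardware, so the `\x80` shows up in the last byte there. `float_bits` is a hypothetical helper name:

```python
import math
import struct

def float_bits(x: float) -> str:
    """Return the 64-bit IEEE-754 pattern of x as 0s and 1s, sign bit first."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")

pos_zero = float_bits(0.0)
neg_zero = float_bits(-0.0)

print(pos_zero)  # sign bit 0, all other bits 0
print(neg_zero)  # sign bit 1, all other bits 0

# Equal under ==, but distinguishable through the sign bit
print(0.0 == -0.0)                                         # True
print(math.copysign(1.0, 0.0), math.copysign(1.0, -0.0))   # 1.0 -1.0

# Integers have no such distinction: -0 is simply 0
print(-0 == 0, repr(-0))                                   # True 0
```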

Mad Physicist
  • yes I had missed that - I realised this point and added it as a comment to my question while you were adding this answer. Thanks for pointing this out! Why is it though that integers only have one zero but floats have two zeros? Seems inconsistent - as well as strange that there should ever be two zeros - what on earth is the purpose of that... – gnoodle May 06 '21 at 15:20
  • @gnoodle. You have to look into the binary representations to see why. It's totally sensible. There are multiple zeros and multiple NaNs in IEEE-754. They exist, so people use them. – Mad Physicist May 06 '21 at 15:22

Re question 2 - a workaround:

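Add 0 to the array. This returns 0.0 instead of -0.0 because, in the IEEE-754 default rounding mode, -0.0 + 0.0 evaluates to +0.0, while every nonzero value is left unchanged by adding 0: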

```python
>>> np.round(-0.1)+0
0.0
```

Using .tobytes() obviously gives the same as for a 'normal' zero:

```python
>>> (np.round(-0.1)+0).tobytes()
b'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> np.round(0.1).tobytes() == (np.round(-0.1)+0).tobytes()
True
```
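The rule behind this is part of IEEE-754: when two operands of opposite sign add up to exactly zero, the result is +0.0 in the default rounding mode, whereas adding two zeros of the same sign keeps that sign. Subtracting 0.0 behaves like adding -0.0, so -0.0 - 0.0 stays -0.0. A small sketch using `math.copysign` to expose the sign bit (`sign_of` is a hypothetical helper):

```python
import math
import numpy as np

def sign_of(x) -> float:
    """Return 1.0 or -1.0 from the sign bit of x; unlike x < 0, this works for zeros."""
    return math.copysign(1.0, x)

neg = np.float64(-0.0)

print(sign_of(neg + 0.0))  # 1.0  : -0.0 + (+0.0) -> +0.0 (opposite signs)
print(sign_of(neg - 0.0))  # -1.0 : behaves like -0.0 + (-0.0) -> -0.0 (same signs)
print(sign_of(neg + 0))    # 1.0  : the int 0 is converted to +0.0 first

# The same normalisation works element-wise on arrays; nonzero values are untouched
arr = np.array([-0.0, -1.5, 0.0]) + 0.0
print(np.signbit(arr))     # [False  True False]
```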

So, after rounding to 2d.p. but before using .tobytes(), add zero to the array.
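Put together, the dedup step from the question might look like the sketch below. `dedupe_rounded` is a hypothetical helper; it assumes all arrays share the same shape and dtype (`.tobytes()` ignores both), and it keeps the first array seen for each key, which is a design choice the question doesn't specify:

```python
import numpy as np

def dedupe_rounded(arrays, decimals=2):
    """Keep one unrounded array per distinct rounded value.

    The key is the rounded array's raw bytes; adding 0.0 turns any -0.0
    produced by rounding into +0.0, so 0.0 and -0.0 map to the same key.
    """
    unique = {}
    for arr in arrays:
        key = (np.round(arr, decimals) + 0.0).tobytes()
        unique.setdefault(key, arr)  # keep the first occurrence, store the original
    return list(unique.values())

arrays = [
    np.array([1.234, 0.001]),
    np.array([1.231, -0.001]),  # same as the first to 2 d.p., differs only by a -0.0
    np.array([2.5, 3.0]),
]
result = dedupe_rounded(arrays)
print(len(result))  # 2
```

Using `dict.setdefault` keeps the first occurrence; a plain `unique[key] = arr` would keep the last one instead.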

Re question 1 "Why does numpy store -0 and 0 differently"

See https://en.wikipedia.org/wiki/Signed_zero:

"It is claimed that the inclusion of signed zero in IEEE 754 makes it much easier to achieve numerical accuracy in some critical problems,[1] in particular when computing with complex elementary functions.[2] On the other hand, the concept of signed zero runs contrary to the general assumption made in most mathematical fields that negative zero is the same thing as zero. Representations that allow negative zero can be a source of errors in programs, if software developers do not take into account that while the two zero representations behave as equal under numeric comparisons, they yield different results in some operations."
References
[1] William Kahan, "Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing's Sign Bit", in The State of the Art in Numerical Analysis (eds. Iserles and Powell), Clarendon Press, Oxford, 1987.
[2] William Kahan, "Derivatives in the Complex z-plane", p. 10.
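The complex-function case mentioned in the quote can be reproduced with Python's standard `cmath` module. On the branch cut of the square root (the negative real axis), the sign of a zero imaginary part decides which side of the cut the input is treated as lying on, even though the two inputs compare equal. This is an illustration chosen here, not an example taken from the cited papers:

```python
import cmath
import math

above = complex(-4.0, 0.0)   # -4 approached from above the negative real axis
below = complex(-4.0, -0.0)  # -4 approached from below

print(above == below)        # True: the two inputs compare equal
print(cmath.sqrt(above))     # 2j
print(cmath.sqrt(below))     # -2j

# The same sign information matters for real functions such as atan2
print(math.atan2(0.0, -1.0))   # pi
print(math.atan2(-0.0, -1.0))  # -pi
```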

gnoodle
  • `2*(-0)` should be `-0` of course. Would it surprise you that `-1-1==-2` rather than `0`? Hopefully that clarifies `-0+0` as well. – Mad Physicist May 06 '21 at 15:17
  • While it's nice that you took the time to play around, this is part of the question, or a whole new question. Keep in mind that SO is a Q&A site, not a forum with threads. Answers should be self-contained. – Mad Physicist May 06 '21 at 15:20