52

This is not a question about how to compare two BigDecimal objects - I know that you can use compareTo instead of equals to do that, since equals is documented as:

Unlike compareTo, this method considers two BigDecimal objects equal only if they are equal in value and scale (thus 2.0 is not equal to 2.00 when compared by this method).
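
For concreteness (jshell-style; both calls behave exactly as the quote describes):

new BigDecimal("2.0").equals(new BigDecimal("2.00"))    // false: same value, different scale
new BigDecimal("2.0").compareTo(new BigDecimal("2.00")) // 0: numerically equal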

The question is: why has equals been specified in this seemingly counter-intuitive manner? That is, why is it important to be able to distinguish between 2.0 and 2.00?

It seems likely that there must be a reason for this, since the Comparable documentation, which specifies the compareTo method, states:

It is strongly recommended (though not required) that natural orderings be consistent with equals

I imagine there must be a good reason for ignoring this recommendation.

bacar

7 Answers

39

Because in some situations, an indication of precision (i.e. the margin of error) may be important.

For example, if you're storing measurements made by two physical sensors, perhaps one is 10x more precise than the other. It may be important to represent this fact.
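
For instance (a minimal jshell-style sketch; the sensor scenario is hypothetical, but the scale behaviour shown is the real API):

BigDecimal coarseReading = new BigDecimal("2.0");   // from a sensor good to 0.1
BigDecimal fineReading   = new BigDecimal("2.00");  // from a sensor good to 0.01

coarseReading.compareTo(fineReading)   // 0: numerically the same
coarseReading.equals(fineReading)      // false: the recorded precision differs
coarseReading.scale()                  // 1
fineReading.scale()                    // 2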

Oliver Charlesworth
  • 1
    I guess I hadn't thought of using `BigDecimal` to capture the amount of precision (only as a type which allows arbitrary *amounts* of precision). Viewed in that way, it makes perfect sense; however, I then have to let go of thinking of the object as a numerical type - it does not behave as one as far as `equals` is concerned. – bacar Dec 31 '12 at 13:31
  • 37
    In my experience the situations in which you want `equals()` to capture that semantic difference in precision are far rarer than the intuitive case. On top of that, the intuitive case would mean `BigDecimal`'s `compareTo()` would be consistent with `equals()`. In my opinion, Sun made a mistake here. – bowmore Dec 31 '12 at 13:44
  • 4
    @bowmore, that would be my guess too, but experiences vary. Purists could argue they should have provided 2 classes - one class not suitable for sorting (no `compareTo`) that captures precision as a visible part of the object; and a second class implementing `Comparable` with `compareTo` consistent with `equals` that treats scale & value as a whole. However, providing both could seem rather bloated / unpragmatic and create rather than defuse confusion - Sun allowed both functionalities by providing inconsistent `compareTo` and `equals` (and surprised many of us along the way). – bacar Dec 31 '12 at 13:57
  • 13
    @bacar an implementation featuring a method like say `boolean equalsWithPrecision(BigDecimal other)` would have allowed both functionalities, *and* be consistent. – bowmore Dec 31 '12 at 14:24
  • 4
    It also seems to [break Set and Map usages](http://stackoverflow.com/questions/20091723/how-do-i-check-if-a-bigdecimal-is-in-a-set-or-map-in-a-scale-independent-way). – Geoffrey De Smet Nov 20 '13 at 09:17
  • 3
    @GeoffreyDeSmet: Whether such usages are "broken" depends on the intended purpose of the set. If one is creating a set for the purpose of allowing references to equivalent-but-distinct instances to be replaced with references to a single instance, the behavior of `equals` is perfect; I would consider definitions of `equals` which were inconsistent with that usage somewhat dangerous. – supercat Jul 27 '14 at 17:49
  • I agree with this idea, but IMHO a class called "Measure" with two numbers (a measured value and an error bar) would have been better, because in most cases your instrumental error is not necessarily 1 on some digit. – user1708042 Nov 06 '19 at 13:54
32

The general rule for equals is that two equal values should be substitutable for one another. That is, if performing a computation using one value gives some result, substituting an equals value into the same computation should give a result that equals the first result. This applies to objects that are values, such as String, Integer, BigDecimal, etc.

Now consider BigDecimal values 2.0 and 2.00. We know they are numerically equal, and that compareTo on them returns 0. But equals returns false. Why?

Here's an example where they are not substitutable:

// jshell session (jshell imports java.math.* by default, so BigDecimal
// and RoundingMode resolve without explicit imports):
var a = new BigDecimal("2.0");
var b = new BigDecimal("2.00");
var three = new BigDecimal(3);

a.divide(three, RoundingMode.HALF_UP)   // result keeps the dividend's scale (1)
==> 0.7

b.divide(three, RoundingMode.HALF_UP)   // result keeps the dividend's scale (2)
==> 0.67

The results are clearly unequal, so the value of a is not substitutable for b. Therefore, a.equals(b) should be false.

Stuart Marks
  • 1
    you make it sound sooo easy with this example. awesome! – Eugene Mar 09 '21 at 20:34
  • 11
    @Eugene The example was soooo good that we decided to put it into the javadoc: https://github.com/openjdk/jdk/commit/a1181852 (it should appear in JDK 17 build 13). – Stuart Marks Mar 11 '21 at 01:38
  • 1
    …and this leads to the conclusion that we should be careful when mixing order and equality, as otherwise, we get bugs like the behavior of `Stream.of("0.1", "0.10", "0.1") .map(BigDecimal::new) .sorted().distinct() .forEach(System.out::println);` – Holger Mar 22 '21 at 16:01
  • 2
    @Holger Correct. [JDK-8223933](https://bugs.openjdk.java.net/browse/JDK-8223933). – Stuart Marks Mar 22 '21 at 21:00
10

A point which has not yet been considered in any of the other answers is that equals is required to be consistent with hashCode, and the cost of a hashCode implementation which was required to yield the same value for 123.0 as for 123.00 (but still do a reasonable job of distinguishing different values) would be much greater than that of a hashCode implementation which was not required to do so.

Under the present semantics, hashCode requires a multiply-by-31 and add for each 32 bits of stored value. If hashCode were required to be consistent among values with different precision, it would either have to compute the normalized form of any value (expensive) or else, at minimum, do something like compute the base-999999999 digital root of the value and multiply that, mod 999999999, based upon the precision. The inner loop of such a method would be:

temp = (temp + (mag[i] & LONG_MASK) * scale_factor[i]) % 999999999;

replacing a multiply-by-31 with a 64-bit modulus operation, which is much more expensive.

If one wants a hash table which regards numerically-equivalent BigDecimal values as equivalent, and most keys which are sought in the table will be found, the efficient way to achieve the desired result would be to use a hash table which stores value wrappers, rather than storing values directly. To find a value in the table, start by looking for the value itself. If none is found, normalize the value and look for that. If nothing is found, create an empty wrapper and store an entry under the original and normalized forms of the number.

Looking for something which isn't in the table and hasn't been searched for previously would require an expensive normalization step, but looking for something that has been searched for before would be much faster. By contrast, if hashCode needed to return equivalent values for numbers which, because of differing precision, were stored totally differently, that would make all hash table operations much slower.
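
A minimal sketch of the simpler normalize-on-every-access variant (not the wrapper scheme described above; `stripTrailingZeros()` is the real API, the class name is made up):

import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Normalizes keys so that 2.0 and 2.00 hit the same entry. Unlike the
// wrapper scheme above, this pays the normalization cost on every access.
class NumericallyKeyedMap<V> {
    private final Map<BigDecimal, V> map = new HashMap<>();

    private static BigDecimal normalize(BigDecimal d) {
        // Zero is special-cased: stripTrailingZeros() on 0.00 misbehaved before Java 8.
        return d.signum() == 0 ? BigDecimal.ZERO : d.stripTrailingZeros();
    }

    V put(BigDecimal key, V value) { return map.put(normalize(key), value); }
    V get(BigDecimal key) { return map.get(normalize(key)); }
}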

supercat
  • Interesting observation. Correctness trumps performance, so you have to have a short list of what you consider to be the "correct" behaviour of a BigDecimal class (i.e. should scale/precision be considered for equality) before you start considering performance. We've no idea if this particular argument swung it. Your arguments are equally applicable to `equals` too, of course. – bacar Jul 27 '14 at 14:16
  • @bacar: There are two equivalence-related questions which can sensibly be asked of any object (IMHO, the virtual methods of `Object` should have provided for both): "May X and Y be safely regarded as equivalent, even if references are freely shared with outside code", and "May X and Y be safely regarded as equivalent by their owner, if it maintains exclusive control over X, Y, and all constituent mutable state?" I would suggest that the only types which should define `equals` in a fashion which doesn't match either of the above would be those whose instances are not expected to be... – supercat Jul 27 '14 at 16:12
  • ...exposed to the outside world. For example, if one needs to use a hashed set of strings which are compared in case-insensitive fashion, one could define a `CaseInsensitiveStringWrapper` type whose `equals` and `hashCode` operate on uppercase versions of the wrapped string. Although the wrapper would have an "unusual" meaning for `equals`, *it would not be exposed to outside code*. Since `BigDecimal` is intended for use by outside code, it should only report instances as equal if all reasonable outside code would consider them equivalent. – supercat Jul 27 '14 at 16:18
  • @bacar: Personally, I think the situation with the `equals` and `compareTo` methods of `BigDecimal` is great: code which wants things to be compared based upon value can use `compareTo`, and code which wants to compare based upon equivalence can use `equals`. Note that precision doesn't just affect output; I believe at least one way of performing division uses the precision of the dividend to control the precision to which the result is rounded, such that 10.0/3 would yield 3.3, while 10.000/3 would yield 3.333. Substituting 10.0 for 10.000 would thus not be safe. – supercat Jul 27 '14 at 16:26
  • 1
    Division might have been specified to behave differently if equality had been specified differently. I think your `CaseInsensitiveStringWrapper` raises a very interesting point though - it is easy to implement a 'fuzzier' equivalence on top of a stricter one, whereas it may be harder, impossible or simply surprising to implement a strict one in terms of a fuzzier one. Either way, the principle of least surprise is violated for one set of users or another. – bacar Jul 27 '14 at 20:29
  • @bacar: I would suggest that if users are taught that they should *expect* to use methods other than `equals` when they want to test loose equality, then nobody need be surprised. – supercat Jul 27 '14 at 20:34
6

In math, 10.0 equals 10.00. In physics, 10.0m and 10.00m are arguably different (different precision). When talking about objects in OOP, I would definitely say that they are not equal.

It's also easy to think of unexpected functionality if equals ignored the scale (for instance: if `a.equals(b)`, wouldn't you expect `a.add(0.1).equals(b.add(0.1))`?).
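
To make that concrete (a small jshell-style sketch; note that the real `add` takes a `BigDecimal`, so `0.1` is spelled out below):

BigDecimal a = new BigDecimal("2.0");
BigDecimal b = new BigDecimal("2.00");
BigDecimal tenth = new BigDecimal("0.1");

a.add(tenth)   // 2.1  (scale 1)
b.add(tenth)   // 2.10 (scale 2)
// Had equals() ignored scale, a and b would have been "equal" objects
// whose sums nevertheless print differently.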

Aleksander Blomskøld
  • 2
    Yes, I would expect that, but I don't understand your point; I'm not suggesting it ignore the scale; I'm suggesting it consider the value and the scale as a *whole*, as `compareTo` does. – bacar Dec 31 '12 at 13:22
  • 6
    OK. I understand that sometimes users may want to consider precision, but I still don't get what your point is about unexpected functionality. If they'd chosen to let 2.0 equals 2.00, I'm not sure where your example of adding 0.1 causes problems. – bacar Dec 31 '12 at 13:44
5

If numbers get rounded, the scale shows the precision of the calculation:

  • 10.0 could mean that the exact number was between 9.95 and 10.05
  • 10.00 could mean that the exact number was between 9.995 and 10.005

In other words, the scale is linked to the arithmetic precision of the value.
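
A quick jshell-style illustration (`setScale` is the real API; the values are arbitrary):

// Many distinct exact values collapse onto the same rounded representation;
// the retained scale records how coarse that bucket is.
new BigDecimal("9.96").setScale(1, RoundingMode.HALF_UP)    // 10.0
new BigDecimal("10.04").setScale(1, RoundingMode.HALF_UP)   // 10.0
new BigDecimal("9.996").setScale(2, RoundingMode.HALF_UP)   // 10.00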

assylias
2

The compareTo method knows that trailing zeros do not affect the numeric value represented by a BigDecimal, which is the only aspect compareTo cares about. By contrast, the equals method generally has no way of knowing what aspects of an object someone cares about, and should thus only return true if two objects are equivalent in every way that a programmer might be interested in. If x.equals(y) is true, it would be rather surprising for x.toString().equals(y.toString()) to yield false.
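
The string forms really do differ (jshell-style):

new BigDecimal("2.0").toString()    // "2.0"
new BigDecimal("2.00").toString()   // "2.00"
// Had equals() ignored scale, these two "equal" objects would print differently.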

Another issue which is perhaps even more significant is that BigDecimal essentially combines a BigInteger and a scaling factor, such that if two numbers represent the same value but have different numbers of trailing zeroes, one will hold a BigInteger whose value is some power of ten times the other's.

If equality requires that the mantissa and scale both match, then the hashCode() for BigDecimal can use the hash code of BigInteger. If it's possible for two values to be considered "equal" even though they contain different BigInteger values, however, that will complicate things significantly.

A BigDecimal type which used its own backing storage, rather than a BigInteger, could be implemented in a variety of ways to allow numbers to be quickly hashed such that values representing the same number would compare equal (as a simple example, a version which packed nine decimal digits into each long value and always required that the decimal point sit between groups of nine could compute the hash code in a way that ignores trailing groups whose value is zero), but a BigDecimal that encapsulates a BigInteger can't do that.
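
A short jshell-style sketch of the representation point (`unscaledValue()` and `scale()` are the real accessors):

BigDecimal a = new BigDecimal("2.0");
BigDecimal b = new BigDecimal("2.00");

a.unscaledValue()   // 20  (scale 1)
b.unscaledValue()   // 200 (scale 2)
// A hash code derived from the wrapped BigInteger is cheap precisely
// because it is allowed to treat 20/scale 1 and 200/scale 2 as different.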

supercat
  • 2
    "the `equals` method generally has no way of knowing what aspects of an object someone cares about" - I **vehemently** disagree with this statement. Classes define (sometimes implicitly) a contract for their externally visible behaviour, which includes `equals`. Classes often exist **specifically to hide (by encapsulation) details that users do not care about**. – bacar Jan 22 '13 at 09:59
  • 2
    Also - I don't think that in general you should have an expectation that `equals` be consistent with `toString`. Classes are at liberty to define `toString` pretty much however they see fit. Consider an example from the JDK, `Set s1 = new LinkedHashSet(); s1.add("foo"); s1.add("bar"); Set s2 = new LinkedHashSet(); s2.add("bar"); s2.add("foo");` `s1` and `s2` have different string representations but compare equal. – bacar Jan 22 '13 at 10:44
  • @bacar: Perhaps I'm over-extending .Net principles to Java. The hashed collections in .Net allow one to specify methods for equality comparison and hashing, thus effectively telling the collection what aspects of the object it should be interested in. If one had a collection type that maintained its elements in sequence, but offered `SequenceEquals` `GetSequenceHashCode`, `ContentEquals`, and `GetContentHashCode` methods, one could then store such a type into a hashed collection using reference equality, sequence equality, or order-independent content equality. – supercat Jan 22 '13 at 16:22
  • I disagree with this statement, too. I've found, in my own experience when overriding the `equals()` method in custom objects, it's better to define equivalence on a small scale (aka, as few object attributes as possible) rather than on a big scale. The fewer attributes that contribute to equivalence, the better. Databases work in this same principle. – ryvantage Jan 04 '14 at 21:39
  • @ryvantage: One wouldn't generally expect to use objects with many fields as dictionary keys for purposes of looking up "other" information, but especially when dealing with hierarchical collections there may be a number of circumstances where one ends up with many copies of the same information; if one can efficiently identify references to distinct but equivalent objects, replacing references to all but the oldest copy with references to the oldest copy may save memory and improve performance; to do that, one must compare all fields. – supercat Jan 04 '14 at 23:33
  • Well, for me, I use Objects in my applications that are modeled exactly like they are on the database, using `HashSet` to store a lot of them, and using methods like `add()` and `contains()`, it looks for equivalence, so, at first, when I overrode `equals()` it compared every field of the object, but if for some reason a new element got added that was a little different, the `HashSet` would retain them both, which was no bueno. I ended up defining equality (and hashvalue) based exclusively on the `id` (primary key) from the database. – ryvantage Jan 05 '14 at 00:08
  • So, in my sense, if two objects have the same `id`, **they represent the same instance of the object**, even if their fields aren't equal. This was the only way to get them to behave in any kind `Set` I used. – ryvantage Jan 05 '14 at 00:10
  • Are the objects mutable or immutable, and what is their relation to any persistent store? If the objects are tied to rows in a database, I would suggest that multiple distinct objects attached to the same row shouldn't *exist* in the first place. Otherwise, I'm not quite clear why you're using a `Set` rather than a `Map`? I would think the natural way to store things would be as a `Map` whose "key" object encapsulates those parts of the data which are relevant to equality. – supercat Jan 05 '14 at 00:37
2

I imagine there must be a good reason for ignoring this recommendation.

Maybe not. I propose the simple explanation that the designers of BigDecimal just made a bad design choice.

  1. A good design optimises for the common use case. The majority of the time (>95%), people want to compare two quantities based on mathematical equality. For the minority of the time where you really do care about the two numbers being equal in both scale and value, there could have been an additional method for that purpose (a sketch follows this list).
  2. It goes against people's expectations, and creates a trap that's very easy to fall into. A good API obeys the "principle of least surprise".
  3. It breaks the usual Java convention that Comparable is consistent with equality.
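
For point 1, such an API split might have looked like the following (hypothetical; neither helper exists on the real BigDecimal):

import java.math.BigDecimal;

final class Decimals {
    // What equals() arguably should have meant: mathematical equality.
    static boolean numericallyEqual(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }
    // The rarer scale-and-value check, kept for those who need it.
    static boolean equalIncludingScale(BigDecimal a, BigDecimal b) {
        return a.equals(b);
    }
}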

Interestingly, Scala's BigDecimal class (which is implemented using Java's BigDecimal under the hood) has made the opposite choice:

BigDecimal("2.0") == BigDecimal("2.00")     // true
Matt R
  • 1
    A fundamental requirement of `equals` is that two objects with unequal hash codes must compare unequal, and the design of `BigDecimal` is such that numbers with different precision are stored very differently. Thus, having `equals` regard values with different precision as equivalent would greatly impair the performance of hash tables, even those in which all values were stored with equivalent precision. – supercat Jul 25 '14 at 23:06
  • 1
    @supercat Good observation. However, I'd argue that `BigDecimal`-keyed `Map`s (and `Set`s) are so rare a use-case that it's not sufficient justification for a scale-sensitive `equals`. – Matt R Jul 26 '14 at 11:01
  • Use of such types as map keys may not be terribly common, but it's probably not terribly rare either. Among other things, code which ends up computing similar values frequently may sometimes benefit enormously from caching frequently-computed values. For that to work efficiently, it's imperative that the hash function be good and fast. – supercat Jul 27 '14 at 04:41
  • 2
    @supercat 1) It's safe to say `BigDecimal` keys are much rarer than people getting bitten by its unintuitive definition of equality; 2) if a scale-insensitive hash is a performance bottleneck, you're likely in a setting where using `BigDecimal` itself is too slow (e.g. you might switch to `long`s for monetary calculations). – Matt R Jul 27 '14 at 09:21