
What are the potential downsides of consistently using floating-point types to represent integers, even when indexing into arrays? Assume the context of a performance-oriented C library. The choice is between 64-bit integers and 64-bit floating point.

I feel uncomfortable about doing such a thing, as doubles are not meant for indexing, and using a tool for something it was not designed for usually carries risk. But I would like to understand if there are rational reasons to avoid doing this.

To get the obvious things out of the way:

  • Of course some casts might be required to use a double with the `[]` operator (a minimal sketch of this follows the list).
  • Of course an IEEE 754 double cannot represent as many distinct integers as a 64-bit integer type can, but 53 bits are likely to be more than enough for indexing arrays in the foreseeable future.
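
To make this concrete, here is a minimal sketch (illustrative only, not from any real code) of what such indexing looks like in C, including the validation that a bare cast would skip:

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative only: fetch data[index] where the index is stored as a double. */
double get_element(const double *data, size_t length, double index)
{
    /* Reject NaN, negatives, fractional values and out-of-range indices
       before casting; a bare cast would silently accept all of them. */
    if (!(index >= 0.0) || floor(index) != index || index >= (double)length)
        return NAN;

    /* The [] operator needs an integer, so a cast is required. */
    return data[(size_t)index];
}

int main(void)
{
    double data[] = { 1.5, 2.5, 3.5 };
    printf("%g\n", get_element(data, 3, 2.0)); /* prints 3.5 */
    return 0;
}
```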

Such uses of floating-point types are in fact found in the wild. R, for example, does not have 64-bit integers, and supports large arrays by using doubles for indexing. When writing code that must interoperate with R, one must consider whether to do the same.

Szabolcs
  • If Javascript isn't sufficient to scare you away from this idiocy, then presumably nothing will. – EOF Oct 03 '21 at 08:50
  • @EOF I don't know Javascript. I find myself in a situation where this is being considered (because of R), and while it "feels like a bad idea", I am looking to collect some rational arguments instead of just going with what feels right or wrong. – Szabolcs Oct 03 '21 at 08:53
  • It will certainly be much slower and likely take more space (unless you compare `double` to 64-bit integers). This is especially true on embedded processors, some of which do not have an FP unit, so such operations are emulated, which is insanely slow (e.g. more than 100 times the cost of an integer operation). – Jérôme Richard Oct 03 '21 at 09:04
  • Floating-point calculations take more cycles than integer calculations. So, for a performance-sensitive application/library, it's most likely not a good choice. – Fractal Oct 03 '21 at 09:10
  • @EricPostpischil You're right, fixed. – Szabolcs Oct 03 '21 at 15:43
  • As others have said, performance performance performance. Pretty much everything else can be worked around, at the cost of more performance. One could give a general sense by comparing cycle counts for some typical integer and floating-point instructions (and don't forget that converting float to int is fairly expensive too), but profiling would show you just how it affects your workloads. – Nate Eldredge Oct 04 '21 at 01:12
  • For what it's worth, there exists a computer today with more than 2^52 bytes of memory: https://www.itpro.com/hardware/360706/7-most-powerful-computers-of-all-time – Nate Eldredge Oct 04 '21 at 01:31
  • Interfacing with software contaminated with (redacted) is something many of us must endure from time to time. There is a choice however. You can strictly confine the (redacted) to interface boundaries, or let it spread and take over your whole world. Choose wisely. – n. m. could be an AI Oct 04 '21 at 18:08

1 Answer

  1. Performance: On many CPU architectures, floating-point operations are slower than integer ones (see "Floating point vs integer calculations on modern hardware"). It depends heavily on the type of operation and on the CPU in question, though. This might not matter if the code is not heavily exercised (not a hot spot in profiling) or if the performance is "good enough" for the application anyway.
  2. Representation: Floating-point numbers are typically represented in base 2, and not every number that is exactly representable in base 10 can be represented exactly (see "When is it appropriate to use floating precision data types?"). This has implications for arithmetic and can yield unexpected results; the small check after this list illustrates two of them. Fun times.
  3. Comparisons: As a consequence of the representation difficulties, some linters and libraries do not allow equality checks between floating-point values (for example the SonarSource Java rule, or the xUnit Assert library; note the absence of an `Equals(double, double)` overload). This is meant to reduce the likelihood of bugs, but it may impact your use of doubles as integers.
  4. Principle of least astonishment: Using floats where integers are normally expected increases the effort required to understand the code, which in turn makes maintenance and changes more difficult.
  5. Poor support in integer-native languages: In languages that use integers as their primary type (C, for example), using floating-point values in place of integers creates "friction" with the rest of the language and likely with many libraries. Essentially, in your example you are trading interop with R for "interop" with C.
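
To make points 2 and 3 concrete, here is a small check (just an illustration, not from any particular library):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Point 2: the classic base-2 representation surprise. */
    printf("%d\n", 0.1 + 0.2 == 0.3);   /* prints 0 */

    /* Applied to indexing: integers are exact in a double only up to 2^53. */
    int64_t limit = (int64_t)1 << 53;   /* 9007199254740992 */
    double a = (double)limit;
    double b = (double)(limit + 1);     /* rounds back to 2^53 on IEEE 754 hardware */
    printf("%d\n", a == b);             /* prints 1: two "different" indices compare equal */
    return 0;
}
```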

For interop with almost any language (or even library) I would recommend creating an interoperability layer or component that takes care of all the issues it can and documents those it cannot, in essence abstracting away (some of) the complexities of the interop.
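
As a sketch of what that boundary might look like for the R case in the question (the function name and error convention here are hypothetical):

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boundary helper: validate an R-style double index once at the
   interface, then use only size_t inside the library. */
bool index_from_double(double x, size_t length, size_t *out)
{
    /* Reject NaN, negatives and non-integral values. */
    if (!(x >= 0.0) || floor(x) != x)
        return false;
    /* Reject out-of-range values before casting (this also catches values
       far too large for any array). */
    if (x >= (double)length)
        return false;
    *out = (size_t)x;
    return true;
}
```

Everything past that point can be ordinary integer code, and only the boundary itself needs documenting.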

Peter Lindsten
  • Point 3 is incorrect. Comparing two floating-point numbers for equality evaluates as true if and only if the two operands represent the same number. There is never any error in this operation, and it is not “implementation specific.” The cause of this myth is that naïve programmers attempt to compare operands containing errors from previous operations and may not know how to deal with those errors. But those errors arise from prior arithmetic operations, as in point 4, and not from the comparison. – Eric Postpischil Oct 03 '21 at 10:48
  • Points 2 and 4 are incompletely stated aspects of the same thing: A floating-point operation produces a result equivalent to the real-number result rounded to the nearest representable value. – Eric Postpischil Oct 03 '21 at 10:56
  • Incidentally, `10/3*3` produces 10 in any binary-based floating-point format with rounding-to-nearest, ties-to-even. The infinitely precise result of 10/3 would be, in binary, 11.01010101… If the significand has an even number of bits, the repetition ends with a 1, so the result is rounded down. Then multiplying by 3 yields 1001.1111…11, where two bits lie beyond the significand width so the 11 is rounded up, producing 10. If there are an odd number of bits, the quotient is rounded up, making 11.010101…01011. Multiplying by 3 yields 1010.000…01, again with two bits beyond, so the 01 rounds down. – Eric Postpischil Oct 03 '21 at 11:14
  • Edited point 3. – Peter Lindsten Oct 03 '21 at 12:11
  • Sorry, but comparing for less-than-or-equal is the same as comparing for equal; it is always correct. `a <= b` evaluates to true if and only if `a` represents a number that is less than or equal to `b`. There may be errors in other operations, but not in comparison. Nonetheless, using `<=` is not a remedy to problems using `==`. If there are errors in earlier operations, then the operands to `<=` or `==` are wrong, and using one or the other will not fix that. – Eric Postpischil Oct 03 '21 at 14:18
  • If, in integer arithmetic, `10/3*3` produced 9 instead of 10 (oh, wait, it does), would you then say you cannot compare integers or that you should only compare them with `<=` instead of `==`? Quite simply, the errors in arithmetic operations have nothing to do with how comparison works. Truncation in integer arithmetic does not mean you cannot compare integers for equality, and rounding-to-nearest in floating-point arithmetic does not mean you cannot compare floating-point numbers for equality. – Eric Postpischil Oct 03 '21 at 14:19