
In Haskell, most numeric types are native: Int, Float, Word32, etc. There is also a popular representation of unary natural numbers using ADTs only - the Peano encoding:

data Nat = Succ Nat | Zero

That datatype, while elegant, is not very efficient. Multiplication, exponentiation and division on unary numbers are impractical. My question is: if we didn't have native types to rely on, what would be the most efficient representation of numbers - nats, ints, fracs, complex, etc. - in a pure functional language such as Haskell? What would the datatypes and the respective algorithms look like?
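(To make the cost concrete, here is a sketch of Peano arithmetic - the function names are my own. `add` walks its entire first argument, so it is O(m), and `mul` built on top of it is O(m·n).)

```haskell
-- Peano naturals: the number n is n applications of Succ to Zero.
data Nat = Succ Nat | Zero

-- O(m), where m is the first argument.
add :: Nat -> Nat -> Nat
add Zero     n = n
add (Succ m) n = Succ (add m n)

-- O(m * n): repeated addition.
mul :: Nat -> Nat -> Nat
mul Zero     _ = Zero
mul (Succ m) n = add n (mul m n)

-- Convert to Int, for inspection.
toInt :: Nat -> Int
toInt Zero     = 0
toInt (Succ n) = 1 + toInt n
```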

MaiaVictor
  • Two notes: 1. I know one obvious solution would be to just represent numbers as lists of bools and emulate the float/int algorithms - but would that be the most efficient? 2. I guess the question is somewhat broad, and a complete answer would be way too long, so obviously I'm not asking for whole implementations. Just key insights and pointers would be very good! – MaiaVictor Sep 04 '15 at 21:56
    “Most efficient” in what regard? I don't suppose there's any way to give a universally objective criterion, so it'll more or less boil down to “what's most efficient in GHC”. But then the answer will be, _anything that compiles to an equivalent of the standard native arithmetic_. Pretty sure that is possible without explicitly using native types. So, while I like the idea of this question, I don't think there can be a proper answer. – leftaroundabout Sep 04 '15 at 22:21
    It depends highly on which operations you are performing. The data type `data N = OnePlusSum N N | Zero` can perform extremely well for some problems: it has `O(1)` operations for `n + 1`, `n + m + 1` and for checking to see that a number is non-zero. It won't do so well if you need to divide by two or subtract 1. – Cirdec Sep 04 '15 at 22:33

3 Answers


Leftaroundabout already commented on the ambiguity of the term "most efficient" when referring to Haskell and GHC specifically. If what you're really asking is, "What might a more efficient encoding of numbers using ADTs (algebraic datatypes) look like in a language that supports them?", then I'd point you to the Basics chapter in Software Foundations, where you're given an exercise in defining a binary representation of the natural numbers.

A quote from the book:

Exercise: 3 stars (binary)

Consider a different, more efficient representation of natural numbers using a binary rather than unary system. That is, instead of saying that each natural number is either zero or the successor of a natural number, we can say that each binary number is either

  • zero,
  • twice a binary number,
  • or one more than twice a binary number.

(a) First, write an inductive definition of the type bin corresponding to this description of binary numbers.

(Hint: Recall that the definition of nat from class,

Inductive nat : Type :=
  | O : nat
  | S : nat → nat.
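
In Haskell, one possible shape of the answer to part (a) might look like this (a sketch; the constructor names are my own, not the book's):

```haskell
-- Binary naturals, least significant bit outermost:
-- B0 n represents 2*n, B1 n represents 2*n + 1.
data Bin = BZero | B0 Bin | B1 Bin

-- Interpret a Bin as an ordinary Int, for checking.
binToInt :: Bin -> Int
binToInt BZero  = 0
binToInt (B0 n) = 2 * binToInt n
binToInt (B1 n) = 2 * binToInt n + 1

-- Increment: O(1) amortized, O(log n) worst case (carry propagation).
incr :: Bin -> Bin
incr BZero  = B1 BZero
incr (B0 n) = B1 n
incr (B1 n) = B0 (incr n)
```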
numberten

It depends very much on what you want to do with the numbers and what you mean by "most efficient".

If you want to represent a natural number n, you need at least log n bits of information. And since an ADT can have only finitely many distinct constructors, each node encodes a bounded number of bits, so you need a structure with at least on the order of log n nodes.

I highly recommend the chapter Numerical Representations from Chris Okasaki's Purely Functional Data Structures (the thesis is available online here). It describes various tree-like data structures supporting different sets of operations, and how they relate to natural numbers. Everything below is what I learned from the book.

Expanding on Cirdec's comment: You can define

data N = Zero | Pos Positive
data Positive = One | Add Positive Positive

This gives you O(1) addition and O(1) increment by one. On the other hand, the size of the structure is O(n).
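A sketch of that structure (with `Pos` as a hypothetical wrapper constructor of my own naming):

```haskell
-- Naturals as unevaluated sums: Add x y represents x + y,
-- so addition just allocates one node.
data N        = Zero | Pos Positive
data Positive = One | Add Positive Positive

-- O(1) addition.
plus :: N -> N -> N
plus Zero    n       = n
plus n       Zero    = n
plus (Pos x) (Pos y) = Pos (Add x y)

-- O(size of the tree): fold the whole sum down to an Int.
toInt :: N -> Int
toInt Zero    = 0
toInt (Pos p) = go p
  where
    go One       = 1
    go (Add x y) = go x + go y
```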

You can use a binary representation with O(log n) space, but then addition will be O(log n):

data N = Zero | Pos Positive
data Positive = One | Twice Positive | TwicePlusOne Positive

Increment and decrement will be almost amortized O(1). A sequence of increments goes to depth d only once every 2^d operations, so on average each increment is O(1); similarly for decrements. I said almost because if you interleave increments and decrements, you can flip back and forth between an O(log n) increment and an O(log n) decrement. A solution to this is to add some redundancy:

data N = Zero | Pos Positive
data Positive = One | Twice Positive | TwicePlusOne Positive | TwicePlusTwo Positive

Now, every time an operation needs to go one level deeper, it leaves the current node at TwicePlusOne, which means the next operation affecting that node will stop at it, regardless of whether it's an increment or a decrement.
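As a sketch of that idea (names are my own; only increment and decrement are shown, and a carry in either direction always leaves TwicePlusOne behind):

```haskell
-- Redundant binary naturals: the extra digit value (TwicePlusTwo)
-- absorbs carries so increments and decrements don't conflict.
data N        = Zero | Pos Positive
data Positive = One
              | Twice Positive          -- 2n
              | TwicePlusOne Positive   -- 2n + 1
              | TwicePlusTwo Positive   -- 2n + 2

inc :: N -> N
inc Zero    = Pos One
inc (Pos p) = Pos (incP p)
  where
    incP One              = Twice One              -- 1 -> 2
    incP (Twice n)        = TwicePlusOne n         -- 2n -> 2n+1
    incP (TwicePlusOne n) = TwicePlusTwo n         -- 2n+1 -> 2n+2
    incP (TwicePlusTwo n) = TwicePlusOne (incP n)  -- 2n+3 = 2(n+1)+1

dec :: N -> N
dec Zero    = Zero
dec (Pos p) = decP p
  where
    decP One              = Zero
    decP (Twice n)        = case decP n of         -- 2n-1 = 2(n-1)+1
                              Zero  -> Pos One
                              Pos m -> Pos (TwicePlusOne m)
    decP (TwicePlusOne n) = Pos (Twice n)          -- 2n
    decP (TwicePlusTwo n) = Pos (TwicePlusOne n)   -- 2n+1

-- Interpret as an Int, for checking.
value :: N -> Int
value Zero    = 0
value (Pos p) = go p
  where
    go One              = 1
    go (Twice n)        = 2 * go n
    go (TwicePlusOne n) = 2 * go n + 1
    go (TwicePlusTwo n) = 2 * go n + 2
```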

If you want constant-time addition, the data structure can be extended to support it (look for Lists With Efficient Catenation in the book), but then again you can end up with O(n) memory if the sequence of operations is bad. There is an open SO question, How can natural numbers be represented to offer constant time addition?, asking whether it's possible to get both.

Petr

As @Cirdec already commented, the question doesn't make sense with regard to programming languages; it only makes sense with regard to specific programming problems.

For example, if you need exact numbers, you may want to represent rationals as continued fractions. That, on the other hand, may be suboptimal for bookkeeping.
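For instance, the (finite) continued fraction of a rational can be computed by repeated Euclidean division - a sketch, with function names of my own:

```haskell
-- Continued-fraction expansion [a0; a1, a2, ...] of p/q, obtained by
-- repeatedly taking the integer part and inverting the remainder.
toCF :: Integer -> Integer -> [Integer]
toCF _ 0 = []
toCF p q = let (a, r) = p `divMod` q
           in a : toCF q r

-- Evaluate back to (numerator, denominator) by folding from the right.
fromCF :: [Integer] -> (Integer, Integer)
fromCF = foldr (\a (n, d) -> (a * n + d, n)) (1, 0)
```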

Ingo