3

So I'm implementing a continued fraction library for handling a subset of quadratic integers and rational numbers. Continued fraction terms are represented by unsigned integers. I've noticed the following general patterns when working with continued fractions:

  1. Most terms are small single digit numbers, with 1 being the most common.
  2. Some terms can be enormous, the largest possible for my application is 366 bits, but these are extremely rare.
  3. A large term represents an especially good number approximation, which means there are usually fewer terms overall for a large fraction.
  4. The worst possible continued fraction is the golden ratio, and a rational approximation of that with 366 bits of precision corresponds to roughly 525 1's in a row.
  5. Random rational numbers do not generally have large runs of the same number, but may have two to four in a row, with 1 again the most common.

So I have a limit on both the number of terms and the size of a term, with the number of terms roughly inversely proportional to their size. Storing these terms in machine words or even bytes is usually very wasteful of space, yet I still need to handle multi-word arithmetic in the worst case. Given that inverse relationship between term size and number of terms (both of which also depend on the size of the fraction's numerator and denominator), I've been trying to find or come up with a good compression scheme so that I don't waste so much space storing the integer terms.

I've considered Huffman encoding, as the speed of encoding and decoding is nice, but I'm not sure how to come up with the probabilities for the code values. I have a vague intuition that Stern-Brocot Trees might offer a hint, as they are binary trees with a direct relationship to continued fractions.

Does anyone know of a good compression approach for handling lots of small numbers with occasional huge ones where runs of the same number are typically short (but might be long in rare cases)? In particular I need to be able to encode and decode fairly fast (say O(n*lg(n)) is the absolute worst speed I can tolerate, with O(n) or better preferable), and be able to get at the position of individual terms so that I know which term number I'm operating on (fourth term, fifth term, etc).

Also, I'm not interested in using any third party real number or continued fraction libraries. I've looked at several and they are either insufficient or overkill for my needs, and I'd like the experience of coding this up myself for my own edification.

Update

I've also learned of the Gauss–Kuzmin distribution, which gives the probability that a given continued fraction term equals k for a random number uniformly distributed between 0 and 1. It is

P(k) = -lg(1.0 - 1.0/((k + 1.0)**2)) in Python pseudo-code, where lg is the base 2 logarithm. This is the limiting distribution (it describes the terms of a uniformly random real, with no bound on their size), so my case is somewhat more restricted (though 2**366 is still huge). "The Entropy of Continued Fractions" by Linas Vepstas gives the (information) entropy of the Gauss–Kuzmin distribution as approximately 3.43 bits. My maximum denominator is large enough that the information entropy of my continued fractions is probably close to that limit, though the graph in the linked article shows the limit is approached quite slowly and so is different for relatively small denominators.
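
As a rough sanity check on that figure, the entropy sum can be truncated at a large cutoff and computed directly (a small Python sketch of my own; the cutoff of 10**6 is arbitrary and the tail contributes negligibly):

    from math import log2

    def gauss_kuzmin(k):
        # P(term == k) for a random real uniformly distributed in (0, 1)
        return -log2(1.0 - 1.0 / (k + 1.0) ** 2)

    # Truncated entropy sum; terms beyond the cutoff contribute negligibly.
    entropy = -sum(p * log2(p)
                   for p in (gauss_kuzmin(k) for k in range(1, 10**6)))
    print(entropy)  # roughly 3.43 bits, matching Vepstas' figure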

hatch22
  • 797
  • 6
  • 18
  • [LZW](https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch) is perfect for handling runs of small numbers. – Mark Ransom Aug 21 '15 at 21:53
  • My concern with LZW is it seems to work best on repeated patterns, and continued fractions don't really exhibit any repeating patterns outside of continued fractions of square roots. – hatch22 Aug 22 '15 at 03:43

5 Answers

2

One possibility is a simple prefix code: write n in binary as 1x, encode each bit of x as 0 -> 10 and 1 -> 11, then append the terminator 0.

Table:

1 -> 0
2 -> 100
3 -> 110
4 -> 10100
5 -> 10110
6 -> 11100
7 -> 11110
8 -> 1010100

The corresponding Huffman code probabilities here for n are something like Theta(1/n^2) (waving my hands a bit because the alphabet is infinite).
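
A minimal sketch of an encoder and decoder for this code (my own illustration, using Python strings of '0'/'1' for readability; a real implementation would pack bits into words):

    def encode_term(n):
        # Write n >= 1 in binary as 1x, map each bit of x to 0 -> 10, 1 -> 11,
        # then append the terminator 0.
        x = bin(n)[3:]  # binary digits of n with the leading 1 stripped
        return ''.join('10' if b == '0' else '11' for b in x) + '0'

    def decode_term(stream, pos=0):
        # Decode one term starting at bit index pos; return (value, new_pos).
        value = 1
        while stream[pos] == '1':      # a 1 flag means another payload bit follows
            value = (value << 1) | (stream[pos + 1] == '1')
            pos += 2
        return value, pos + 1          # skip the terminating 0

For example, encode_term(5) gives '10110' as in the table, and decoding a concatenated stream term by term also tells you which term index you are at, which addresses the positional requirement in the question.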

David Eisenstat
  • 64,237
  • 7
  • 60
  • 120
  • This seems like a lot of bits to use for small numbers. Can you give an example of how this saves space if say a long string of terms have no value greater than 100 (decimal)? It certainly looks better than using a byte per term, but I'm concerned at how fast the bit count is growing. – hatch22 Aug 22 '15 at 03:49
  • @hatch22 The motivation here is trying to guess the distribution of terms. I started by taking the approximation of drawing x <- (0, 1] uniformly at random, then returning floor(1/x). The probability of getting n is 1/n - 1/(n+1) = 1/(n(n+1)), so I tried to find a simple encoding where the implied Huffman probabilities would be roughly the same. – David Eisenstat Aug 22 '15 at 11:41
  • I follow the logic now. +1 for trying to work with the actual probability distribution a continued fraction term would have. – hatch22 Aug 22 '15 at 14:07
  • Is the [Gauss-Kuzmin distribution](https://en.wikipedia.org/wiki/Gauss%E2%80%93Kuzmin_distribution) a more accurate probability distribution than what you modeled, or are you modeling a different distribution for the probability of getting `n` and I'm not seeing why there is a distinction? – hatch22 Aug 24 '15 at 14:32
  • @hatch22 I didn't know about G--K. That's the thing to use if we can figure out how, but note that the pmf 1/(n(n+1)) approaches G--K in the limit, so it's not clear whether the change of underlying distributions would be material. – David Eisenstat Aug 24 '15 at 15:01
  • Duly noted. I'm still not crazy about the terminating 0 making everything except 1 cost an extra bit (2 is the median in the G--K distribution, though 1 is the mode), since 2 and 3 will still be fairly common, but if the number of 1's is sufficiently large this is still a win overall. I like the fact that it doesn't explode in size for really large values, which is a strike against [Rice Coding](https://en.wikipedia.org/wiki/Golomb_coding) that is only partially alleviated with recursive Rice Coding. That said, this encoding still gets pretty big pretty fast and I wonder if we can do better. – hatch22 Aug 24 '15 at 15:26
  • I found a paper by Matula and Kornerup entitled "[An order preserving Binary Encoding of the Rationals](http://acsel-lab.com/arithmetic/arith6/papers/ARITH6_Matula_Kornerup.pdf)" that advocates doing something very similar to what you suggest. Page 3 of the PDF shows a table similar to yours, and though the encoding is different, the rate of growth is the same or nearly the same, and they claim their encoding is within 2% of the optimal Huffman encoding. It seems you have been on the right track. I will accept your answer unless someone submits something significantly better. – hatch22 Aug 24 '15 at 16:56
2

Your distribution seems amenable to Rice coding. You can tune the coding parameter, call it k, to your data. The coding takes the value of the number above its low k bits (that is, n >> k) and transmits that many 1 bits, followed by a 0. Then you transmit the low k bits directly.
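
A small sketch of what that looks like in Python (bit strings for clarity; k is a tuning parameter you would need to pick for your data, and since continued fraction terms are at least 1 you could code n - 1 instead of n):

    def rice_encode(n, k):
        # Unary-code the high part q = n >> k as q ones and a zero,
        # then append the low k bits of n verbatim.
        q = n >> k
        low = format(n & ((1 << k) - 1), 'b').zfill(k) if k else ''
        return '1' * q + '0' + low

    def rice_decode(stream, pos, k):
        # Decode one value starting at bit index pos; return (n, new_pos).
        q = 0
        while stream[pos] == '1':
            q += 1
            pos += 1
        pos += 1                                 # skip the unary terminator
        low = int(stream[pos:pos + k], 2) if k else 0
        return (q << k) | low, pos + k

For example, rice_encode(5, 2) gives '1001': one 1 for the high part 5 >> 2 = 1, the terminator 0, then the low bits 01.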

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • Rice coding looks promising. I'll need to experiment with different values of k. – hatch22 Aug 22 '15 at 14:09
  • Does the additional information of having an information entropy of about 3.43 bits give a clue to a good value for `k`? – hatch22 Aug 24 '15 at 14:26
  • The problem I have with Rice Coding is it doesn't seem to handle large numbers very well. – hatch22 Aug 24 '15 at 18:02
1

You can use arithmetic encoding considering each positive integer as a symbol in your source alphabet. It doesn't matter that there are infinitely many as the relative probability of larger and larger integers drops to zero.

In fact, considering a uniform distribution on [0,1), you can set the conditional probability of each new integer a_n in the continued fraction expansion (i.e. each new symbol from the entropy source) to be P(a_n = k) = 1/k - 1/(k+1). Considering the very first integer shows why this makes sense: half the numbers in [0,1) have a_1 = 1, one sixth have a_1 = 2, and so on.

Also, you may need to flip the direction of the arithmetic encoding with each new symbol for decoding to be unambiguous. Unfortunately, arithmetic en/decoding is not very fast.
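
To make the model concrete, here is a sketch of just the interval-narrowing step with that probability assignment, using exact rationals (my own illustration; it ignores the direction-flipping subtlety and the fixed-precision renormalization a real coder would use):

    from fractions import Fraction

    def symbol_interval(k):
        # Subinterval of [0, 1) assigned to term k under P(k) = 1/k - 1/(k+1):
        # [1 - 1/k, 1 - 1/(k+1)), whose width is exactly P(k).
        return Fraction(1) - Fraction(1, k), Fraction(1) - Fraction(1, k + 1)

    def narrow(terms):
        # Map a term sequence to its nested arithmetic-coding interval.
        lo, hi = Fraction(0), Fraction(1)
        for k in terms:
            s_lo, s_hi = symbol_interval(k)
            lo, hi = lo + (hi - lo) * s_lo, lo + (hi - lo) * s_hi
        return lo, hi

    lo, hi = narrow([1, 2, 1, 3])
    # Any binary fraction inside [lo, hi) identifies the sequence; the interval
    # width is the product of the term probabilities.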

dimitri
  • 11
  • 2
1

You might consider dropping continued fractions and instead implementing their mutated cousin: continued logarithms. See the very bottom of Gosper's continued fraction arithmetic paper for a discussion and implementation of basic arithmetic.

There's even a hardware implementation for hugely parallel arithmetic on continued logarithms, with the same asymptotic complexity as FFT, but far simpler and thus much lower constants.

If you're looking for anything more exotic than can be built from just the reversible {+,-,*,/} operators, please see my new question on the floor operator.

As you've seen, continued fractions tend to blow up in term bit size for very large integers or very small fractions. This oscillation of gigantic terms with small terms is what you would need to exploit in any compression scheme.

Continued logarithms, on the other hand, split up large terms in a sort of 'recursive scientific notation,' as Bill Gosper puts it in that paper. Each term coroutine emits or consumes only very small messages, which form a natural run-length-encoded serialization describing the 'log base 2 of the number remaining to be described.'

An unfortunate side effect of using continued logarithms is that Hurwitz numbers become patternless, whereas in continued fractions they are very regular. However, most (though not all) quadratic surds remain periodic, which can itself be thought of as a compression scheme.

Once in this compact run-length-encoded format, you can apply traditional compression techniques for runs of small numbers, such as the LZW that Mark Ransom suggested in the question's comments.
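
For concreteness, here is a rough sketch of extracting continued-logarithm terms from a rational x >= 1 under the formulation x = 2^n0 * (1 + 1/(2^n1 * (1 + 1/(2^n2 * (1 + ...))))); conventions vary, and Gosper's own formulation works as a stream of incremental messages rather than whole exponents:

    from fractions import Fraction

    def continued_log(x, max_terms=64):
        # Each emitted term is floor(log2) of the value remaining to be
        # described, so huge continued fraction terms become small exponents.
        x = Fraction(x)
        assert x >= 1
        terms = []
        while len(terms) < max_terms:
            n = 0
            while x >= 2:
                x /= 2
                n += 1
            terms.append(n)
            if x == 1:              # remainder was an exact power of two
                break
            x = 1 / (x - 1)         # reciprocate the fractional part and repeat
        return terms

    print(continued_log(Fraction(10)))   # [3, 2]: 10 = 2**3 * (1 + 1/(2**2))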

0

Another possibility is to use some sort of universal encoding scheme. As randomly chosen continued fractions rarely have really big partial quotients, universal encoding should be useful.
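
For example, Elias delta coding is one standard universal code that fits the stated shape of the data: small terms stay a few bits long, while a 366-bit term costs about lg(n) + 2*lg(lg(n)) bits rather than exploding the way a unary-prefixed code would. A minimal sketch:

    def elias_gamma(n):
        # Elias gamma: (bit-length - 1) zeros, then n in binary.
        b = bin(n)[2:]
        return '0' * (len(b) - 1) + b

    def elias_delta(n):
        # Elias delta: gamma-code the bit length of n, then the bits of n
        # with its leading 1 stripped.
        b = bin(n)[2:]
        return elias_gamma(len(b)) + b[1:]

    # elias_delta(1) == '1', elias_delta(2) == '0100', elias_delta(8) == '00100000';
    # a 366-bit term needs only 382 bits in total.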

ttw
  • 111
  • 3