
We have a 2D list (we can convert it into anything if necessary). Each row contains positive integers (deltas of the original increasing numbers). There are 2 billion numbers in total, with more than half of them equal to 1. With Elias-Gamma coding we can encode the 2D list row by row (we'll be accessing arbitrary rows by index later) using around 3 bits per number, based on a calculation from the distribution: Elias-Gamma encodes 1 as the single bit `1`, 2 as `010`, and 5 as `00101`, so with more than half the values equal to 1 the average works out to roughly 3 bits. However, our program has been running for 12 hours and it still hasn't finished the encoding. Here's what we are doing:

from math import floor, log
from typing import List

from bitstring import BitArray

def _compress_2d_list(input: List[List[int]]) -> List[BitArray]:
    res = []
    for row in input:
        # concatenate the Elias-gamma codes of the row into one BitArray
        res.append(sum(_elias_gamma_compress_number(num) for num in row))
    return res

def _elias_gamma_compress_number(x: int) -> BitArray:
    # floor(log2(x)) zeros, then the binary of x (floor(log2(x)) + 1 bits)
    n = _log_floor(x)
    return BitArray(bin="0" * n) + BitArray(uint=x, length=n + 1)

def _log_floor(num: int) -> int:
    return floor(log(num, 2))

Called by:

input_2d_list: List[List[int]]  # containing 1.5M lists, total 2B numbers
compressed_list = _compress_2d_list(input_2d_list)

How can I optimize my code to make it run faster? I mean, MUCH FASTER... I'm OK with using any reliable, popular library or data structure.

Also, how do we decompress faster with BitStream? Currently I read the prefix 0s one by one, then read the binary of the compressed number, all in a while loop. It's not very fast either...
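
For concreteness, the decoder currently looks roughly like this (a minimal sketch of the loop described above; `_elias_gamma_decompress_row` is just an illustrative name):

from typing import List

from bitstring import BitStream

def _elias_gamma_decompress_row(stream: BitStream) -> List[int]:
    res = []
    while stream.pos < len(stream):
        # count the prefix 0s one bit at a time
        n = 0
        while not stream.read('bool'):
            n += 1
        # re-read the 1 just consumed as the top bit of the value
        stream.pos -= 1
        res.append(stream.read(f'uint:{n + 1}'))
    return res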

  • I guess writing a C extension is not an option here? – David Eisenstat Jul 11 '20 at 00:05
  • Yeah, probably not... – Jack Feng Jul 11 '20 at 01:51
  • I would say C++ rather than C (with pybind11 or Boost.Python it shouldn't take much time). Or try something like Cython, but that's probably gonna take longer to coax to do what you need. Your main enemy here is the interpreter, with each statement executed involving a massive overhead... over the 2 billion iterations that overhead adds up, as you can clearly see. – Dan Mašek Jul 11 '20 at 08:45

2 Answers


Some simple optimizations result in a factor of three improvement:

def _compress_2d_list(input):
    res = []
    for row in input:
        # one Elias-gamma code word per number, joined into a single BitArray per row
        res.append(BitArray().join(BitArray(uint=x, length=2*x.bit_length()-1) for x in row))
    return res
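
The single-`BitArray` trick works because writing `x` as an unsigned int in a field of `2*x.bit_length() - 1` bits pads it with exactly `x.bit_length() - 1` leading zeros, which is precisely its Elias-gamma code. A quick sanity check (expected output in the comments):

from bitstring import BitArray

for x in (1, 2, 5, 13):
    print(x, BitArray(uint=x, length=2 * x.bit_length() - 1).bin)
# 1 1
# 2 010
# 5 00101
# 13 0001101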

However, I think you'll need something better than that. On my machine, this would finish in about 12 hours on 1.5 million lists with 1400 deltas each.

In C, it takes about a minute to encode and about 15 seconds to decode.

  • I don't know about `sum` for this particular object type. But `list.append` is amortized `O(1)`, see for example [here](https://stackoverflow.com/q/33044883/7207392). – Paul Panzer Jul 11 '20 at 21:57
  • @PaulPanzer You're correct. I did some testing and found that both `append` and `sum` on `BitArray`s guard against the bad _n^2_ behavior. – Mark Adler Jul 12 '20 at 00:17
  • You may be able to get another easy speedup by using `join` instead of `sum`, as recommended in the [`bitstring` docs](https://bitstring.readthedocs.io/en/latest/slicing.html#joining). – Paul Panzer Jul 12 '20 at 00:43
  • Thanks! Now a factor of three improvement. – Mark Adler Jul 12 '20 at 00:55

If you are OK with numpy "bitfields", you can get the compression done in a matter of minutes. Decoding is slower by a factor of three, but still a matter of minutes.

Sample run:

# create example (1'000'000 numbers) 
a = make_example()
a
# array([2, 1, 1, ..., 3, 4, 3])

b,n = encode(a) # takes ~100 ms on my machine
c = decode(b,n) #       ~300 ms

# check round trip
(a==c).all()
# True

Code:

import numpy as np

def make_example():
    # deltas of a sorted sample of 1'000'001 distinct integers
    a = np.random.choice(2000000,replace=False,size=1000001)
    a.sort()
    return np.diff(a)

def encode(a):
    a = a.view(f'u{a.itemsize}')           # reinterpret as unsigned
    l = np.log2(a).astype('u1')            # floor(log2): bit length minus one
    L = ((l<<1)+1).cumsum()                # end offset of each code word (length 2l+1)
    out = np.zeros(L[-1],'u1')             # one byte per output bit
    for i in range(l.max()+1):             # scatter bit i of all values at once
        out[L-i-1] += (a>>i)&1
    return np.packbits(out),out.size

def decode(b,n):
    b = np.unpackbits(b,count=n).view(bool)
    s = b.nonzero()[0]                     # positions of all 1 bits
    # Jump table: if a code word starts at offset p, its leading 1 is the
    # first 1 bit at or after p, say at q, and the next code word starts
    # at 2*q - p + 1. Build s so that s[p] gives exactly that.
    s = (s<<1).repeat(np.diff(s,prepend=-1))
    s -= np.arange(-1,len(s)-1)
    s = s.tolist() # list has faster __getitem__
    ns = len(s)
    def gen():
        # follow the jump table to enumerate all code word offsets
        idx = 0
        yield idx
        while idx < ns:
            idx = s[idx]
            yield idx
    offs = np.fromiter(gen(),int)
    sz = np.diff(offs)>>1                  # bit length minus one of each value
    mx = sz.max()+1
    out = np.zeros(offs.size-1,int)
    for i in range(mx):                    # gather bit i of all values at once
        out[b[offs[1:]-i-1] & (sz>=i)] += 1<<i
    return out
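
Since the question needs arbitrary row access, a natural way to use this is to encode each row separately and keep one `(packed bytes, bit count)` pair per row. A minimal sketch using the `encode`/`decode` above (`get_row` is just an illustrative helper; `input_2d_list` is the list from the question):

# one (packed bytes, bit count) pair per row, so any row
# can be decoded independently by its index
compressed = [encode(np.asarray(row)) for row in input_2d_list]

def get_row(i):
    b, n = compressed[i]
    return decode(b, n)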