We have a 2D list (we can convert it to another structure if necessary). Each row contains positive integers (deltas of the original increasing sequences). There are 2 billion numbers in total, more than half of which equal 1. With Elias-Gamma coding we can encode the 2D list row by row (we'll be accessing arbitrary rows by index later) at around 3 bits per number, based on the value distribution. However, our program has been running for 12 hours and still hasn't finished encoding. Here's what we are doing:
```python
from math import floor, log
from typing import List

from bitstring import BitArray

def _compress_2d_list(input: List[List[int]]) -> List[BitArray]:
    res = []
    for row in input:
        # start from an empty BitArray so sum() concatenates bits
        res.append(sum((_elias_gamma_compress_number(num) for num in row), BitArray()))
    return res

def _elias_gamma_compress_number(x: int) -> BitArray:
    n = _log_floor(x)
    return BitArray(bin="0" * n) + BitArray(uint=x, length=n + 1)

def _log_floor(num: int) -> int:
    return floor(log(num, 2))
```
Called by:

```python
input_2d_list: List[List[int]]  # 1.5M lists, 2B numbers in total
compressed_list = _compress_2d_list(input_2d_list)
```
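For reference, the codewords this scheme produces look like this (a minimal pure-Python sketch using bit strings rather than the actual `BitArray` code):

```python
from math import floor, log2

def gamma_code(x: int) -> str:
    # Elias-Gamma: floor(log2(x)) leading zeros, then x in binary
    # (floor(log2(x)) + 1 bits)
    n = floor(log2(x))
    return "0" * n + format(x, "b")

print(gamma_code(1))  # "1"
print(gamma_code(2))  # "010"
print(gamma_code(5))  # "00101"
```

Since more than half of our values are 1 (a 1-bit codeword) and the rest are mostly small, the ~3 bits per number average seems consistent with our distribution.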
How can I optimize my code to make it run faster? I mean, MUCH faster... I'm OK with using any reliable, popular library or data structure.
Also, how do we decompress faster with BitStream? Currently I read the prefix 0s one by one, then read the binary of the compressed number, in a while loop. It's not very fast either...
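To be concrete, the decode loop is equivalent to this (sketched over a plain bit string rather than a BitStream):

```python
def gamma_decode_all(bits: str) -> list[int]:
    out, i = [], 0
    while i < len(bits):
        n = 0
        while bits[i] == "0":  # count the prefix zeros one at a time
            n += 1
            i += 1
        # the codeword body is n + 1 bits, starting at the first "1"
        out.append(int(bits[i : i + n + 1], 2))
        i += n + 1
    return out

print(gamma_decode_all("1" "010" "00101"))  # [1, 2, 5]
```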