Here is an implementation of well-known Rice coding (= Golomb code with M = 2^k
http://en.wikipedia.org/wiki/Golomb_coding), widely used in compression algorithms, in Python.
Unfortunately it is rather slow. What could be the cause of this low speed ? (StringIO
? the fact that data is written byte after byte?)
What would you recommand to use in order to speed it the encoding ? What trick would you use to speed it up with Cython ?
import struct
import StringIO
def put_bit(f, b):
global buff, filled
buff = buff | (b << (7-filled))
if (filled == 7):
f.write(struct.pack('B',buff))
buff = 0
filled = 0
else:
filled += 1
def rice_code(f, x, k):
q = x / (1 << k)
for i in range(q):
put_bit(f, 1)
put_bit(f, 0)
for i in range(k-1, -1, -1):
put_bit(f, (x >> i) & 1)
def compress(L, k):
f = StringIO.StringIO()
global buff, filled
buff = 0
filled = 0
for x in L: # encode all numbers
rice_code(f, x, k)
for i in range(8-filled): # write the last byte (if necessary pad with 1111...)
put_bit(f, 1)
return f.getvalue()
if __name__ == '__main__':
print struct.pack('BBB', 0b00010010, 0b00111001, 0b01111111) #see http://fr.wikipedia.org/wiki/Codage_de_Rice#Exemples
print compress([1,2,3,10],k = 3)
PS : Should this question be moved to https://codereview.stackexchange.com/ ?