
I want to write a hash function that returns a hash built from three integers a, b and c. I want to be able to choose the number of bits used to encode each integer and then concatenate the results. For instance:

a=60  (8 bits) -> 00111100
b=113 (8 bits) -> 01110001
c=5   (4 bits) -> 0101

should give

00111100011100010101

i.e. 20 bits.
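The bit strings in this example can be reproduced with Python's built-in format() and a zero-padded binary spec such as "08b". Joining the fields gives the 20-bit result. This is a small sketch for checking the arithmetic only. Note that format() does not truncate a value that is wider than its field, so it only matches the intended layout when each value fits its width.

```python
# (value, bit width) pairs taken from the example above
fields = [(60, 8), (113, 8), (5, 4)]

# Zero-pad each value to its own width, then concatenate the strings.
bit_string = "".join(format(value, "0{}b".format(width)) for value, width in fields)

print(bit_string)          # 00111100011100010101
print(len(bit_string))     # 20
print(int(bit_string, 2))  # 247573, the same bits read as one integer
```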

Given a, b and c as integers (60, 113 and 5) and the number of bits allowed for each (8, 8 and 4), how can I compute the hash, store it in a Python object of the total size (20 bits), and write it to and load it from a file?

martineau
jul
  • Anything you store to a file must be a multiple of 8 bits. If it isn't, you need to wait until you collect more bits, or pad it with some dummy data. – Mark Ransom Apr 24 '15 at 15:05
  • You can use [this answer](http://stackoverflow.com/a/10691412/355230) to a related question to read and write bits to a file. – martineau Apr 24 '15 at 15:29
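For a single hash, the padding described in the first comment can be done with the Python 3 standard library alone (int.to_bytes and int.from_bytes), without a bit-level writer. This is a sketch assuming one hash per file and big-endian byte order; the file name test.bits and the temporary directory are only illustrations. Unlike a bit writer that pads at the end, to_bytes puts the 4 padding bits at the front of the first byte.

```python
import os
import tempfile

HASH_BITS = 20
NBYTES = (HASH_BITS + 7) // 8     # round up to whole bytes: 3

value = 0b00111100011100010101    # the 20-bit hash from the question

# to_bytes pads on the left: the 4 high bits of the first byte are zero.
data = value.to_bytes(NBYTES, "big")
print(data.hex())                 # 03c715

path = os.path.join(tempfile.mkdtemp(), "test.bits")
with open(path, "wb") as f:
    f.write(data)

with open(path, "rb") as f:
    loaded = int.from_bytes(f.read(NBYTES), "big")

print(format(loaded, "0{}b".format(HASH_BITS)))  # 00111100011100010101
```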

2 Answers


Here's a class that writes values of arbitrary bit width to a file-like object opened in binary mode. Call flush() when done; it pads the last partial byte with zero bits.

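class bitwriter():
    """Packs values of arbitrary bit width, most significant bit first."""
    def __init__(self, f):
        self.f = f          # file-like object opened in binary mode ('wb')
        self.bits = 0       # pending bits that don't yet fill a whole byte
        self.count = 0      # number of pending bits held in self.bits
    def write(self, value, bitcount):
        # Keep only the low `bitcount` bits of value and append them.
        mask = (1 << bitcount) - 1
        self.bits = (self.bits << bitcount) | (value & mask)
        self.count += bitcount
        # Emit every complete byte, most significant bits first.
        while self.count >= 8:
            byte = self.bits >> (self.count - 8)
            self.f.write(bytes([byte]))   # Python 3: a 1-byte bytes object
            self.count -= 8
            self.bits &= (1 << self.count) - 1
    def flush(self):
        # Pad any leftover bits with zeros on the right to form a final byte.
        if self.count != 0:
            byte = self.bits << (8 - self.count)
            self.f.write(bytes([byte]))
        self.bits = self.count = 0
        self.f.flush()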
Mark Ransom
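The answer provides only a writer. To load the bits back, a matching reader is needed. The class below, bitreader, is a hypothetical counterpart written for illustration and is not part of the answer. It assumes Python 3 and the same most-significant-bit-first layout that bitwriter produces, so the writer's zero padding at the end is simply never read.

```python
import io

class bitreader:
    """Hypothetical counterpart to bitwriter: reads MSB-first bit fields."""
    def __init__(self, f):
        self.f = f        # file-like object opened in binary mode ('rb')
        self.bits = 0     # buffered bits not yet returned
        self.count = 0    # number of valid bits in self.bits

    def read(self, bitcount):
        # Pull in whole bytes until enough bits are buffered.
        while self.count < bitcount:
            chunk = self.f.read(1)
            if not chunk:
                raise EOFError("not enough bits left in file")
            self.bits = (self.bits << 8) | chunk[0]
            self.count += 8
        # Return the top `bitcount` buffered bits and keep the rest.
        self.count -= bitcount
        value = self.bits >> self.count
        self.bits &= (1 << self.count) - 1
        return value

# 0x3C 0x71 0x50 is what bitwriter emits for (60, 8), (113, 8), (5, 4) plus flush().
reader = bitreader(io.BytesIO(b"\x3c\x71\x50"))
a, b, c = reader.read(8), reader.read(8), reader.read(4)
print(a, b, c)  # 60 113 5
```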

I think this does what you want. It uses the referenced bitio module from another answer of mine to write/read the bits to/from a file.

Operating systems generally require files to be a multiple of 8 bits in size, so this would end up creating a 24-bit (3-byte) file to store a single 20-bit value. That 16.7% overhead (4 padding bits out of 24) would largely disappear, of course, if you wrote several values one immediately after another and didn't call flush() until after the last one, because padding is then added only once, at the end.

import bitio  # see https://stackoverflow.com/a/10691412/355230

# hash function configuration
BW = 8, 8, 4  # bit widths of each integer
HW = sum(BW)  # total bit width of hash

def myhash(a, b, c):
    return (((((a & (2**BW[0]-1)) << BW[1]) |
                b & (2**BW[1]-1)) << BW[2]) |
                c & (2**BW[2]-1))

hashed = myhash(60, 113, 5)
print('{:0{}b}'.format(hashed, HW))  # --> 00111100011100010101

with open('test.bits', 'wb') as outf:
    bw = bitio.BitWriter(outf)
    bw.writebits(hashed, HW)
    bw.flush()

with open('test.bits', 'rb') as inf:
    br = bitio.BitReader(inf)
    val = br.readbits(HW)

print('{:0{}b}'.format(val, HW))  # --> 00111100011100010101
martineau
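Reading the file gives back the packed integer. To recover a, b and c, the shifts and masks in myhash can be undone. unpack_fields below is a hypothetical helper, not part of bitio or the answer. It works for any sequence of field widths listed most significant first, as in BW. The round trip only holds for values that fit their widths, since myhash masks off any extra high bits.

```python
def unpack_fields(h, widths):
    """Split packed integer h into fields; widths are listed most significant first."""
    values = []
    for width in reversed(widths):          # peel off the least significant field first
        values.append(h & ((1 << width) - 1))
        h >>= width
    return tuple(reversed(values))          # restore the original field order

BW = 8, 8, 4  # same bit widths as in the answer
print(unpack_fields(0b00111100011100010101, BW))  # (60, 113, 5)
```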
  • Thanks, that's useful. The file written is 3 bytes, the smallest number of bytes that can contain my 20 bits, which is what I need. Now myhash and readbits in your example return an int (24 bytes), which is huge for a 20-bit object. Is it possible to allocate only 3 bytes for this object, so that I can save a lot of memory? I asked another question here: http://stackoverflow.com/q/29894071/326849. – jul Apr 27 '15 at 11:12
  • It's possible to only allocate 20 bits (2½ bytes) for these values. `;-)` I've posted a way to do it as an [answer](http://stackoverflow.com/a/29907689/355230) to your other question. – martineau Apr 29 '15 at 17:04
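For the memory concern raised in these comments, one simple option, separate from the 2½-byte approach linked above, is to keep many hashes in a single bytearray at 3 bytes each instead of as separate int objects. PackedHashes is a hypothetical sketch assuming Python 3 and hash values below 2**24. An int object is created only when an element is read back.

```python
class PackedHashes:
    """Hypothetical container that stores each hash in 3 bytes of one bytearray."""
    NBYTES = 3  # smallest whole number of bytes that holds 20 bits

    def __init__(self):
        self.buf = bytearray()

    def append(self, h):
        # Raises OverflowError if h is negative or needs more than 24 bits.
        self.buf += h.to_bytes(self.NBYTES, "big")

    def __len__(self):
        return len(self.buf) // self.NBYTES

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)   # also lets `for h in hashes` stop cleanly
        start = i * self.NBYTES
        return int.from_bytes(self.buf[start:start + self.NBYTES], "big")

hashes = PackedHashes()
for h in (0b00111100011100010101, 0, 2**20 - 1):
    hashes.append(h)

print(len(hashes), len(hashes.buf))  # 3 9
print(hashes[0])                     # 247573
```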