9

I have a binary file containing a stream of 10-bit integers. I want to read it and store the values in a list.

It is working with the following code, which reads my_file and fills pixels with integer values:

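file = open("my_file", "rb")

pixels = []
new10bitsByte = ""

try:
    byte = file.read(1)
    while byte:
        # Bits of the current byte as an 8-character string, e.g. '00010111'
        bits = bin(ord(byte))[2:].rjust(8, '0')
        # Consume the bits least significant first
        for bit in reversed(bits):
            new10bitsByte += bit
            if len(new10bitsByte) == 10:
                # Reverse again so the first bit read is the least significant
                pixels.append(int(new10bitsByte[::-1], 2))
                new10bitsByte = ""
        # Read the next byte inside the loop, otherwise it never terminates
        byte = file.read(1)

finally:
    file.close()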

It doesn't seem very elegant to read the bytes into bits, and read it back into "10-bit" bytes. Is there a better way to do it?

With 8 or 16 bit integers I could just use file.read(size) and convert the result to an int directly. But here, as each value is stored in 1.25 bytes, I would need something like file.read(1.25)...

jbgt
  • 2
    Check out the first two answers here: http://stackoverflow.com/questions/10689748/how-i-can-read-a-bit-in-python – juanpa.arrivillaga Jul 11 '16 at 09:09
  • @juanpa.arrivillaga Thank you! So from what I understand there is no way to read a file 10 bit by 10 bit in Python, I have to read it byte by byte and then "cut" the bytes to get my "10-bit" bytes. – jbgt Jul 11 '16 at 09:27
  • From what I understand, yes, but I am not certain. I just found that answer and it looked like it might be useful. – juanpa.arrivillaga Jul 11 '16 at 09:31
  • @juanpa.arrivillaga Ok, thank you for your help! – jbgt Jul 11 '16 at 09:37
  • 2
    You may want to read 40 bits at a time, i.e. 5 bytes. Those contain 4 full 10 bit numbers, which you should be able to extract in one go. – MisterMiyagi Jul 11 '16 at 09:51
  • 1
    What MisterMiyagi said. It looks like you're using Python 2. Is that correct? Unless the input file is really huge, it's probably a little more efficient to read it all into memory, rather than reading it byte by byte. FWIW, `bits = format(ord(byte), '08b')` is a little more efficient than using the `bin` function. But really, it's better to use MisterMiyagi's suggestion instead of this roundabout conversion algorithm. – PM 2Ring Jul 11 '16 at 10:05
  • @PM2Ring No, I'm using Python 3. The file is around 300 MB so it shouldn't be an issue to read it all into memory. I'll try MisterMiyagi's solution! – jbgt Jul 11 '16 at 10:16
  • 300 megabytes is rather large, so it might be better to not read the whole thing at once, since Python data structures can chew up a fair bit of RAM. – PM 2Ring Jul 11 '16 at 11:49
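One way to reconcile the two suggestions above (read 5 bytes at a time, but avoid holding everything in bulky Python objects) is to read the file in blocks that are a multiple of 5 bytes and store the results in a compact array. This is only a rough sketch: the function name read_tenbit_blocks, the block size, and the little-endian packing are assumptions for illustration, not something stated in the question.

```python
from array import array
from io import BytesIO

def read_tenbit_blocks(f, groups_per_block=4096):
    """Collect 10-bit values from f into an array of unsigned shorts.

    Assumption: values are packed little-endian, 4 values per 5 bytes.
    """
    values = array('H')                  # 'H' = unsigned short, at least 16 bits
    block_size = 5 * groups_per_block    # always a multiple of 5 bytes
    while True:
        block = f.read(block_size)
        if not block:
            break
        # Ignore a trailing partial group (fewer than 5 bytes at end of file)
        usable = len(block) - len(block) % 5
        for pos in range(0, usable, 5):
            n = int.from_bytes(block[pos:pos + 5], 'little')
            values.extend((n & 0x3ff, (n >> 10) & 0x3ff,
                           (n >> 20) & 0x3ff, n >> 30))
    return values

# Tiny demo: the bytes 01 00 00 00 00 hold the values 1, 0, 0, 0
print(list(read_tenbit_blocks(BytesIO(b'\x01\x00\x00\x00\x00'))))
```

An array of unsigned shorts needs only a couple of bytes per value, whereas a list of Python ints needs considerably more, so this keeps the memory footprint closer to the size of the file itself.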

3 Answers

3

Here's a generator that does the bit operations without using text string conversions. Hopefully, it's a little more efficient. :)

To test it, I write all the numbers in range(1024) to a BytesIO stream, which behaves like a binary file.

from io import BytesIO

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if len(b) == 0:
            break
        n = int.from_bytes(b, 'big')

        #Split n into 4 10 bit integers
        t = []
        for i in range(4):
            t.append(n & 0x3ff)
            n >>= 10
        yield from reversed(t)

# Make some test data: all the integers in range(1024),
# and save it to a byte stream
buff = BytesIO()

maxi = 1024
n = 0
for i in range(maxi):
    n = (n << 10) | i
    #Convert the 40 bit integer to 5 bytes & write them
    if i % 4 == 3:
        buff.write(n.to_bytes(5, 'big'))
        n = 0

# Rewind the stream so we can read from it
buff.seek(0)

# Read the data in 10 bit chunks
a = list(tenbitread(buff))

# Check it 
print(a == list(range(maxi)))    

output

True

Doing list(tenbitread(buff)) is the simplest way to turn the generator output into a list, but you can just as easily iterate over the values instead, e.g.

for v in tenbitread(buff):
    print(v)

or

for i, v in enumerate(tenbitread(buff)):
    print(i, v)

if you want indices as well as the data values.


Here's a little-endian version of the generator which gives the same results as your code.

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if not len(b):
            break
        n = int.from_bytes(b, 'little')

        #Split n into 4 10 bit integers
        for i in range(4):
            yield n & 0x3ff
            n >>= 10

We can improve this version slightly by "un-rolling" that for loop, which lets us get rid of the final masking and shifting operations: after three 10-bit shifts, only the top 10 bits of the 40-bit value remain in n.

def tenbitread(f):
    ''' Generate 10 bit (unsigned) integers from a binary file '''
    while True:
        b = f.read(5)
        if not len(b):
            break
        n = int.from_bytes(b, 'little')

        #Split n into 4 10 bit integers
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n & 0x3ff
        n >>= 10
        yield n 

This should give a little more speed...
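As a sanity check on the claim that the little-endian generator gives the same results as the question's code, here is a small sketch that runs both on the same bytes. The helper name bitstring_read and the sample data are made up for the test; bitstring_read is just the question's loop rewritten to work on a bytes object.

```python
from io import BytesIO

def bitstring_read(data):
    """The question's approach: take each byte's bits LSB first, cut into 10-bit groups."""
    pixels = []
    group = ""
    for byte in data:
        for bit in reversed(format(byte, '08b')):
            group += bit
            if len(group) == 10:
                # Reverse so the first bit read becomes the least significant
                pixels.append(int(group[::-1], 2))
                group = ""
    return pixels

def tenbitread(f):
    """Little-endian generator, same logic as the version above."""
    while True:
        b = f.read(5)
        if not b:
            break
        n = int.from_bytes(b, 'little')
        for _ in range(4):
            yield n & 0x3ff
            n >>= 10

sample = bytes(range(256)) * 5   # 1280 bytes, a multiple of 5
print(bitstring_read(sample) == list(tenbitread(BytesIO(sample))))
```

Both functions treat the file as one long stream of bits with the least significant bit of each byte first, which is exactly what interpreting each 5-byte group as a little-endian integer does.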

PM 2Ring
  • It's working perfectly with the little-endian version, and 7 times faster than my initial code :) Thank you very much! – jbgt Jul 11 '16 at 14:11
  • @Jean-BaptisteMartin: There's a slight optimization that can be made. I don't know if it will speed things up much, but it's worth trying. I'll add it to my answer shortly. – PM 2Ring Jul 11 '16 at 14:25
  • This is great! But how would you get _signed_ ints instead of unsigned? – Taaam Jul 18 '18 at 16:50
  • 1
    @Taaam That's not too hard. We can use the `^` bitwise exclusive-or operator for that. To get 10 bit signed numbers, change the `n & 0x3ff` to `((n & 0x3ff) ^ 512) - 512`. You can drop the inner parentheses: `(n & 0x3ff ^ 512) - 512`, but I think they make it a little easier to read. – PM 2Ring Jul 18 '18 at 19:40
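To see what the XOR trick in the last comment does, here is a minimal sketch. The function name to_signed10 is made up for illustration, and it assumes the 10-bit values are stored in two's complement, which is what the formula implies.

```python
def to_signed10(u):
    """Reinterpret an unsigned 10-bit value (0..1023) as a signed one (-512..511).

    Assumption: two's complement, so bit 9 (value 512) is the sign bit.
    """
    return ((u & 0x3ff) ^ 512) - 512

print([to_signed10(u) for u in (0, 1, 511, 512, 1023)])
# [0, 1, 511, -512, -1]
```

Flipping the sign bit with ^ 512 and then subtracting 512 leaves values below 512 unchanged and maps 512..1023 onto -512..-1.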
2

Adding a NumPy-based solution suitable for unpacking large buffers of packed 10-bit values, like the ones you might receive from AVT and FLIR cameras.

This is a 10-bit version of @cyrilgaudefroy's answer to a similar question; there you can also find a Numba alternative capable of yielding an additional speed increase.

import numpy as np

def read_uint10(byte_buf):
    data = np.frombuffer(byte_buf, dtype=np.uint8)
    # 5 bytes contain 4 10-bit pixels (5x8 == 4x10)
    b1, b2, b3, b4, b5 = np.reshape(data, (data.shape[0]//5, 5)).astype(np.uint16).T
    o1 = (b1 << 2) + (b2 >> 6)
    o2 = ((b2 % 64) << 4) + (b3 >> 4)
    o3 = ((b3 % 16) << 6) + (b4 >> 2)
    o4 = ((b4 % 4) << 8) + b5

    unpacked =  np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1),  4*o1.shape[0])
    return unpacked

Reshape can be omitted if returning a buffer instead of a Numpy array:

unpacked =  np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1).tobytes()

Or if image dimensions are known it can be reshaped directly, e.g.:

unpacked =  np.reshape(np.concatenate((o1[:, None], o2[:, None], o3[:, None], o4[:, None]), axis=1), (1024, 1024))

If the use of the modulus operator appears confusing, try playing around with:

np.unpackbits(np.array([255%64], dtype=np.uint8))
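The same point can be checked without NumPy. In this short sketch (plain Python, written only for illustration) taking a byte modulo 64, 16 or 4 is shown to be the same as masking off its low 6, 4 or 2 bits:

```python
# For any byte value, x % 2**k keeps only the k lowest bits, same as x & (2**k - 1)
for divisor, mask in ((64, 0x3f), (16, 0x0f), (4, 0x03)):
    assert all((b % divisor) == (b & mask) for b in range(256))

print(format(255 % 64, '08b'))   # 00111111
```

So (b2 % 64) << 4 means "take the low 6 bits of b2 and move them up 4 places", leaving room for the high 4 bits of the next byte.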

Edit: It turns out that the Allied Vision Mako-U cameras employ a different ordering than the one I originally suggested above:

o1 = ((b2 % 4) << 8) + b1
o2 = ((b3 % 16) << 6) + (b2 >> 2)
o3 = ((b4 % 64) << 4) + (b3 >> 4)
o4 = (b5 << 2) + (b4 >> 6)

So you might have to test different orders if images come out looking wonky initially for your specific setup.
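One way to think about the two orderings: the first treats each 5-byte group as a big-endian 40-bit number with the first value in the top bits, while the second treats it as little-endian with the first value in the bottom bits. The sketch below checks this on random bytes using plain Python ints instead of NumPy arrays. The function names are made up for illustration.

```python
import random

def unpack_first_ordering(b1, b2, b3, b4, b5):
    o1 = (b1 << 2) + (b2 >> 6)
    o2 = ((b2 % 64) << 4) + (b3 >> 4)
    o3 = ((b3 % 16) << 6) + (b4 >> 2)
    o4 = ((b4 % 4) << 8) + b5
    return [o1, o2, o3, o4]

def unpack_second_ordering(b1, b2, b3, b4, b5):
    o1 = ((b2 % 4) << 8) + b1
    o2 = ((b3 % 16) << 6) + (b2 >> 2)
    o3 = ((b4 % 64) << 4) + (b3 >> 4)
    o4 = (b5 << 2) + (b4 >> 6)
    return [o1, o2, o3, o4]

def split40(n):
    """Split a 40-bit integer into four 10-bit fields, least significant first."""
    return [(n >> shift) & 0x3ff for shift in (0, 10, 20, 30)]

rng = random.Random(0)   # fixed seed so the check is repeatable
for _ in range(1000):
    group = bytes(rng.randrange(256) for _ in range(5))
    # First ordering: big-endian, most significant field first
    assert unpack_first_ordering(*group) == split40(int.from_bytes(group, 'big'))[::-1]
    # Second ordering: little-endian, least significant field first
    assert unpack_second_ordering(*group) == split40(int.from_bytes(group, 'little'))
print("both orderings match")
```

If neither ordering gives a sensible image, the packing is probably something else entirely, and the sensor or file format documentation is the place to look.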

1

As there is no direct way to read a file x bits at a time in Python, we have to read it byte by byte. Following MisterMiyagi's and PM 2Ring's suggestions, I modified my code to process the file in groups of 5 bytes (i.e. 40 bits) and then split the resulting bit string into four 10-bit numbers, instead of looping over the bits individually. It turned out to be twice as fast as my previous code.

file = open("my_file", "rb")

pixels = []
exit_loop = False

try:
    while not exit_loop:
        # Read 5 consecutive bytes into fiveBytesString
        fiveBytesString = ""
        for i in range(5):
            byte = file.read(1)
            if not byte:
                exit_loop = True
                break
            byteString = format(ord(byte), '08b')
            fiveBytesString += byteString[::-1]
        # Split fiveBytesString into 4 10-bit numbers, and add them to pixels
        pixels.extend([int(fiveBytesString[i:i+10][::-1], 2) for i in range(0, 40, 10) if len(fiveBytesString[i:i+10]) > 0])

finally:
    file.close()
jbgt
  • 1). I'm not sure why you are doing those reversals with `[::-1]`. 2). You need to check that `fiveBytesString` isn't empty before attempting to convert it to integer. 3). `exit` isn't a great variable name because it shadows the `exit()` function. It's not an error to use it as a flag like that, just a little confusing for others reading your code. :) – PM 2Ring Jul 11 '16 at 12:31
  • 1) It is because I already know what my output is supposed to be (I'm trying to do the conversion myself but I already have the output file). For example, the first 5 bytes are 01001011, 01010100, 11100001, 10000101, 00011000. I know that the first output numbers should be 20, 23, 21, 37. To find the right output I had to reverse the bytes, concatenate them, split them and reverse the result again. I don't know how the input file was created, I just guessed that I had to do these reversals to get my output... 2) and 3) Edited, thanks! – jbgt Jul 11 '16 at 13:04
  • Ah, ok. I've added a new version. It now gives the same values as your code. However, I don't see how you get `[20, 23, 21, 37]` from `[0b01001011, 0b01010100, 0b11100001, 0b10000101, 0b00011000]`. – PM 2Ring Jul 11 '16 at 13:45
  • I just realized I gave you the wrong bytes, I'm really sorry! But your updated generator is working fine with my file, and now I think I have a better understanding of how binary file manipulation works. Thank you for your help! – jbgt Jul 11 '16 at 14:14
  • My pleasure! And thanks for the accept. If you'd posted the right bytes I would have been a bit faster with my answer. :) BTW, you may like to look at [this answer](http://stackoverflow.com/a/31700898/4014959) I wrote last year that takes a slightly different approach to bit fiddling. – PM 2Ring Jul 11 '16 at 14:20
  • Interesting indeed! Maybe not useful for what I'm doing with my file right now but interesting reading though! – jbgt Jul 11 '16 at 14:27