Splitting a byte list into a list of dicts

Question

I have some byte data (say for an image):

00 19 01 21 09 0f 01 15 .. FF

I parse it and store it as a byte list:

[b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15', ...]

These are RGBA values (little endian, 2 bytes each) that I need to parse into a list of dicts, as follows:

[{'red':0x1900, 'green':0x2101, 'blue':0x0f09, 'alpha':0x1501}, {'red':...},...]

Note: The image data terminates once we reach a 0xff. Values can be stored in hex or decimal, doesn't matter as long as it's consistent.

My attempt:

from itertools import takewhile

# our dict keys
keys = ['red', 'green', 'blue', 'alpha']

# first, grab all bytes until we hit 0xff
img = list(takewhile(lambda x: x != b'\xFF', bitstream))

# traverse img 2 bytes at a time and join them
rgba = []
for i,j in zip(img[0::2],img[1::2]):
    rgba.append(b''.join([j,i])) # j first since byteorder is 'little'

So far this produces [0x1900, 0x2101, 0x0f09, ...]

Now I'm stuck on how to create the list of dicts "pythonically". I can simply use a for loop and pop 4 items from the list at a time but that's not really using Python's features to their potential. Any advice?

Note: this is just an example, my keys can be anything (not related to images). Also overlook any issues with len(img) % len(keys) != 0.

chepner · Accepted Answer · 2015-05-11T17:30:26.253

First, use StringIO to create a file-like object from the bitstream to facilitate grabbing 8-byte chunks one at a time. Then, use struct.unpack to convert each 8-byte chunk into a tuple of 4 integers, which we zip with the tuple of keys to create a list that can be passed directly to dict. All this is wrapped in a list comprehension to create rgba in one pass.

(I also use functools.partial and itertools.imap to improve readability.)

import StringIO
import re
import struct
from itertools import imap
from functools import partial

keys = ("red", "green", "blue", "alpha")
# Create an object we can read from, keeping only the data before the
# first 0xff; (?s) lets "." match newline bytes as well
str_iter = StringIO.StringIO(re.sub("(?s)\xff.*", "", bitstream))
# A callable which reads 8 bytes at a time from str_iter
read_8_bytes = partial(str_iter.read, 8)
# Convert an 8-byte string into a tuple of 4 integer values
unpack_rgba = partial(struct.unpack, "<HHHH")
# An iterable of 8-byte strings
chunk_iter = iter(read_8_bytes, '')
# Map unpack_rgba over the iterator to get an iterator of 4-tuples,
# then zip each 4-tuple with the key tuple to create the desired dict
rgba = [dict(zip(keys, rgba_values))
         for rgba_values in imap(unpack_rgba, chunk_iter)]
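
The code above targets Python 2 (the StringIO module and itertools.imap do not exist in Python 3). A rough Python 3 sketch of the same idea, assuming bitstream is a single bytes object; the sample data is made up for illustration:

```python
import io
import struct
from functools import partial

keys = ("red", "green", "blue", "alpha")

# Assumed sample data: two 8-byte RGBA records, then the 0xff terminator
bitstream = bytes([0x00, 0x19, 0x01, 0x21, 0x09, 0x0f, 0x01, 0x15,
                   0x01, 0x1a, 0x02, 0x22, 0x0a, 0x10, 0x02, 0x16,
                   0xff])

# Keep only the bytes before the first 0xff
payload = bitstream.split(b"\xff", 1)[0]

# File-like object we can read fixed-size chunks from
stream = io.BytesIO(payload)

# iter(callable, sentinel) keeps calling stream.read(8) until it returns b""
chunk_iter = iter(partial(stream.read, 8), b"")

# "<HHHH" means four little-endian unsigned 16-bit integers
rgba = [dict(zip(keys, struct.unpack("<HHHH", chunk)))
        for chunk in chunk_iter]

for entry in rgba:
    print(entry)
```

If the payload length is not a multiple of 8, struct.unpack raises struct.error on the last chunk; the question asks to overlook that case, so the sketch does not handle it.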

(If you are getting the binary data with something like

with open('somefile', 'rb') as fh:
    bitstream = fh.read()

then you can read from fh directly in place of str_iter, so that you only read bytes from the file as you need them, rather than all at once.)
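
A possible Python 3 sketch of that lazy variant. The helper name read_records is hypothetical, io.BytesIO stands in for a file opened in 'rb' mode, and the sketch assumes 0xff never occurs inside valid data:

```python
import io
import struct
from functools import partial

keys = ("red", "green", "blue", "alpha")
record = struct.Struct("<HHHH")  # four little-endian unsigned 16-bit ints


def read_records(fh):
    """Yield one dict per 8-byte record, stopping at EOF or at a 0xff byte."""
    # Two-argument iter: call fh.read(8) repeatedly until it returns b""
    for chunk in iter(partial(fh.read, record.size), b""):
        # Assumption: any 0xff marks the end of the data, and a short
        # chunk is an incomplete record that we simply drop.
        if b"\xff" in chunk or len(chunk) < record.size:
            break
        yield dict(zip(keys, record.unpack(chunk)))


# Stand-in for: with open('somefile', 'rb') as fh: ...
fh = io.BytesIO(bytes.fromhex("00 19 01 21 09 0f 01 15 ff"))
records = list(read_records(fh))
print(records)
```

Because read_records is a generator, only 8 bytes are pulled from the file per record, and nothing past the terminator is read beyond the chunk that contains it.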

Just a tangential question: I'm going to be doing a lot of byte wrangling with Python and I want to be as equipped as I can, any resource suggestions or libraries I should be looking at? — Helen Che, May 11 '15 at 17:20
There might be libraries that implement the scheme outlined above, but you can go pretty far just by using what's in the standard library. The key here is to use the two-argument form of `iter` to iterate over your file in fixed-size chunks suitable for unpacking with `struct`, then use the standard `itertools` techniques (or explicit for loops, if you don't like functional programming :)) to process the resulting stream. — chepner, May 11 '15 at 17:33

score 1 · Answer 2 · edited May 23 '17 at 12:14

Maybe instead of

rgba = []
for i,j in zip(img[0::2],img[1::2]):
  rgba.append(b''.join([j,i])) # j first since byteorder is 'little'

You can simplify it to

rgba = [b''.join([j,i]) for i,j in zip(img[0::2], img[1::2])]

Now you need to chunkify your list, so you can maybe borrow a recipe from this link, then get:

dict_list = [dict(zip(keys, chunk)) for chunk in chunks(rgba, 4)]

e.g.

>>> keys = ['red', 'green', 'blue', 'alpha']
>>> test  = [b'\x0019', b'\x2101', b'\x0f09', b'\x1501']
>>> dict(zip(keys, test))
{'blue': '\x0f09', 'alpha': '\x1501', 'green': '!01', 'red': '\x0019'}
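
The chunks helper comes from the linked recipe, which is not reproduced here. A minimal sketch, assuming it simply slices a list into consecutive pieces of n items (the sample rgba values are made up to match the question's byte layout):

```python
def chunks(seq, n):
    """Yield successive n-item slices of seq (hypothetical stand-in for the recipe)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


keys = ['red', 'green', 'blue', 'alpha']

# Two-byte values as produced by joining each pair high byte first
rgba = [b'\x19\x00', b'\x21\x01', b'\x0f\x09', b'\x15\x01',
        b'\x1a\x01', b'\x22\x02', b'\x10\x0a', b'\x16\x02']

dict_list = [dict(zip(keys, chunk)) for chunk in chunks(rgba, 4)]

for entry in dict_list:
    print(entry)
```

If len(rgba) is not a multiple of 4, the last slice is shorter and zip silently produces a dict with fewer keys.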

martineau · Answer 3 · 2015-05-11T18:07:39.507

Without getting too fancy, you could do it very efficiently like this:

try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)

img  = [b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15',
        b'\x01', b'\x1a', b'\x02', b'\x22', b'\x0a', b'\x10', b'\x02', b'\x16',
        b'\xff']

keys = ['red', 'green', 'blue', 'alpha']
list_of_dicts = [dict(izip(keys, group))
                    for group in grouper(4, (j+i for i,j in grouper(2, img)))]

for value in list_of_dicts:
    print(value)
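
The dict values produced this way are two-byte strings rather than numbers, and the trailing b'\xff' is dropped only because it cannot form a complete pair. If integers and an explicit stop at 0xff are wanted, one possible Python 3 variation (a sketch, not part of the answer above) combines takewhile with int.from_bytes:

```python
from itertools import takewhile


def grouper(n, iterable):
    """Collect items into n-tuples, dropping any incomplete trailing group."""
    return zip(*[iter(iterable)] * n)


img = [b'\x00', b'\x19', b'\x01', b'\x21', b'\x09', b'\x0f', b'\x01', b'\x15',
       b'\x01', b'\x1a', b'\x02', b'\x22', b'\x0a', b'\x10', b'\x02', b'\x16',
       b'\xff']

keys = ['red', 'green', 'blue', 'alpha']

# Stop explicitly at the first 0xff byte
data = takewhile(lambda b: b != b'\xff', img)

# j + i puts the high byte first, so the pair reads as a big-endian number
values = (int.from_bytes(j + i, 'big') for i, j in grouper(2, data))

list_of_dicts = [dict(zip(keys, group)) for group in grouper(4, values)]

for value in list_of_dicts:
    print({k: hex(v) for k, v in value.items()})
```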
