
I have a file filled with binary data representing a sequence of 2-byte instructions in big-endian order.

I need to be able to decode these instructions into their more meaningful equivalents, but I'm having trouble getting the data into a format I can work with.

I think it would be best if I turned the instructions into actual strings of 0s and 1s.

So far, I've written this:

import struct

def slurpInstructions(filename):
    instructions = []
    with open(filename, 'rb') as f:
        while True:
            try:
                chunk = f.read(1)
                print(struct.unpack('c', chunk))
            except:
                break

which prints out the bytes one at a time, like this:

(b'\x00',)
(b'a',)

I know the first instruction in the file is:

0000000001100001

So it looks like it's printing the ASCII character for each byte's integer value, and falling back to a hex escape when the value has no printable ASCII character.

Where do I go from here though? I need to turn my b'a' into '1100001' because I actually care about the bits, not the bytes.

Luke

2 Answers

4

You could convert b'a' to its corresponding integer with ord, and then format that int in binary using '{:b}'.format:

In [6]: '{:b}'.format(ord(b'a'))
Out[6]: '1100001'
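Note that '{:b}' drops leading zeros, which matters once several bytes are joined into one instruction. A minimal sketch of padding to a fixed width (the helper name to_bits is made up for illustration, it is not part of any library):

```python
def to_bits(value, width=8):
    """Format a non-negative integer as a zero-padded binary string."""
    return format(value, '0{}b'.format(width))

print('{:b}'.format(ord(b'a')))  # 1100001 (leading zero dropped)
print(to_bits(ord(b'a')))        # 01100001
print(to_bits(0x0061, 16))       # 0000000001100001
```

With a width of 8 per byte, two consecutive bytes always add up to the 16 bits of one instruction.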

  • Reading a large file one-byte-at-a-time can be very slow. You'll get better performance by reading more bytes per call to f.read. You can iterate over the contents of the file in chunks of 1024 bytes using:

    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(1024), b''):
            ...  # process each chunk here
  • Similarly, calling print once for each byte can be very slow. You'll get better performance by printing more bytes per call. You can use a list comprehension to loop over the bytes in chunk, convert each to an 8-character binary string, and join the strings with ''.join. In Python 3, iterating over a bytes object yields ints, so no ord call is needed, and the 08 width keeps each byte's leading zeros:

    print(''.join(['{:08b}'.format(c) for c in chunk]), end='')
    
  • Using a bare except is considered bad practice. If you choose to use try..except here, list only those exceptions you wish to handle:

    try:
        ...
    except IOError:
        ...  # handle the error

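def slurpInstructions(filename):
    with open(filename, 'rb') as f:
        # 1024 is even, so a 2-byte instruction never straddles two chunks
        for chunk in iter(lambda: f.read(1024), b''):
            # iterating over bytes yields ints in Python 3; pad each to 8 bits
            print(''.join(['{:08b}'.format(c) for c in chunk]), end='')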
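If you'd rather collect the instructions than print them, the same chunked loop can be combined with struct. This is only a sketch: the function name read_instructions and the chunk_size parameter are my own, and it assumes the file holds nothing but 2-byte big-endian words.

```python
import struct

def read_instructions(filename, chunk_size=1024):
    """Return every 2-byte big-endian instruction as a 16-character bitstring."""
    if chunk_size % 2:
        raise ValueError('chunk_size must be even')
    instructions = []
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            if len(chunk) % 2:
                raise ValueError('file ends in the middle of an instruction')
            # '>H' means one big-endian unsigned 16-bit integer
            for (word,) in struct.iter_unpack('>H', chunk):
                instructions.append('{:016b}'.format(word))
    return instructions
```

Each list element is then one complete instruction, so you can index or slice the bits directly.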
unutbu
2

In Python 3, you can convert 2 bytes into a bitstring like this ('{:b}'.format() may be slightly slower):

>>> bin(int.from_bytes(b'\x00a', 'big'))[2:].zfill(16)
'0000000001100001'
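The same expression can be applied to a whole buffer by stepping through it two bytes at a time. A rough sketch, assuming the data has already been read into a bytes object (the name words_to_bitstrings is hypothetical):

```python
def words_to_bitstrings(data):
    """Split `data` into 2-byte big-endian words and render each as 16 bits."""
    if len(data) % 2:
        raise ValueError('expected an even number of bytes')
    return [
        bin(int.from_bytes(data[i:i + 2], 'big'))[2:].zfill(16)
        for i in range(0, len(data), 2)
    ]

print(words_to_bitstrings(b'\x00a\x12\x34'))
# ['0000000001100001', '0001001000110100']
```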

For a single-source Python 2/3 compatible version, see "Convert binary to ASCII and vice versa".

To load all instructions in a way that is both time- and space-efficient, you could use the array module:

#!/usr/bin/env python
import os
import sys
from array import array

filename = sys.argv[1]  # assumption: the input path is given on the command line

instructions = array('H')  # 'H' is an unsigned short: at least 2 bytes per item
n = os.path.getsize(filename) // instructions.itemsize  # number of instructions
with open(filename, 'rb') as file:
    instructions.fromfile(file, n)  # slurp file
if sys.byteorder == 'little':
    instructions.byteswap()  # the file is big-endian; swap to native order

for h in instructions:  # print as bitstrings
    print('{:016b}'.format(h))
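Once the instructions are integers, you may not need bitstrings at all: shifts and masks can pull fields out directly. The layout below (a 4-bit opcode followed by a 12-bit operand) is purely hypothetical, since the question doesn't say how the instructions are encoded; adjust the shift and mask to your real format.

```python
def split_fields(word):
    """Split a 16-bit word into a (hypothetical) 4-bit opcode and 12-bit operand."""
    opcode = (word >> 12) & 0xF   # top 4 bits
    operand = word & 0x0FFF       # low 12 bits
    return opcode, operand

opcode, operand = split_fields(0x0061)
print(opcode, '{:012b}'.format(operand))  # 0 000001100001
```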

For other ways to read a binary file efficiently, see Reading binary file in Python and looping over each byte.

jfs