How can i convert binary data from a file to readable base two binary in python?

Question

In a class i am in, we are assigned to a simple mips simulator. The instructions that my program is supposed to process are given in a binary file. I have no idea how to get anything usable out of that file. Here is my code:

import struct
import argparse

'''open a parser to get command line arguments '''
parser = argparse.ArgumentParser(description='Mips instruction simulator')

'''add two required arguments for the input file and the output file'''
parser.add_argument('-i', action="store", dest='infile_name', help="-i INPUT_FILE", required=True)
parser.add_argument('-o', action="store", dest='outfile_name', help="-o OUTPUT_FILE_NAME", required=True)

'''get the passed arguments'''
args = parser.parse_args()


class Disassembler:
    '''Disassembler for mips code'''
    instruction_buffer = None
    instructions_read = 0

    def __init__(self, filename):
        bin_file = None
        try:
            bin_file = open(filename, 'rb')
        except:
            print("Input file: ", filename, " could not be opened. Check the name, permissions, or path")
            quit()

        while True:
            read_bytes = bin_file.read(4)
            if (read_bytes == b''):
                break
            int_var = struct.unpack('>I', read_bytes)
            print(int_var)

        bin_file.close()


disembler = Disassembler(args.infile_name)

So, at first i just printed the 4 bytes i read to see what was returned. I was hoping to see plain bits(just 1's and 0's). What i got was byte strings from what I've read. So i tried googling to find out what i can do about this. So i found i could use struct to convert these byte strings to integers. That outputs them in a format like (4294967295,).

This is still annoying, because i have to trim that to make it a usable integer then even still i have to convert it to bits(base 2). It's nice that i can read the bytes with struct as either signed or unsigned, because half of the input file's input are signed 32 bit numbers.

All of this seems way more complicated than it should be to just get the bits out of a binary file. Is there an easier way to do this? Also can you explain it like you would to someone who is not incredibly familiar with esoteric python code and is new to binary data?

My overall goal is to get straight 32 bits out of each 4 bytes i've read. The beginning of the file is a list of mips opcodes. So i need to be able to see specific parts of these numbers, like the first 5 bits, then the next 6, or so on. The end of the file contains 32 bit signed integer values. The two halves of the files are separated by a break opcode.

Thank you for any help you can give me. It's driving me crazy that i can't find any straight forward answers through searching. If you want to see the binary file, tell me where and i'll post it.

Do you just want to [visualise](https://stackoverflow.com/questions/18111488/convert-integer-to-binary-in-python-and-compare-the-bits) the bits of a value, or do you want them in [integer](https://stackoverflow.com/questions/30971079/how-do-i-convert-an-integer-to-a-list-of-bits-in-python) form for further manipulation? — Reti43, Apr 02 '16 at 09:28
I need to read them as bits i think. Because i need specific lengths out of each 32 bit value. Like i need to see what the first 6 bits are, then the next 5, etc. So I need the values in straight bits i think. Unless there is a better way to go about that, but to my knowledge, this is the only way i know how to work with the data. — Jacob, Apr 02 '16 at 09:32
There are a few ways to do that. Look at the links above in my comment. I'd suggest you also mention in the question exactly what your goal is, otherwise we can't suggest better approaches that the solution you're trying to implement in the question. — Reti43, Apr 02 '16 at 09:34

PM 2Ring · Accepted Answer · 2016-04-02T10:38:20.277

1

Bear in mind that normal Python integers don't have a fixed bit width: they're as big as they need to be. This can be annoying when you want to convert signed integers to bit strings. I recommend that you stick with what you're currently doing: converting blocks of 4 bytes to unsigned integer using

n = struct.unpack('>I', read_bytes)[0]

and then using either format(n, '032b') or '{0:032b}'.format(n) to convert that to a bit string if you want to print the bits.

To access or modify the bits in an integer, you shouldn't be bothering with string conversion, instead you should use Python's bitwise operators, &, |, ^, <<, >>, ~. Eg, (n >> 7) & 1 gives you bit 7 of n.

Please see Unary arithmetic and bitwise operations and the following sections in the Python docs for detailed information about these operators.

edited Apr 02 '16 at 10:38

answered Apr 02 '16 at 09:52

PM 2Ring

54,345
6
82
182

So what is n there? What i get back from struct is this, "(2370044120,)" for a number. Can i just pass that straight to format? Also, how do python's bitwise operators work? >> is a shift right? What does the & 1 do? Can the bitwise operators work on what's returned from struct? – Jacob Apr 02 '16 at 10:04
Sorry, @Jacob, I forgot that `struct.unpack` always returns a tuple. I've fixed my code so that `n` is now a Python integer. And it's guaranteed to be an unsigned 32 bit number by the `'>I'` format specification. – PM 2Ring Apr 02 '16 at 10:35
@Jacob: Yes, `>>` is a right shift, `&` is bitwise AND, so `c = a & b` performs the AND operation on the corresponding bits of integers `a` and `b`, putting the results into the corresponding bits of `c`. Therefore, `n & 1` evaluates to the least significant bit of `n`. The Python bitwise operators work very much like they do in C and many other languages (apart from the fact that Python integers do not have a fixed bit width); I figured you'd be familiar with bitwise operators if you're making a disassembler. :) – PM 2Ring Apr 02 '16 at 10:42
Thanks for explaining all of that. I realized later i should have just looked up what struct.unpack returned and i saw it was a tuple. I am familiar with as far as I've seen them used in class and in other places. So i know what each ones does basically. I've never used them in a program yet. But, this is a good chance to get familiar with doing bitwise arithmetic. – Jacob Apr 02 '16 at 14:38
@Jacob: Doing bitwise stuff makes you feel like a Real Programmer™. :) If my answer has helped you, please consider [accepting](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) it. – PM 2Ring Apr 02 '16 at 14:55

Bharel · Answer 2 · 2016-04-02T09:52:03.393

0

This way you can access each individual bit in the file.

"".join(format(i, "08b") for i in byte_string)

For example:

>>> "".join(format(i, "08b") for i in b"\x23\x54a")
'001000110101010001100001'

edited Apr 02 '16 at 09:52

answered Apr 02 '16 at 09:42

Bharel

23,672
5
40
80

`bin` doesn't give you any control over the length of the resulting string, and it prepends `'0b'` which you generally want to slice off, as your code does. The `format` function or method is superior on both counts. Eg, `format(n, '032b')` – PM 2Ring Apr 02 '16 at 09:44
I've seen this posted in other questions. The only reason I'm hesitant to use it, is i don't know how it works. So join adds everything in it to the empty string. bin(i) converts a number to binary, so then does [2:] get rid of the 0b prefix? i feel like that's straight foward enough, but why is the for i in b"\x23\x53a" next to it? What does putting a for loop next to the bin function do? what is the loop looping through? Each byte? – Jacob Apr 02 '16 at 09:51
@PM2Ring Thanks. I have changed it to `08b` as each byte is 8 bits, you no need to convert 4 bytes to int every time. – Bharel Apr 02 '16 at 09:51
@Jacob `bin()` will trim the bytes as @PM2Ring said. I have updated my answer accordingly. What you see is a generator expression. You loop over the bytes and for each byte you format it. You then join everything using `str.join()` – Bharel Apr 02 '16 at 09:54
@Jacob: You can read about the [`str.join`](https://docs.python.org/3/library/stdtypes.html#str.join) method in the Python docs. The string is used as the _separator_ between the substrings in the list (or other iterator) argument of `.join`. – PM 2Ring Apr 02 '16 at 10:00
@Jacob: To understand what's going on in Bharel's code you need to understand the [List comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) and the [Generator Expression](https://docs.python.org/3/reference/expressions.html#generator-expressions). These may be a bit bewildering at first, but once you get used to them you'll come to love them. They are used extensively in Python. – PM 2Ring Apr 02 '16 at 10:00
@Bharel: BTW, it's actually more efficient to pass `.join` a list than a generator. `.join` needs to scan the substrings you pass it _twice_: the 1st time to calculate the total string length, and the 2nd time to perform the actual joining. So if you pass `.join` a generator it needs to run the generator and gather the results into a list before it can start joining. See [this answer](http://stackoverflow.com/a/9061024/4014959) by Python core developer Raymond Hettinger for more info. – PM 2Ring Apr 02 '16 at 10:05

How can i convert binary data from a file to readable base two binary in python?

2 Answers2