I'd like to get the exact sequence of bits from a file into a string using Python 3. There are several questions on this topic which come close, but don't quite answer it. So far, I have this:

>>> data = open('file.bin', 'rb').read()
>>> data
'\xa1\xa7\xda4\x86G\xa0!e\xab7M\xce\xd4\xf9\x0e\x99\xce\xe94Y3\x1d\xb7\xa3d\xf9\x92\xd9\xa8\xca\x05\x0f$\xb3\xcd*\xbfT\xbb\x8d\x801\xfanX\x1e\xb4^\xa7l\xe3=\xaf\x89\x86\xaf\x0e8\xeeL\xcd|*5\xf16\xe4\xf6a\xf5\xc4\xf5\xb0\xfc;\xf3\xb5\xb3/\x9a5\xee+\xc5^\xf5\xfe\xaf]\xf7.X\x81\xf3\x14\xe9\x9fK\xf6d\xefK\x8e\xff\x00\x9a>\xe7\xea\xc8\x1b\xc1\x8c\xff\x00D>\xb8\xff\x00\x9c9...'

>>> bin(data[:][0])
'0b10100001'

OK, I can get a base-2 number, but I don't understand why I need data[:][x], and I still have the leading 0b. It also seems that I would have to loop through the whole string and do some casting and parsing to get the correct output. Is there a simpler way to get just the sequence of 0s and 1s without looping, parsing, and concatenating strings?

Thanks in advance!

maximus
  • Reading a file opened in binary mode produces a bytes object, not a string object. Are you sure you're using py3k? – SilentGhost Jan 23 '11 at 18:03
  • Yes, I'm sure I'm using py3k. They probably are byte objects, but the terminal is displaying them with single quotes. – maximus Jan 23 '11 at 18:09
  • Single or double quotes are not relevant, but the representation of a bytes object starts with a b, like so: `b'\xa1\xa7\xda4\x86G...'`, which you missed above. – Lennart Regebro Jan 23 '11 at 19:34
  • Ah, I see. I must have copy/pasted it wrong. Oops. – maximus Jan 25 '11 at 14:37
  • related: [Convert Binary to ASCII and vice versa (Python)](http://stackoverflow.com/q/7396849/4279) – jfs Nov 16 '13 at 05:51

4 Answers

I would first precompute the 8-character binary string for every byte value 0..255:

bytetable = [("00000000"+bin(x)[2:])[-8:] for x in range(256)]

or, if you prefer bits in LSB to MSB order

bytetable = [("00000000"+bin(x)[2:])[-1:-9:-1] for x in range(256)]

then the whole file in binary can be obtained with

with open("file", "rb") as f:
    binrep = "".join(bytetable[x] for x in f.read())
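For comparison, here is a minimal sketch of the same lookup-table idea using the built-in `format()` with the `"08b"` spec, which zero-pads to 8 digits and omits the `0b` prefix, so no string slicing is needed. The names `MSB_FIRST`, `LSB_FIRST`, `bytes_to_bitstring` and the `lsb_first` flag are illustrative, not part of the original answer.

```python
# Precompute the 8-character binary string for every byte value 0..255.
# format(x, "08b") zero-pads to 8 digits and omits the "0b" prefix.
MSB_FIRST = [format(x, "08b") for x in range(256)]
# The same table with each string reversed, i.e. bit 0 (LSB) first.
LSB_FIRST = [s[::-1] for s in MSB_FIRST]


def bytes_to_bitstring(data: bytes, lsb_first: bool = False) -> str:
    """Hypothetical helper: return the bits of `data` as a string of '0'/'1'."""
    table = LSB_FIRST if lsb_first else MSB_FIRST
    # Iterating over a bytes object yields ints, so each one indexes the table directly.
    return "".join(table[b] for b in data)


print(bytes_to_bitstring(b"\xa1\x05"))                  # 1010000100000101
print(bytes_to_bitstring(b"\xa1\x05", lsb_first=True))  # 1000010110100000
```

With a real file this would be used as `with open("file.bin", "rb") as f: bits = bytes_to_bitstring(f.read())`. The table is still worthwhile because it replaces one formatting call per byte with a list lookup.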
6502
  • Nice solution, but some remarks: 1. Python 3 does not have `xrange()` (and this is a Python 3 question). 2. You arrange the bits in some kind of "big endian" order, which is very unnatural to me. At least it should be pointed out. 3. It is generally considered an error to have a variable with the same name as a built-in class (`bytes`). – Sven Marnach Jan 23 '11 at 17:44
  • Now I like it even more, +1 :) – Sven Marnach Jan 23 '11 at 18:16
  • Thanks for the reply, but...I'm getting the error: TypeError: ord() expected string of length 1, but int found – maximus Jan 23 '11 at 18:26
  • Another point: ord(x) won't work with a bytes object (which is what reading a file in binary mode returns). Iterating over bytes produces a series of integers, so replace `[ord(x)]` with `[x]`. – Thomas K Jan 23 '11 at 18:26
  • @Thomas K: Thanks, fixed and also took the time to actually test it – 6502 Jan 23 '11 at 18:33
If you are OK with using an external module, here is a solution with bitstring:

>>> import bitstring
>>> bitstring.BitArray(filename='file.bin').bin
'110000101010000111000010101001111100...'

That's it: this builds the binary string representation of the whole file in one call.

Scott Griffiths
It is not quite clear what the sequence of bits is meant to be. I think it would be most natural to start at byte 0 with bit 0, but it actually depends on what you want.

So here is some code to access the sequence of bits starting with bit 0 in byte 0:

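def bits_from_byte(byte):
    """Yield the 8 bits of one byte (an int 0..255), least significant bit first."""
    for _ in range(8):
        yield byte & 1
        byte >>= 1

def bits_from_data(data):
    """Yield every bit of a bytes object, starting with bit 0 of byte 0."""
    # In Python 3, iterating over bytes yields ints, so no ord() call is needed.
    for byte in data:
        yield from bits_from_byte(byte)

for bit in bits_from_data(data):
    pass  # process bit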
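If you only need a particular bit rather than walking all of them, the same ordering (bit 0 of byte 0 first) can be computed directly with `divmod`. This is a small sketch; `bit_at` and `sample` are hypothetical names introduced for illustration.

```python
def bit_at(data: bytes, k: int) -> int:
    """Hypothetical helper: return bit k of `data`, numbering from bit 0 (LSB) of byte 0.

    Bits 0..7 come from data[0] (least significant first), bits 8..15 from
    data[1], and so on, matching the generator order above.
    """
    byte_index, bit_index = divmod(k, 8)
    return (data[byte_index] >> bit_index) & 1


sample = b"\xa1\x05"
print([bit_at(sample, k) for k in range(16)])
# [1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0]
```

Indexing past the end of `data` raises `IndexError`, just as `data[i]` would.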

(Another note: you don't need data[:][0] in your code. data[0] gives the same result without first copying the whole bytes object.)
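On the leading `0b` from the question: indexing a bytes object in Python 3 returns an int, and `format()` with the `"08b"` spec gives its 8-digit binary form without the prefix. Unlike `bin()`, it also keeps leading zeros. The `data` value here is a stand-in for the bytes read from the file.

```python
data = b"\xa1\x05"  # stand-in for the bytes read from the file

first = data[0]                # indexing bytes gives an int: 161
print(bin(first))              # 0b10100001  ("0b" prefix)
print(format(first, "08b"))    # 10100001    (no prefix, 8 digits)
print(bin(data[1]))            # 0b101       (bin() drops leading zeros)
print(format(data[1], "08b"))  # 00000101
```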

Sven Marnach
To convert raw binary data such as b'\xa1\xa7\xda4\x86' into a bitstring that represents the data as a number in the binary (base-2) system in Python 3:

>>> data = open('file.bin', 'rb').read()
>>> bin(int.from_bytes(data, 'big'))[2:]
'1010000110100111110110100011010010000110...'
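One caveat with `bin(int.from_bytes(...))`: the integer has no memory of the original length, so leading zero bits of the first byte are dropped (a file starting with `\x05` loses five of them), and an empty file gives `'0'`. A minimal sketch that pads back to 8 bits per byte, still using only `int.from_bytes` and `format`. The helper name `bytes_to_bits_padded` is hypothetical.

```python
def bytes_to_bits_padded(data: bytes) -> str:
    """Hypothetical helper: big-endian bit string of `data`, keeping leading zeros."""
    if not data:
        return ""  # format() would otherwise render the integer 0 as "0"
    n = int.from_bytes(data, "big")
    # Zero-pad to exactly 8 bits per input byte; format() omits the "0b" prefix.
    return format(n, f"0{8 * len(data)}b")


print(bin(int.from_bytes(b"\x05\xa1", "big"))[2:])  # 10110100001 (leading zeros lost)
print(bytes_to_bits_padded(b"\x05\xa1"))            # 0000010110100001
```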

See Convert binary to ASCII and vice versa.

Community
jfs