I want to read the raw binary of a file and put it into a string. Currently I am opening the file with the "rb" flag and printing a byte, but it comes out as ASCII characters (for text files, that is; for video and audio files it gives symbols and gibberish). I'd like to get the raw 0s and 1s if possible. This needs to work for audio and video files as well, so simply converting the ASCII to binary isn't an option.

with open(filePath, "rb") as file:
    byte = file.read(1)  # read a single byte
    print(byte)  # shows the byte itself, not its 0s and 1s
Jitin
user2803250
  • possible duplicate of http://stackoverflow.com/questions/1035340/reading-binary-file-in-python – Chris Noreikis Nov 15 '13 at 15:44
  • Not really. He's asking more here than the other post can answer, even though what he's asking may seem weird... – Alexander Tobias Bockstaller Nov 15 '13 at 16:03
  • http://stackoverflow.com/questions/4775146/getting-raw-binary-representation-of-a-file-in-python – Navneet Nov 15 '13 at 16:05
  • You _are_ reading the binary 0's and 1's from the file into a one-character string. Try `print bin(ord(byte))`. The `ord()` function returns the integer value of the byte when the argument is a one-character 8-bit string. Lastly, the `bin()` function converts an integer to a string of 0 and 1 characters with a `0b` prefix, so you'll see something like `0b1100001` printed. – martineau Nov 15 '13 at 16:59

2 Answers

What you are reading really IS the "raw binary" content of your "binary" file. Strange as it might seem, binary data are not stored as the characters "0" and "1" but as binary words (aka bytes, cf. http://en.wikipedia.org/wiki/Byte), each of which has an integer value (0 to 255 in base 10) and can be interpreted as an ASCII character, or as an integer (which is how one usually does binary operations), or written in hexadecimal. For what it's worth, "text" is actually "raw binary data" too.
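
As a small sketch of that point (assuming Python 3, where indexing a bytes object yields an integer; the sample byte is made up for illustration), the same single byte can be displayed as a character, an integer, hexadecimal digits, or a string of 0s and 1s:

```python
# One byte, four ways of displaying it. The byte value is illustrative only.
byte = b"a"

value = byte[0]                  # integer value of the byte (Python 3)
as_char = byte.decode("ascii")   # interpreted as an ASCII character
as_hex = format(value, "02x")    # two hexadecimal digits
as_bits = format(value, "08b")   # eight binary digits, zero-padded

print(value, as_char, as_hex, as_bits)  # prints: 97 a 61 01100001
```

None of these is more "raw" than the others; they are all representations of the same stored byte.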

To get a "binary" representation you can have a look at "Convert binary to ASCII and vice versa", but that's not going to give you any more "raw binary data" than what you already have...

Now the question: why exactly do you want these data as "0's and 1's"?

Community
bruno desthuilliers
  • to be crystal clear: [`raw_binary_data = open(filename, "rb").read()`](http://stackoverflow.com/q/33145337/4279). It is unrelated to "01"-strings that contain ASCII characters '0', '1' representing the data in binary numeral system ([base-2 system is a positional notation with a radix of 2](https://en.wikipedia.org/wiki/Binary_number)): `b'\x0d'[0] == 0x0d == 13 == 0b1101 == int('1101', 2)` (`b'\x0d'[0]` is Python 3 expression, use `ord('\x0d')` on Python 2) but `b'\x0d' != b'1101'` (`len(b'\x0d') == 1` and `len(b'1101') == 4`), `b'1101' == b'\x31\x31\x30\x31'` – jfs Jan 15 '17 at 17:31

To get the binary representation, I think you will need to import binascii, then:

byte = f.read(1)
binary_string = bin(int(binascii.hexlify(byte), 16))[2:].zfill(8)

or, broken down:

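import binascii

filePath = "mysong.mp3"
with open(filePath, "rb") as file:
    byte = file.read(1)  # one raw byte from the file
    # hexlify returns bytes on Python 3, so decode to get plain text digits
    hexadecimal = binascii.hexlify(byte).decode("ascii")
    decimal = int(hexadecimal, 16)
    binary = bin(decimal)[2:].zfill(8)  # drop the "0b" prefix, pad to 8 bits
    print("hex: %s, decimal: %s, binary: %s" % (hexadecimal, decimal, binary))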

will output (for a file whose first byte is 0x64):

hex: 64, decimal: 100, binary: 01100100
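
To extend the same idea from one byte to a whole file, which is what the question asks for, here is a minimal sketch assuming Python 3 (where iterating over a bytes object yields integers). The helper name `file_to_bits` and the throwaway demo file are made up for illustration:

```python
import os
import tempfile


def file_to_bits(path):
    """Return the file's contents as a string of '0' and '1' characters."""
    with open(path, "rb") as f:
        data = f.read()  # the raw bytes of the file
    # Python 3: iterating over bytes yields integers in range(256).
    return "".join(format(b, "08b") for b in data)


# Demo on a temporary file containing the two bytes 0x64 and 0xff.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x64\xff")

bits = file_to_bits(tmp.name)
os.remove(tmp.name)
print(bits)  # prints: 0110010011111111
```

The resulting string is eight times the size of the file, so for large audio or video files it is probably wiser to read and convert in chunks.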
Holy Mackerel
  • Note to the OP : please understand the difference between "raw data" and "binary **representation**". – bruno desthuilliers Nov 15 '13 at 16:09
  • binascii is not needed here. When working with 1 byte we can use ord() to get an integer ordinal and then convert it with hex() or bin(). But for multibyte values binascii.hexlify() can be handy, as it will convert the whole byte string at once. – wombatonfire Apr 19 '16 at 19:38
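
The distinction in the last comment can be sketched as follows, assuming Python 3 (indexing bytes already gives the integer, so `ord()` is only needed on Python 2). The three-byte sample value is illustrative only:

```python
import binascii

data = b"\x00\x64\xff"  # illustrative three-byte string

# Single byte: take its integer value directly and convert with bin().
first = bin(data[1])[2:].zfill(8)  # on Python 2, use ord(data[1])

# Several bytes: hexlify converts the whole byte string at once.
hex_digits = binascii.hexlify(data)  # b"0064ff"
number = int(hex_digits, 16)
# Pad to 8 bits per byte so that leading zero bytes are not lost.
bits = bin(number)[2:].zfill(8 * len(data))

print(first)  # prints: 01100100
print(bits)   # prints: 000000000110010011111111
```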