35

I'm trying to read a file byte by byte, but I'm not sure how to do that. This is what I have so far:

file = open(filename, 'rb')
while 1:
   byte = file.read(8)
   # Do something...

So does that make the variable byte contain the next 8 bits at the beginning of every loop iteration? It doesn't matter what those bytes really are. The only thing that matters is that I need to read the file in 8-bit chunks.

EDIT:

Also I collect those bytes in a list and I would like to print them so that they don't print out as ASCII characters, but as raw bytes i.e. when I print that bytelist it gives the result as

['10010101', '00011100', .... ]
skaffman
zaplec
  • 10
    Use `while True:` instead of `while 1:`. – David Z May 20 '10 at 09:17
  • This question is very similar to http://stackoverflow.com/questions/1035340/reading-binary-file-in-python. – Randall Cook Feb 01 '13 at 07:12
  • @DavidZ those seem equivalent to me. So why? – Lorraine Aug 14 '19 at 10:17
  • @Wilson It is generally better to use purely truthy or falsy values. Even though `while 1` and `while True` achieve the same result, `while True` is a lot more descriptive and readable. – zaplec May 05 '20 at 16:19
  • 1
    I've closed this because it's asking two separate questions. The second question was edited into the question after there was an answer to the first question. That edit really should have been rolled back, as it effectively invalidated the original two answers. It would have been good as a separate question. The OP also demonstrated that this is two separate questions by accepting an answer which only addresses the edited-in second question, which, along with collecting other answers that only address parts of the multiple questions, makes this question a confusing mess. – Makyen Jan 31 '23 at 12:50
  • @Makyen Times were different 13 years ago. I guess people will find what they are looking for here anyway – zaplec Feb 01 '23 at 08:14

5 Answers

42

To read one byte:

file.read(1)

8 bits is one byte.
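
As a rough illustration of the point above, the argument to read() counts bytes, not bits. This is only a sketch: it uses an in-memory io.BytesIO stream with made-up data as a stand-in for a file opened with 'rb'.

```python
import io

# In-memory stand-in for a file opened with 'rb' (hypothetical sample data).
stream = io.BytesIO(bytes(range(16)))

one = stream.read(1)    # one byte, i.e. 8 bits
eight = stream.read(8)  # eight bytes, i.e. 64 bits

print(len(one), len(eight))
```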

Mark Byers
19

The code you've shown will read 8 bytes at a time, not 8 bits. To read one byte per iteration and stop at the end of the file, you could use

with open(filename, 'rb') as f:
   while 1:
      byte_s = f.read(1)
      if not byte_s:
         break
      byte = byte_s[0]
      ...
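
A possible Python 3 variant of the same loop, assuming Python 3 semantics where indexing a bytes object gives an int (in Python 2 it gives a one-character str). The helper name iter_bytes and the temporary sample file are made up for the illustration.

```python
import os
import tempfile

def iter_bytes(path):
    """Yield the file's contents one byte at a time, as integers 0-255."""
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(1) until it returns b'' (EOF).
        for chunk in iter(lambda: f.read(1), b''):
            yield chunk[0]  # Python 3: indexing bytes gives an int

# Hypothetical sample file that includes a zero byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x95\x00\x1c')
    sample_path = tmp.name

values = list(iter_bytes(sample_path))
os.remove(sample_path)
print(values)
```

Comparing the chunk against b'' rather than testing the integer value means a zero byte in the file does not end the loop early.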
kennytm
19

To answer the second part of your question, to convert to binary you can use a format string and the ord function:

>>> byte = 'a'
>>> '{0:08b}'.format(ord(byte))
'01100001'

Note that the format pads with the right number of leading zeros, which seems to be your requirement. This method needs Python 2.6 or later.
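
For comparison, a sketch of the same formatting under Python 3, where iterating over a bytes object already yields integers, so ord is not needed. The sample bytes are made up to match the output the question asks for.

```python
# Hypothetical bytes, as they would be returned by a file opened with 'rb'.
data = b'\x95\x1c'

# Iterating over bytes yields integers in Python 3, so ord() is not needed.
bit_strings = ['{0:08b}'.format(b) for b in data]
print(bit_strings)  # ['10010101', '00011100']
```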

Scott Griffiths
2

There's a Python module made especially for reading and writing binary-encoded data, called 'struct'. Since versions of Python before 2.6 don't support str.format, a custom function is needed to create binary-formatted strings.

import struct

# Format an integer in the range 0-255 as an 8-character binary string.
def bstr(n):
    return ''.join([str(n >> x & 1) for x in (7,6,5,4,3,2,1,0)])

# Read a file into a list of binary-formatted strings.
def read_binary(path):
    binlist = []
    f = open(path, 'rb')
    try:
        while True:
            chunk = f.read(1)
            if not chunk:  # an empty read means end of file
                break
            value = struct.unpack('B', chunk)[0]  # B stands for unsigned char (8 bits)
            binlist.append(bstr(value))
    finally:
        f.close()
    return binlist
  • 2
    If you're just using it for a single character, surely you'd do better to just use `ord(f.read(1))` instead of `struct.unpack('B', f.read(1))[0]`? (You'd need to make it something like `c = f.read(1); if not c: break; binlist.append(bstr(ord(c)))`.) – Chris Morgan Dec 13 '11 at 09:52
  • I've got this error: ---> 12 bin = struct.unpack('B',f.read(1))[0] # B stands for unsigned char (8 bits) error: unpack requires a buffer of 1 bytes – MGM Oct 31 '18 at 12:14
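
A small sketch of where the error quoted in the comment above most likely comes from: at end of file, f.read(1) returns an empty bytes object, and struct.unpack('B', ...) needs exactly one byte. It also shows why testing the unpacked value for truthiness is risky, since a zero byte unpacks to 0. The variable names are made up for the illustration.

```python
import struct

try:
    struct.unpack('B', b'')  # b'' is what f.read(1) returns at end of file
    raised = False
except struct.error:
    raised = True

# A zero byte is valid data, but its unpacked value is falsy.
zero = struct.unpack('B', b'\x00')[0]
print(raised, zero, not zero)
```

Checking the result of f.read(1) for emptiness before unpacking avoids both problems.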
1

Late to the party, but this may help anyone looking for a quick solution:

You can use bin(ord('b')).replace('b', ''). bin() gives you the binary representation with a '0b' prefix in front of the first bit, so you have to remove the 'b'. Also, ord() gives you the integer code of a character (its ASCII number for an 8-bit/1-byte coded character).
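
A quick comparison, offered as a sketch rather than a recommendation: the bin()-and-replace approach does not always give 8 digits, whereas format(n, '08b') pads to a fixed width. The helper names bin_replace and bin_padded are made up for the illustration.

```python
def bin_replace(n):
    # bin(98) is '0b1100010'; dropping the 'b' leaves the leading '0' in place.
    return bin(n).replace('b', '')

def bin_padded(n):
    # Zero-padded to exactly 8 binary digits for values 0-255.
    return format(n, '08b')

for n in (5, 98, 200):
    print(n, bin_replace(n), bin_padded(n))
```

For 98 both give the same string, but small values come out shorter than 8 digits and values of 128 or more come out with 9 characters.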

Cheers

e-nouri