Extract first 20 bytes in a binary header

Question

I'm trying to learn how to do this in Python. I played around with the pseudocode below but couldn't come up with anything worth a penny:

with open(file, "rb") as f:
    byte = f.read(20) # read the first 20 bytes?
    while byte != "":
        print f.read(1)

In the end I'd like to end up with code capable of the following: https://stackoverflow.com/a/2538034/2080223

But I'm of course interested in learning how to get there, so any pointers would be much appreciated!

`bytes = f.read(20)` should be working. If it doesn't, you'll need to tell us about the results you are expecting. — lyschoening, May 13 '14 at 14:37
@JoelKalberg You should summarize your question within your text and use links only as supporting material. As soon as the reader has to follow your link, you have lost him. — Jan Vlcinsky, May 13 '14 at 14:38
While this may be a feasible approach to get your feet wet with the file format, you should look at the [`struct`](https://docs.python.org/2/library/struct.html) module to actually parse it. — Lukas Graf, May 13 '14 at 14:49

jedwards · Accepted Answer · 2014-05-13T14:46:29.637

Very close

with open(file, "rb") as f:
    byte = f.read(20) # read the first 20 bytes? *Yes*

will indeed read the first 20 bytes.

But

    while byte != "":
        print f.read(1) # print a single byte?

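will (as you expect) read a single byte and print it, but it will keep doing so forever (printing empty strings once the end of the file is reached), since byte is never updated inside the loop and your loop condition will therefore always be true.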

It's not clear what you want to do here, but if you just want to print a single byte, removing the while loop will do that:

print f.read(1)

If you want to print single bytes until the end of file, consider:

while True:
    byte = f.read(1)
    if byte == "":  # read() returns an empty string at end of file
        break
    print byte
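The loop above is Python 2. In Python 3, a file opened with "rb" returns bytes, so the end-of-file value is b"" and a comparison against "" would never be true. A minimal sketch of the same idea, assuming Python 3 (the function name iter_bytes is made up for illustration):

```python
def iter_bytes(path):
    """Yield the contents of a file one byte at a time (Python 3)."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read(1) until it
        # returns b"", which is what read() gives back at end of file.
        for byte in iter(lambda: f.read(1), b""):
            yield byte
```

You would use it as `for byte in iter_bytes(some_path): print(byte)`, where each item is a bytes object of length 1.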

Alternatively, if you're looking for specific bytes within the first 20 you read into byte, you can index into it:

with open(file, "rb") as f:
    byte = f.read(20)

print byte[0]  # First byte of the 20 bytes / first byte of the file
print byte[1]  # Second byte of the 20 bytes / ...
# ...
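One thing to watch for if you run this on Python 3 rather than Python 2: indexing a bytes object gives you an int, while a one-byte slice keeps the bytes type. A small sketch, assuming Python 3 (the helper name first_bytes and the sample data are made up for illustration):

```python
def first_bytes(path, count=20):
    """Return up to `count` bytes from the start of a file."""
    with open(path, "rb") as f:
        return f.read(count)


header = b"ABCD"              # sample data standing in for first_bytes(...)
first = header[0]             # int in Python 3 (a 1-character str in Python 2)
first_as_bytes = header[0:1]  # slicing keeps the bytes type

print(first)           # 65
print(first_as_bytes)  # b'A'
```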

Or, as Lukas suggests in the comments, you could iterate over the string byte (in Python 2, read() returns a string, by the way):

with open(file, "rb") as f:
    byte = f.read(20)

for b in byte:
    print b

You may also be interested in the position of the byte and its hexadecimal value (for values like 0x0a, 0x0d, etc.):

with open(file, "rb") as f:
    byte = f.read(20)

for i, b in enumerate(byte):
    # ord() turns the 1-character string into the integer that %02x needs
    print "%02d:  %02x" % (i, ord(b))
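If you are on Python 3, iterating over a bytes object already yields ints, so each value can be passed to %02x directly. A minimal sketch under that assumption (format_offsets is a hypothetical helper name, and the sample bytes are made up):

```python
def format_offsets(data):
    """Return one 'offset:  hex value' line per byte of a bytes object."""
    # Iterating over bytes in Python 3 yields ints, not 1-character strings.
    return ["%02d:  %02x" % (i, b) for i, b in enumerate(data)]


for line in format_offsets(b"\r\n\x00A"):
    print(line)
# 00:  0d
# 01:  0a
# 02:  00
# 03:  41
```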

More importantly, after reading the first 20 bytes, `f.read(1)` will read byte 21, 22 etc. He should iterate over `bytes` instead of reading from `f` some more. — Lukas Graf, May 13 '14 at 14:41
@Lukas, I just had the same thought as you; I updated my answer (Alternatively...) — jedwards, May 13 '14 at 14:43
Hi @jedwards, and thanks for your reply. Most informative :) I will play around with this for a while and see what I end up with. I apologize for a bad OP; I was a bit rushed at the time. J — , May 13 '14 at 14:53
