Python Reading From and Writing to Binary Files

Question

The following is my question re-worded

Reading the first 10 bytes of a binary file (operations later) -

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)
for i in x:
    print(i, end=', ')
print(x)
outfile.write(bytes(x, "UTF-8"))

The first print statement gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The second print statement gives -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

a hexadecimal interpretation of the values in x.

outfile.write(bytes(x, "UTF-8"))

returns -

TypeError: encoding or errors without a string argument

Then x must not be a normal string but rather a byte string, which is still iterable?

If I want to write the contents of x to outfile.jpg unaltered then I go -

outfile.write(x)

Now I try to take each x[i] and perform some operation on it (shown below as a bone-simple multiplication by 1), assign the values to y, and write y to outfile.jpg such that it is identical to infile.jpg. So I try -

infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')
x = infile.read(10)

yi = len(x)
y = [0 for i in range(yi)]

j = 0
for i in x:
    y [j] = i*1
    j += 1

for i in x:
    print(i, end=', ')

print(x)

for i in y:
    print(i, end=', ')

print(y)

print(repr(x))
print(repr(y))

outfile.write(y)

The first print statement (iterating through x) gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The second print statement gives -

b'\xff\xd8\xff\xe0\x00\x10JFIF'

The third print statement (iterating through y) gives -

255, 216, 255, 224, 0, 16, 74, 70, 73, 70,

The fourth print statement gives -

[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

And finally, printing repr(x) and repr(y), as suggested by Tim, gives, respectively -

b'\xff\xd8\xff\xe0\x00\x10JFIF'
[255, 216, 255, 224, 0, 16, 74, 70, 73, 70]

And the file write statement gives the error -

TypeError: 'list' does not support the buffer interface

What I need is y to be the same type as x such that outfile.write(x) = outfile.write(y)

I stare into the eyes of the Python, but still I do not see its soul.

Take a look at this post: http://stackoverflow.com/questions/5471158/typeerror-str-does-not-support-the-buffer-interface seems that the String class changed between Python 2 and Python 3. — Hunter McMillen, Dec 18 '13 at 21:15
Hunter - I replaced outfile.write(s) with outfile.write(s.encode('UTF-8')) and received no errors! However, using infile.read() resulted in outfile.jpg being twice the size of infile.jpg and broken. What I am trying to accomplish is to read a binary file, perform an operation, reverse that operation and write the output to a separate file such that they are identical. — brett, Dec 18 '13 at 21:31
The answer in the post I linked used `outfile.write(bytes(s, "UTF-8"));` — Hunter McMillen, Dec 18 '13 at 21:40

score 3 · Answer 1 · answered Dec 18 '13 at 22:14

They're not identical at all - they just display identically after str() is applied to them (which print() does implicitly). Print the repr() of them and you'll see the difference. Example:

>>> x = b'ab'
>>> y = "b'ab'"
>>> print(x)
b'ab'
>>> print(y) # displays identically
b'ab'
>>> print(repr(x)) # but x is really a 2-byte bytes object
b'ab'
>>> print(repr(y)) # and y is really a 5-character string
"b'ab'"

Mixing strings and bytes objects doesn't make sense (well, not in the absence of an explicit encoding - but you're not trying to encode/decode anything here, right?). If you're working with binary files, then you shouldn't be using strings at all - you should be using bytes or bytearray objects.
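To make the mismatch concrete, here is a small sketch using the ten header bytes shown in the question. It assumes nothing beyond the standard library; the variable names are illustrative only. It shows that the text print() displays for a bytes object is not the data itself, and that arbitrary binary data is generally not valid UTF-8 to begin with.

```python
# The ten bytes from the question, written as a bytes literal.
header = b'\xff\xd8\xff\xe0\x00\x10JFIF'

# The text that print() displays is a representation of the bytes object,
# not the raw data. Encoding that text produces different, longer data.
shown = repr(header)
reencoded = shown.encode('UTF-8')

print(len(header), len(reencoded))
print(reencoded == header)  # False

# Arbitrary binary data is usually not valid UTF-8, so decoding fails.
try:
    header.decode('UTF-8')
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print('not text:', exc)
```

A round trip through text is a plausible explanation for an output file that is larger than the input and no longer opens as an image.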

So the problem isn't really in how you're writing: the logic is fundamentally confused before then.

Can't guess what you want. Please edit the question to show a complete, executable example of what you're trying to accomplish. We don't need JPG files for this - make up some short, arbitrary binary data. Like:

dummy_jpg = b'\x01\x02\xff'

Wow, repr() shows there is a difference. I will have to rethink the logic of what I am trying to do. — brett, Dec 18 '13 at 22:42

brett · Answer 2 · 2014-02-19T12:46:20.750

... and this is how you read from and write to files in Python in binary mode.

#open binary files infile and outfile
infile = open('infile.jpg', 'rb')
outfile = open('outfile.jpg', 'wb')

#n = bytes to read
n=5

#read bytes of infile to x
x = infile.read(n)

#print x type, x
print()
print('x = ', repr(x), type(x))
print()

x =  b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

#define y of type list, length xi
xi = len(x)
y = [0 for i in range(xi)]

#print y type, y
print('y =', repr(y), type(y))
print()

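y = [0, 0, 0, 0, 0] <class 'list'>

#convert each byte of x to an 8-bit binary string and place in y, type list
#(iterating over a bytes object in Python 3 already yields ints, so ord() is not needed)
j=0
for i in x:
    y [j] = '{:08b}'.format(i)
    j += 1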

#print y type, and y
print('y =', repr(y), type(y))
print()

y = ['11111111', '11011000', '11111111', '11100000', '00000000'] <class 'list'>

#perform bit level operations on y [i], not done in this example.

#convert y [i] back to integer
j=0
for i in y:
    y [j] = int(i, 2)
    j += 1

#print y type, and y
print('y =', repr(y), type(y))
print()

y = [255, 216, 255, 224, 0] <class 'list'>

#convert y to type bytearray and place in z
z = bytearray(y)

#print z type, and z
print('z =', repr(z), type(z))
print()

z = bytearray(b'\xff\xd8\xff\xe0\x00') <class 'bytearray'>

#output z to outfile
outfile.write(z)

infile.close()
outfile.close()
outfile = open('outfile.jpg', 'rb')

#read bytes of outfile to x
x = outfile.read(n)

#print x type, and x
print('x =', repr(x), type(x))
print()

x = b'\xff\xd8\xff\xe0\x00' <class 'bytes'>

#conclusion:  first n bytes of infile = n bytes of outfile (without bit level operations)

outfile.close()
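For comparison, the same round trip can be sketched more compactly. This is only an illustration under stated assumptions: the reversible operation is an XOR with a made-up key (KEY and xor_bytes are hypothetical names, not anything from the question), and temporary files with short made-up data stand in for infile.jpg and outfile.jpg.

```python
import os
import tempfile

KEY = 0x5A  # hypothetical key; any value in range(256) works


def xor_bytes(data, key=KEY):
    """Return a new bytes object with every byte XORed with key."""
    return bytes(b ^ key for b in data)


with tempfile.TemporaryDirectory() as workdir:
    in_path = os.path.join(workdir, 'infile.bin')
    out_path = os.path.join(workdir, 'outfile.bin')

    # Create a small stand-in for the input file.
    with open(in_path, 'wb') as f:
        f.write(b'\xff\xd8\xff\xe0\x00')

    with open(in_path, 'rb') as infile:
        original = infile.read()

    scrambled = xor_bytes(original)   # perform the operation
    restored = xor_bytes(scrambled)   # reverse it (XOR is its own inverse)

    with open(out_path, 'wb') as outfile:
        outfile.write(restored)

    with open(out_path, 'rb') as f:
        round_trip = f.read()

print(round_trip == original)  # True
```

The with statements close the files automatically, and no intermediate list of binary strings is needed because integers support bit-level operators directly.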

score 0 · Answer 3 · answered Dec 21 '13 at 19:23

Thanks for clarifying! What you want is very easy, but you really need to read the docs for the bytes and bytearray types. What you DO NOT want is anything having to do with:

Unicode
strings
encoding
decoding

Those are all utterly irrelevant here. You have binary data from start to finish, and need to stick to bytes and/or bytearray objects. Both are sequences of bytes ("little integers" in range(256)); bytes is an immutable sequence, and bytearray is a mutable sequence.
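A short sketch of the difference, using made-up data rather than a real file:

```python
data = b'\xff\xd8\xff'          # bytes: immutable
print(list(data))               # [255, 216, 255]

try:
    data[0] = 0                 # item assignment is not allowed on bytes
    mutated = True
except TypeError as exc:
    mutated = False
    print('bytes is immutable:', exc)

buf = bytearray(data)           # bytearray: a mutable copy of the same bytes
buf[0] = 0                      # item assignment works
buf.append(0xe0)                # so do list-like methods
print(buf)                      # bytearray(b'\x00\xd8\xff\xe0')

try:
    bytes([256])                # every element must be in range(256)
    out_of_range_ok = True
except ValueError as exc:
    out_of_range_ok = False
    print('out of range:', exc)
```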

Then x must not be a normal string but rather a byte string, which is still iterable?

Read the docs ;-) x is not "a string"; do this to see its type:

print(type(x))

That will display:

<class 'bytes'>

It's a bytes object, as briefly explained already. It's a sequence, so yes, it's iterable, like all sequences. You can also index into it, slice it, etc.
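For instance, with the ten bytes from the question (a sketch; the variable names are illustrative):

```python
x = b'\xff\xd8\xff\xe0\x00\x10JFIF'

first = x[0]        # indexing gives an int
head = x[:2]        # slicing gives another bytes object
tag = x[6:10]

print(first)        # 255
print(head)         # b'\xff\xd8'
print(tag)          # b'JFIF'

# Iterating yields ints as well, which is why the loop in the question
# printed 255, 216, ... rather than characters.
values = [b for b in x]
print(values)       # [255, 216, 255, 224, 0, 16, 74, 70, 73, 70]
```

Note that x[0] and x[0:1] differ: the first is the integer 255, the second is the one-byte object b'\xff'.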

Your y is a list. Alas, I can't figure out what you're trying to accomplish with it.

What I need is y to be the same type as x such that outfile.write(x) = outfile.write(y)

No, you do not need x and y to be the same type. You do want to be able to write y as binary data. For that you need to create a bytes or bytearray object. That's very easy; just do one of these:

 y = bytes(y)

or

 y = bytearray(y)

Then

outfile.write(y)

will do what you want.
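Put together, a minimal sketch looks like this. It assumes an in-memory io.BytesIO object as a stand-in for a file opened with 'wb', so it runs without touching the disk; a real binary file object accepts the same write() argument types.

```python
import io

x = b'\xff\xd8\xff\xe0\x00\x10JFIF'
y = [value * 1 for value in x]      # a list of ints, as in the question

outfile = io.BytesIO()              # stand-in for open('outfile.jpg', 'wb')

try:
    outfile.write(y)                # a list is not a bytes-like object
    list_written = True
except TypeError as exc:
    list_written = False
    print('cannot write a list:', exc)

written = outfile.write(bytes(y))   # convert first, then write
print(written)                      # 10
print(outfile.getvalue() == x)      # True
```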

Although, as above, I have no idea why you created a list here to begin with. A much easier way to create the same list would have been to skip all the loops and just write:

 y = list(x)

If I'm getting through, you should be starting to suspect that your mental model of what's going on here is too complicated, not too simple. You're imagining difficulties that don't really exist :-) Reading from a binary file gives you a bytes object (or see the file .readinto() method if you want reading a binary file to fill a bytearray object instead), while writing to a binary file requires giving it a bytes or bytearray object to write. That's all there is to it.
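As a rough sketch of the .readinto() route: the code below creates a small temporary file with made-up contents (sample.bin is a hypothetical name), fills a preallocated bytearray from it, modifies the buffer in place, and writes it back out.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, 'sample.bin')

    # Create a small stand-in for a real binary file.
    with open(path, 'wb') as f:
        f.write(b'\xff\xd8\xff\xe0\x00\x10JFIF')

    buf = bytearray(10)                 # preallocated, mutable buffer
    with open(path, 'rb') as f:
        count = f.readinto(buf)         # fills buf, returns bytes read

    print(count, buf)

    buf[0] ^= 0xff                      # modify a byte in place

    with open(path, 'wb') as f:
        f.write(buf)                    # a bytearray can be written directly

    with open(path, 'rb') as f:
        result = f.read()

print(result[:1])                       # b'\x00'
```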
