
Is there any simple way for me to read the contents of a binary file as a binary string, turn it into a normal (utf-8) string, do some operations with it, turn it back into a binary string and write it into a binary file? I tried doing something as simple as:

a_file = open('image1.png', 'rb')
text = b''
for a_line in a_file:
    text += a_line
a_file.close()
text2 = text.decode('utf-8')
text3 = text2.encode()
a_file = open('image2.png', 'wb')
a_file.write(text3)
a_file.close()

but I get 'Unicode can not decode bytes in position...'

What am I doing terribly wrong?

Vladimir Shevyakov

1 Answer


UTF-8 has enough structure that arbitrary sequences of bytes are usually not valid UTF-8, which is why decoding the contents of a PNG fails. The best approach is to work directly with the bytes read from the file (which you can get in one step with `text = a_file.read()`). Binary strings (type `bytes`) have most of the string methods you'll want, even text-oriented ones like `isupper()` or `swapcase()`. And then there's `bytearray`, a mutable counterpart to the `bytes` type.
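As a rough sketch of the bytes-only approach, the snippet below writes a small stand-in file first so that it runs on its own. The sample data, the temporary directory, and the names `sample`, `work_dir`, `src`, `dst`, and `buf` are assumptions for illustration; only the file names come from the question.

```python
import os
import tempfile

# Hypothetical stand-in for image1.png: the 8-byte PNG signature plus filler bytes.
sample = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "image1.png")
dst = os.path.join(work_dir, "image2.png")
with open(src, "wb") as a_file:
    a_file.write(sample)

# Read the whole file in one step; no loop over lines is needed.
with open(src, "rb") as a_file:
    text = a_file.read()

# Decoding as UTF-8 fails: 0x89 is a continuation byte and cannot start a sequence.
try:
    text.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# bytes objects support the usual string-style methods.
is_png = text.startswith(b"\x89PNG")
tag_lower = text[1:4].lower()

# bytearray allows in-place edits; convert back to bytes when done.
buf = bytearray(text)
buf[1:4] = tag_lower
with open(dst, "wb") as a_file:
    a_file.write(bytes(buf))
```

Lowercasing the signature is only there to show an in-place edit; it would of course make a real PNG invalid.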

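If for some reason you really want to turn your bytes into a `str` object, use a pure 8-bit encoding like Latin-1. It maps each of the 256 possible byte values to exactly one character, so decoding never fails and encoding back with the same codec reproduces the original bytes. You'll get a Unicode string, which is what you are really after. (UTF-8 is just one encoding of Unicode, which is a very different thing.)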
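A minimal sketch of the Latin-1 round trip, using made-up sample bytes rather than a real image (the variable names are illustrative only):

```python
# Made-up sample: the PNG signature followed by every possible byte value.
raw = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

# Latin-1 maps byte N to code point U+00NN, so this never raises.
as_str = raw.decode("latin-1")
same_length = len(as_str) == len(raw)

# Encoding with the same codec gives the original bytes back.
round_trip_ok = as_str.encode("latin-1") == raw

# Ordinary str operations work; the result still encodes cleanly
# as long as every character stays within U+0000..U+00FF.
edited = as_str.replace("PNG", "png")
back = edited.encode("latin-1")

# Characters outside that range (here the euro sign) cannot be encoded.
try:
    (as_str + "\u20ac").encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
```

The last check is the main caveat: any operation that introduces characters above U+00FF makes the string impossible to write back with this codec.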

alexis
  • And note, if you settle on a working encoding (e.g. `latin-1`), you don't need to handle the encode/decode yourself in Python 3. Just change `open('image1.png', 'rb')` to `open('image1.png', 'r', encoding='latin-1')`, and for the output, `open('image2.png', 'w', encoding='latin-1')` and you can read and write without bothering to manually `encode`/`decode`; it will have been decoded to `str` for you on read, and will encode the `str` for you on write. – ShadowRanger Oct 17 '15 at 02:33
  • Good point; though opening the files in binary mode makes the code a little more transparent... I'm not sure the OP should be converting to `str` at all. – alexis Oct 17 '15 at 10:12
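
For anyone trying the text-mode variant from the comments, here is a hedged sketch. One detail not mentioned above: with the default `newline=None`, text mode translates line endings on read, which alters binary data, so this sketch passes `newline=''` to turn that off. The sample bytes and temporary paths are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for image1.png; the PNG signature contains "\r\n".
sample = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
work_dir = tempfile.mkdtemp()
src = os.path.join(work_dir, "image1.png")
dst = os.path.join(work_dir, "image2.png")
with open(src, "wb") as f:
    f.write(sample)

# newline='' disables newline translation, so every byte maps to one character.
with open(src, "r", encoding="latin-1", newline="") as f:
    text = f.read()

# With the default newline handling, "\r\n" and "\r" are turned into "\n" on read.
with open(src, "r", encoding="latin-1") as f:
    translated = f.read()
newline_changed_data = translated != text

# Writing with newline='' likewise leaves "\n" untouched on every platform.
with open(dst, "w", encoding="latin-1", newline="") as f:
    f.write(text)

with open(dst, "rb") as f:
    copy_matches = f.read() == sample
```

Opening in binary mode, as in the answer, avoids the newline question entirely.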