
I am working on a Python script that can hide arbitrary binary files inside itself. It reads all the binary data from the target file and stores it in a list inside the script's own source. It then deletes the original file to hide it.

Here is my problem: when I write the bytes I read into my script file, Python complains that the file is not valid UTF-8. Here is a small sample of what the raw data looks like:

ßëM€€Ê yQtm×ßü«WTª¼É[–±Ê

How can I store those bytes without breaking the script? I guess I could store each byte as a code point instead, so that the interpreter accepts it. But how do I tell the write() I/O function to write bytes as code points?

user2726067
  • Interesting - what exactly is the use case? Your script fails to run properly after the removal of the file - your script can't run again, right? (Unless it can be re-produced anyway, but then what's the point...) – Jon Clements Aug 28 '13 at 15:48
  • Presumably this is Python 3? How are you storing the bytes, as a list of strings? – Martijn Pieters Aug 28 '13 at 15:52
  • What do you mean "in a list inside itself"? – Mike Vella Aug 28 '13 at 15:55
  • You are also looking at your raw data as if it is representing a text in a certain encoding; presumably you are using a terminal or Windows console to print the binary data, which means it is being interpreted as text via *some* codec. – Martijn Pieters Aug 28 '13 at 15:55

1 Answer


You should encode the binary data, for example with base64 encoding, to turn the bytes into "legitimate characters". Then, when you need the binary information, you convert it back.

See for example this earlier question for some code examples.

A brief sample to get you going:

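# assume your bytes came from a file:
import base64

bytesIneed = bytearray([234, 232, 231, 188, 122, 132, 145])

# encode the raw bytes as ASCII-only base64 text
bytesConverted = base64.b64encode(bytesIneed)

print("encoded string: ")
print(bytesConverted.decode("ascii"))

# decode the base64 text back into the original bytes
bytesRecovered = base64.b64decode(bytesConverted)

print("decoded binary: ")
# iterating a bytearray yields integers on both Python 2 and Python 3
for c in bytearray(bytesRecovered):
    print(c)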

This prints the following output:

encoded string:
6ujnvHqEkQ==
decoded binary:
234
232
231
188
122
132
145

As you can see, the string 6ujnvHqEkQ== can be stored anywhere, and the decoding function turns it back into the binary data you need.
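To connect this to the "hide and reveal" use case in the question, here is a minimal sketch of a full round trip through a file. The helper names `encode_file` and `restore_file`, the variable name `HIDDEN`, and the file names are made up for illustration; the question does not show the actual script, so treat this as one possible approach under those assumptions (Python 3).

```python
import base64
import os
import tempfile


def encode_file(path):
    """Hypothetical helper: return the file's contents as an ASCII-only base64 string."""
    with open(path, "rb") as f:  # "rb": read raw bytes, no text decoding
        return base64.b64encode(f.read()).decode("ascii")


def restore_file(encoded, path):
    """Hypothetical helper: decode a base64 string and write the original bytes to path."""
    with open(path, "wb") as f:  # "wb": write raw bytes
        f.write(base64.b64decode(encoded))


# Round trip on a throwaway file whose bytes are not valid UTF-8.
original = bytes(bytearray([234, 232, 231, 188, 122, 132, 145]))
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "secret.bin")      # illustrative name
restored_path = os.path.join(workdir, "restored.bin")  # illustrative name

with open(source_path, "wb") as f:
    f.write(original)

encoded = encode_file(source_path)
os.remove(source_path)  # "hide" the file; only the text copy remains

# This line is plain ASCII, so it could be written into a .py file safely.
source_line = "HIDDEN = " + repr(encoded)

restore_file(encoded, restored_path)
with open(restored_path, "rb") as f:
    restored = f.read()

print(source_line)
print(restored == original)
```

The key points are opening files in binary mode (`"rb"` / `"wb"`) and only ever writing the base64 text, never the raw bytes, into the script's source.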

Floris
  • Thanks to your help it works now guys. I have tested it on an .mp3, .mp4, .jpg and .txt and it works! I could both hide and reveal and recreate the files! :) – user2726067 Aug 28 '13 at 18:01
  • Just using a `bytearray` will also work. I don't see the need for base64 here? – Martin Tournoij Jan 15 '15 at 14:04
  • The issue is that (according to my understanding of the question) OP wants the data to be stored as characters in the code. Of course you can use five characters per byte as in the first line of my example, but storing the string of base64 instead is more compact. And per the comment that OP left, it is now possible to "hide and reveal" the files which was the purpose. – Floris Jan 15 '15 at 14:20
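The compactness argument in the last comment can be checked with a rough sketch like the one below. The sample data (every byte value from 0 to 255 once) is an arbitrary choice for illustration, and the list literal is measured in the form Python's `repr` produces, so the exact ratio will vary with the data (Python 3).

```python
import base64

# Sample data: every possible byte value once (an arbitrary illustrative choice).
data = bytes(bytearray(range(256)))

# Source text needed to store the bytes as a list of integers, e.g. "[0, 1, 2, ...]".
as_list_literal = repr(list(data))

# Source text needed to store the same bytes as base64.
as_base64 = base64.b64encode(data).decode("ascii")

print(len(data), len(as_list_literal), len(as_base64))

# Both forms recover the original bytes.
print(bytes(bytearray(list(data))) == data)
print(base64.b64decode(as_base64) == data)
```

Base64 uses four characters for every three bytes, while a list of integers needs up to three digits plus a separator per byte, so the base64 string is noticeably shorter for the same data.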