
The following code does not seem to read/write binary form correctly. It should read a binary file, bit-wise XOR the data and write it back to file. There are not any syntax errors but the data does not verify and I have tested the source data via another tool to confirm the xor key.

Update: per feedback in the comments, this is most likely due to the endianness of the system I was testing on.

xortools.py:

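import struct

def four_byte_xor(buf, key):
    out = b''
    for i in range(0, len(buf) // 4):  # bytes past the last multiple of 4 are skipped
        # "=I" unpacks a 4-byte unsigned int in the machine's native byte order
        c = struct.unpack("=I", buf[(i*4):(i*4)+4])[0]
        c ^= key
        out += struct.pack("=I", c)
    return out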

Call to xortools.py:

from xortools import four_byte_xor
in_buf = open('infile.bin','rb').read()
out_buf = open('outfile.bin','wb')
out_buf.write(four_byte_xor(in_buf, 0x01010101))
out_buf.close()

Per one of the answers, it appears that I need to read the file byte by byte. How would the function above fit into the following loop, given that it manipulates four bytes at a time? Or does it not matter? Do I need to use struct?

with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        byte = f.read(1)

For example, the following file consists of the 4 repeating bytes 01020304:

before XOR

The data is XOR'd with a key of 01020304 which zeros the original bytes:

after XOR

Here is an attempt with the original function. In this case the result is 05010501, which is incorrect:

incorrect XOR attempt

Astron
  • Meant there are not any syntax errors. Question updated. – Astron Jul 13 '12 at 00:48
  • The problem is that the `four_byte_xor()` function doesn't xor the part of the buffer, if any, that's not a multiple of four bytes (hence its name). What would you like to do with any modulo 4 bytes in the buffer with respect to the `key` which it apparently expects to also be exactly four bytes long? – martineau Jul 13 '12 at 01:04

2 Answers

Here's a relatively easy solution (tested):

import struct
import sys
from xortools import four_byte_xor

in_buf = open('infile.bin','rb').read()
orig_len = len(in_buf)
new_len = ((orig_len+3)//4)*4  # round up to a multiple of 4
if new_len > orig_len:
    in_buf += b'\x00' * (new_len-orig_len)  # pad with zero bytes
key = 0x01020304
if sys.byteorder == "little":  # adjust for endianness of processor
    key = struct.unpack(">I", struct.pack("<I", key))[0]
out_buf = four_byte_xor(in_buf, key)
f = open('outfile.bin','wb')
f.write(out_buf[:orig_len]) # only write bytes that were part of orig
f.close()

It pads the data up to a whole multiple of 4 bytes, XORs that with the four-byte key, and then writes out only as many bytes as the original data contained.

This problem was a little tricky because the byte order of a 4-byte integer in memory depends on your processor, whereas an integer literal is always written with the high byte first, and the bytes of a string or bytearray are always shown in storage order, lowest address first, as in your hex dumps. To allow the key to be specified as a hex integer, it was necessary to add code that conditionally compensates for the differing representations, i.e. so that the key's bytes can be specified in the same order as the bytes appearing in the hex dumps.
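To make the byte-order point concrete, here is a small sketch. It is not part of the tested solution above; it assumes the 01020304 sample data and key from the question and uses explicit `">I"` (big-endian) and `"<I"` (little-endian) format codes rather than the native `"=I"`, so it behaves the same on any machine.

```python
import struct

key = 0x01020304               # key written as a hex integer, high byte first
data = b"\x01\x02\x03\x04"     # first four bytes of the sample file

# The same integer serialises to different byte sequences.
big = struct.pack(">I", key)       # bytes in the order they read in the hex literal
little = struct.pack("<I", key)    # same integer, bytes reversed

# The same four file bytes decode to different integers.
(as_big,) = struct.unpack(">I", data)
(as_little,) = struct.unpack("<I", data)
print(hex(as_big), hex(as_little))

# XORing with the unadjusted key only cancels out in the big-endian reading.
print(struct.pack(">I", as_big ^ key))
print(struct.pack("<I", as_little ^ key))
```

On a little-endian machine, `"=I"` behaves like `"<I"`, which is why the key has to be byte-swapped there before it lines up with the data.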

martineau
  • I received several syntax errors while trying to run this code. – Astron Jul 13 '12 at 02:04
  • @Astron: I won't be surprised, but suspect they're all trivial. I'll fix them when I have a chance a little later. – martineau Jul 13 '12 at 03:07
  • @Astron: Syntax errors are fixed now, but I won't have a chance to test it until later. – martineau Jul 13 '12 at 03:14
  • Some additional feedback: `NameError: name 'out_buf' is not defined` – Astron Jul 13 '12 at 03:22
  • That did not seem to work either, but it may be my data. – Astron Jul 16 '12 at 01:46
  • @Astron: Strange, given that the basic idea is so simple...and it worked in my own testing with buffers of varying lengths. – martineau Jul 16 '12 at 02:20
  • @Astron: Ah, I was just able to reproduce the 05010501 output you were getting with 01020304 repeated data and a 01020304 key. The problem has to do with endianness. Considering the first 4 bytes of inbuf.bin as a 4-byte integer would result in a value of 0x04030201 on a little-endian processor, which would need an integer key of that value in order to produce the 00000000 you were expecting after XORing; otherwise you end up with 05010105s in the outfile.bin. – martineau Jul 16 '12 at 02:49
  • That's interesting and great feedback! Just tried reversing it with your code and it worked. I may actually try to use the rounding idea in a follow-up [question](http://stackoverflow.com/q/11494596/666891) as I am having trouble using a binary key on multiple iterations probably due to varying lengths. – Astron Jul 16 '12 at 03:27
  • @Astron: Good to hear. I think I fixed my answer here and will take a look at your follow-up question when I have a chance a little later. – martineau Jul 16 '12 at 03:56

Try this function:

import struct

def four_byte_xor(buf, key):
    outl = []
    for i in range(0, len(buf), 4):
        chunk = buf[i:i+4]
        v = struct.unpack(b"=I", chunk)[0]  # native byte order
        v ^= key
        outl.append(struct.pack(b"=I", v))
    return b"".join(outl)

I'm not sure you're actually taking the input by 4 bytes, but I didn't try to decipher it. This assumes your input is divisible by 4.

Edit: new function based on the new input:

def four_byte_xor(buf, key):
    key = bytearray(struct.pack(b">I", key))  # key bytes, high byte first
    buf = bytearray(buf)
    # assumes len(buf) is a multiple of 4
    for offset in range(0, len(buf), 4):
        for i, byte in enumerate(key):
            buf[offset + i] ^= byte
    return bytes(buf)

This could probably be improved, but it does provide the proper output.
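One possible improvement, sketched here under assumptions rather than tested against the asker's data: process the file in chunks instead of loading it whole, and carry the key position across chunks so that chunk boundaries and a trailing partial group do not matter. The name `xor_file` and the `chunk_size` parameter are hypothetical, introduced only for this illustration.

```python
import struct

def xor_file(src_path, dst_path, key, chunk_size=4096):
    """Stream src_path to dst_path, XORing with a repeating 4-byte key."""
    key_bytes = bytearray(struct.pack(">I", key))  # high byte first
    pos = 0  # absolute offset in the file, used to keep the key aligned
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = bytearray(src.read(chunk_size))
            if not chunk:
                break
            for i in range(len(chunk)):
                chunk[i] ^= key_bytes[(pos + i) % 4]
            pos += len(chunk)
            dst.write(chunk)
```

Because the key is applied byte by byte in a fixed order, the result does not depend on the processor's endianness.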

Keith
  • Replaced the def with yours and also tried the original function, but it appears that I am getting similar results. – Astron Jul 13 '12 at 01:36
  • Could you edit your question to specify more precisely what you are after? Perhaps some example data, input and output? – Keith Jul 13 '12 at 01:39
  • Is it possible for the `bytearray(buf)` to accept binary data? I have asked a new [question](http://stackoverflow.com/q/11494596/666891) based on this function and I am attempting to feed new data for every iteration. That said, I removed the `struct.pack()` portion in an attempt to feed binary data. It works on the first iteration and then dies for additional data. – Astron Jul 15 '12 at 20:09
  • Both str and bytearray work fine with binary data. They are strings of binary data. The bytearray is mutable, and allows in-place modification. They are byte streams. Your mask is a binary number (more than 8 bits), so it has to be converted into a byte sequence to properly align the xor operation. – Keith Jul 16 '12 at 19:02
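
As a rough illustration of the last comment, the sketch below converts the integer mask into a byte sequence once and then XORs it against a buffer of any length. The function name `xor_bytes` is hypothetical and the key is assumed to be given high byte first, as in the hex dumps; this is not code from either answer.

```python
import struct
from itertools import cycle

def xor_bytes(buf, key):
    """XOR buf with a 4-byte integer key, repeating the key as needed."""
    key_bytes = bytearray(struct.pack(">I", key))  # mask as a byte sequence
    # cycle() repeats the four key bytes for as long as buf has data
    return bytes(bytearray(b ^ k for b, k in zip(bytearray(buf), cycle(key_bytes))))
```

Applying the function twice with the same key returns the original data, which gives a quick sanity check for any key and buffer length.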