25
#!/usr/bin/env python3

import binascii


var=binascii.a2b_qp("hello")
key=binascii.a2b_qp("supersecretkey")[:len(var)]

print(binascii.b2a_qp(var))
print(binascii.b2a_qp(key))


# Here I want to XOR the bytes in var and key and store the result in 'encrypted': encrypted = var XOR key

print(binascii.b2a_qp(encrypted))

If someone could enlighten me on how to accomplish this, I would be very happy. I'm very new to data-type conversions, and reading through the Python wiki is not as clear as I would like.

Jcov
  • Do you mean XORing the var string against the key string? Mind you, they have different lengths. In Python the XOR operator is ^ – Pynchia Apr 02 '15 at 08:21
  • So my use of [:len(var)] to cut the key to the same size as the var string will not work? I thought each character is converted into a single byte, where a=97=01100001 for example. When I use encrypted = var ^ key I get "TypeError: unsupported operand type(s) for ^: 'bytes' and 'bytes'" – Jcov Apr 02 '15 at 08:26

3 Answers

54

Comparison of two Python 3 solutions

The first one is based on zip:

def encrypt1(var, key):
    return bytes(a ^ b for a, b in zip(var, key))

The second one uses int.from_bytes and int.to_bytes:

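import sys

def encrypt2(var, key, byteorder=sys.byteorder):
    # Truncate both inputs to the length of the shorter one
    key, var = key[:len(var)], var[:len(key)]
    # Treat each byte string as one big integer and XOR them in a single operation
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    int_enc = int_var ^ int_key
    return int_enc.to_bytes(len(var), byteorder)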

Simple tests:

assert encrypt1(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
assert encrypt2(b'hello', b'supersecretkey') == b'\x1b\x10\x1c\t\x1d'
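
Because XOR is its own inverse, applying either function a second time with the same key should give back the original message. The following is a minimal sketch of that round trip; it restates encrypt1 and encrypt2 from above so that it runs on its own, and the names message and ciphertext are only illustrative:

```python
import sys

def encrypt1(var, key):
    # XOR the inputs byte by byte; zip stops at the shorter input
    return bytes(a ^ b for a, b in zip(var, key))

def encrypt2(var, key, byteorder=sys.byteorder):
    # Truncate both inputs to the length of the shorter one
    key, var = key[:len(var)], var[:len(key)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

message = b'hello'
key = b'supersecretkey'

ciphertext = encrypt1(message, key)

# XORing the ciphertext with the same key recovers the message
assert encrypt1(ciphertext, key) == message
assert encrypt2(ciphertext, key) == message
```

So no separate decrypt function is needed, as long as the key is at least as long as the message.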

Performance tests with var and key being 1000 bytes long:

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt1(a, b)"
10000 loops, best of 3: 100 usec per loop

$ python3 -m timeit \
  -s "import test_xor;a=b'abcdefghij'*100;b=b'0123456789'*100" \
  "test_xor.encrypt2(a, b)"
100000 loops, best of 3: 5.1 usec per loop

The integer approach is significantly faster in this test: 5.1 usec versus 100 usec per loop, roughly a 20x speedup.

Vincent
  • 2
    One might simply use `int.from_bytes(bytes_object, endianness)` to convert a bytes object to an integer directly (and in a saner way). – Bora M. Alper Mar 08 '17 at 07:41
  • 5
    @Czechnology The integer approach seems to be significantly faster. See my edit. – Vincent May 04 '17 at 14:24
  • 1
    This faster version is a neat discovery. Thanks! – Justin Turner Arthur Apr 09 '18 at 08:22
  • Both the encrypt1 and encrypt2 functions fail to fully encrypt 'var' if 'key' is shorter than 'var'. For example, the function call encrypt2(b'hello world', b'ab') will result in only the first two characters being encrypted: b'\t\x07llo world' – Moiz Jul 22 '20 at 21:23
  • Adding `if len(key) < len(var): key = key * int(len(var)/len(key) + 1)` before `key = key[:len(var)]` will fix the issue – Moiz Jul 22 '20 at 23:01
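
Both functions in this answer stop at the shorter of the two inputs, so a short key does not cover the whole message. One possible way to handle that with the integer approach is to repeat the key before XORing. This is only a sketch; the function name xor_repeating_key is hypothetical and not part of the answer:

```python
import sys

def xor_repeating_key(var, key, byteorder=sys.byteorder):
    """XOR var with key, repeating key as often as needed to cover var."""
    if not key:
        raise ValueError("key must not be empty")
    # Ceiling division: number of key copies needed to reach len(var)
    repeats = -(-len(var) // len(key))
    full_key = (key * repeats)[:len(var)]
    int_var = int.from_bytes(var, byteorder)
    int_key = int.from_bytes(full_key, byteorder)
    return (int_var ^ int_key).to_bytes(len(var), byteorder)

encrypted = xor_repeating_key(b'hello world', b'ab')
# Applying the same function again restores the input
assert xor_repeating_key(encrypted, b'ab') == b'hello world'
```

Keep in mind that a short repeating key makes the XOR easy to undo without the key, so this is suitable for experiments rather than real encryption.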
25

It looks like what you need to do is XOR each of the characters in the message with the corresponding character in the key. However, to do that you need a bit of interconversion using ord and chr, because you can only xor numbers, not strings:

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, key) ] 
>>> encrypted
['\x1b', '\x10', '\x1c', '\t', '\x1d']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, key) ]
>>> decrypted
['h', 'e', 'l', 'l', 'o']

>>> "".join(decrypted)
'hello'

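Note that in Python 2, binascii.a2b_qp("hello") just converts a string to another string. In Python 3 it returns a bytes object, and iterating over bytes already yields integers, so ord and chr are only needed when var and key are str.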

Your approach, and my code above, will only work if the key is at least as long as the message. However, you can easily repeat the key if required using itertools.cycle:

>>> from itertools import cycle
>>> var="hello"
>>> key="xy"

>>> encrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(var, cycle(key)) ]
>>> encrypted
['\x10', '\x1c', '\x14', '\x15', '\x17']

>>> decrypted = [ chr(ord(a) ^ ord(b)) for (a,b) in zip(encrypted, cycle(key)) ]
>>> "".join(decrypted)
'hello'

To address the issue of unicode/multi-byte characters (raised in the comments below), one can convert the string (and key) to bytes, zip these together, then perform the XOR, something like:

>>> key="supersecretkey"
>>> var=u"hello\u2764"
>>> var
'hello❤'

>>> encrypted = [ a ^ b for (a,b) in zip(bytes(var, 'utf-8'),cycle(bytes(key, 'utf-8'))) ]
>>> encrypted
[27, 16, 28, 9, 29, 145, 248, 199]

>>> decrypted = [ a ^ b for (a,b) in zip(bytes(encrypted), cycle(bytes(key, 'utf-8'))) ]
>>> decrypted
[104, 101, 108, 108, 111, 226, 157, 164]

>>> bytes(decrypted)
b'hello\xe2\x9d\xa4'

>>> bytes(decrypted).decode()
'hello❤'
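
The bytes-based steps above can be wrapped into a pair of small helpers. This is a sketch under the assumption that both the text and the key are str and should be encoded as UTF-8; the names xor_text and unxor_text are made up for illustration:

```python
from itertools import cycle

def xor_text(text, key, encoding='utf-8'):
    """Encode text and key, then XOR the bytes, repeating the key as needed."""
    key_bytes = key.encode(encoding)
    if not key_bytes:
        raise ValueError("key must not be empty")
    return bytes(a ^ b for a, b in zip(text.encode(encoding), cycle(key_bytes)))

def unxor_text(data, key, encoding='utf-8'):
    """Reverse xor_text: XOR the bytes with the key again and decode the result."""
    key_bytes = key.encode(encoding)
    if not key_bytes:
        raise ValueError("key must not be empty")
    return bytes(a ^ b for a, b in zip(data, cycle(key_bytes))).decode(encoding)

encrypted = xor_text("hello\u2764", "supersecretkey")
assert unxor_text(encrypted, "supersecretkey") == "hello\u2764"
```

The encrypted value is kept as bytes rather than decoded back to str, because XORed bytes are generally not valid UTF-8.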
DNA
  • @DNA - nice! Fails for unicode input though...zip places characters into the tuples, then `chr` gets confused because the unicode character is out of its range. E.g. `var=u'\u2764'` would cause an exception....❤ – Hamy Sep 23 '17 at 23:28
  • @Hamy You may be able to use `unichr()` instead of `chr()` to fix this, but I haven't tried it yet... – DNA Sep 24 '17 at 20:29
  • @DNA - Good thought, I think it would XOR the wrong data - the two-byte unicode character passed to ord would be xor'ed with a one-byte ascii character with the low bits being combined, when the goal is to treat both `var` and `key` as a byte stream and xor them one-bit at a time. E.g. `bin(ord(u'\u1000'))` is `0b1000000000000` so if I `OR` it with a byte of all `1s` as a stream operation then the high bits should be one, but in reality this happens - `bin(ord('\xFF') | ord(u'\u1000'))` is `0b1000011111111` – Hamy Sep 25 '17 at 23:03
  • 2
    IMO this just underlines how tricky Python 2 can be for byte operations...the only quick fix I see for this is to double-check that the input is a str, not a unicode, e.g. `if not isinstance(var, str) or not isinstance(key, str)` – Hamy Sep 25 '17 at 23:04
  • Note that the OP is using Python 3 – DNA Oct 17 '17 at 09:44
2

You can use NumPy to perform the XOR faster:

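import numpy as np

def encrypt(var, key):
    # View the raw bytes as arrays of unsigned 8-bit integers (no copy is made)
    a = np.frombuffer(var, dtype=np.uint8)
    b = np.frombuffer(key, dtype=np.uint8)
    # Element-wise XOR, then convert the result back to bytes
    return (a ^ b).tobytes()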
Latze