19

I just found pycrypto today, and I've been working on my AES encryption class. Unfortunately it only half-works. self.h.md5 outputs md5 hash in hex format, and is 32byte. This is the output. It seems to decrypt the message, but it puts random characters after decryption, in this case \n\n\n... I think I have a problem with block size of self.data, anyone know how to fix this?

Jans-MacBook-Pro:test2 jan$ ../../bin/python3 data.py b'RLfGmn5jf5WTJphnmW0hXG7IaIYcCRpjaTTqwXR6yiJCUytnDib+GQYlFORm+jIctest 1 2 3 4 5 endtest\n\n\n\n\n\n\n\n\n\n'

from Crypto.Cipher import AES
from base64 import b64encode, b64decode
from os import urandom

class Encryption():
    def __init__(self):
        self.h = Hash()

    def values(self, data, key):
        self.data = data
        self.key = key
        self.mode = AES.MODE_CBC
        self.iv = urandom(16)
        if not self.key:
            self.key = Cfg_Encrypt_Key
        self.key = self.h.md5(self.key, True)

    def encrypt(self, data, key):
        self.values(data, key)
        return b64encode(self.iv + AES.new(self.key, self.mode, self.iv).encrypt(self.data))

    def decrypt(self, data, key):
        self.values(data, key)
        self.iv = b64decode(self.data)[:16]
        return AES.new(self.key, self.mode, self.iv).decrypt(b64decode(self.data)[16:])
if __name__ is None
  • 11,083
  • 17
  • 55
  • 71

4 Answers4

82

To be honest, the characters "\n\n\n\n\n\n\n\n\n\n" don't look that random to me. ;-)

You are using AES in CBC mode. That requires length of plaintext and ciphertext to be always a multiple of 16 bytes. With the code you show, you should actually see an exception being raised when data passed to encrypt() does not fulfill such condition. It looks like you added enough new line characters ('\n' to whatever the input is until the plaintext happened to be aligned.

Apart from that, there are two common ways to solve the alignment issue:

  1. Switch from CBC (AES.MODE_CBC) to CFB (AES.MODE_CFB). With the default segment_size used by PyCrypto, you will not have any restriction on plaintext and ciphertext lengths.

  2. Keep CBC and use a padding scheme like PKCS#7, that is:

    • before encrypting a plaintext of X bytes, append to the back as many bytes you need to to reach the next 16 byte boundary. All padding bytes have the same value: the number of bytes that you are adding:

      length = 16 - (len(data) % 16)
      data += bytes([length])*length
      

      That's Python 3 style. In Python 2, you would have:

      length = 16 - (len(data) % 16)
      data += chr(length)*length
      
    • after decrypting, remove from the back of the plaintext as many bytes as indicated by padding:

      data = data[:-data[-1]]
      

Even though I understand in your case it is just a class exercise, I would like to point out that it is insecure to send data without any form of authentication (e.g. a MAC).

  • Oh wow, your padding solution is really neat. I went with approach where I added $^End#Something^$ to end of body that is to be encrypted, and than also bunch of # signs till whole body is multiple of 16. Than when Im done decrypting, I rstrip #'s and remove my end string. Anyway, what do you mean with sending data without form of authentication? – if __name__ is None Jan 08 '13 at 04:51
  • 1
    If you mean hmac signing, yes Ive builtin sha256 hmac signing into my encrypt function. Anyway, out of curiosity I tried your padding solution, and it doesn't work. I fiddled with it to the point I got padding to work, but unpadding just deletes everything. def pad2(self, data): if not isinstance(data, bytes): self.data = bytes(self.data, 'utf-8') size = (self.blockSize - len(self.data)) % self.blockSize self.data += bytes(size, 'ascii') return self.data def unpad2(self, data): return data[:-data[-1]] – if __name__ is None Jan 08 '13 at 05:24
  • Yes, HMAC is OK. It's important to have it because serious attacks exist on the padding part (padding oracle attack). In Python/PyCrypto, all data you encrypt *must* always be a binary string, not a text string. That was not important in Python2, but in Python3 they are two different things now. – SquareRootOfTwentyThree Jan 08 '13 at 07:52
  • so you're saying I should never convert encrypted data to str, but keep it in bytes at all times? Why? – if __name__ is None Jan 10 '13 at 04:27
  • 1
    Because plaintexts and ciphertexts are not text strings. They must be a sequence of exactly 16 bytes. If take a text strings, I can *encode* it into bytes in several different ways (UTF-8, UTF-16, ASCII, etc). That distinction is very important in Python3. – SquareRootOfTwentyThree Jan 17 '13 at 06:44
  • @SquareRootOfTwentyThree Really thanks for your answer!! – Danyun Liu Dec 13 '15 at 09:48
  • 4
    Thanks for this neat solution @SquareRootOfTwentyThree ! Just wanted to add that `data[:-data[-1]]` did not work out of the box for me with python 2. I had to use `data[:-ord(data[-1])]` instead. – mmuller Oct 22 '16 at 20:51
  • 1
    @mmuller, that's because in Python 2 the decoded data is a _string_, so the last position is still a string, hence `ord()` is required. In Python 3 the data is already a sequence of _bytes_, so any given position is an `int`. Try `>>> b'ABC'[-1]` in a python 3 and 2 terminal to see the difference – MestreLion Oct 26 '18 at 09:43
  • 1
    @SquareRootOfTwentyThree, question: what if the plaintext is already aligned with a 16-byte boundary? That would lead to 0 padding bytes, and your suggested `data = data[:-data[-1]]` will truncate the message itself, no? – MestreLion Oct 26 '18 at 09:46
  • 1
    Oh, nevermind... an aligned message will lead to **16** bytes of padding, _not_ 0. Clever spec :-) I'll keep my comment anyway as it might help others who stumble on this – MestreLion Oct 26 '18 at 09:49
2

You can use a fix character as long as you remember the length of your initial payload, so you don't "throw" useful end bytes away. Try this:

import base64

from Crypto.Cipher import AES

def encrypt(payload, salt, key):
    return AES.new(key, AES.MODE_CBC, salt).encrypt(r_pad(payload))


def decrypt(payload, salt, key, length):
    return AES.new(key, AES.MODE_CBC, salt).decrypt(payload)[:length]


def r_pad(payload, block_size=16):
    length = block_size - (len(payload) % block_size)
    return payload + chr(length) * length


print(decrypt(encrypt("some cyphertext", "b" * 16, "b" * 16), "b" * 16, "b" * 16, len("some cyphertext")))
1
from hashlib import md5
from Crypto.Cipher import AES
from Crypto import Random
import base64

def derive_key_and_iv(password, salt, key_length, iv_length):
    d = d_i = ''
    while len(d) < key_length + iv_length:
        d_i = md5(d_i + password + salt).digest()
        d += d_i
    return d[:key_length], d[key_length:key_length+iv_length]

def encrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size
    salt = Random.new().read(bs - len('Salted__'))
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    #print in_file
    in_file = file(in_file, 'rb')
    out_file = file(out_file, 'wb')
    out_file.write('Salted__' + salt)
    finished = False
    while not finished:
        chunk = in_file.read(1024 * bs)
        if len(chunk) == 0 or len(chunk) % bs != 0:
            padding_length = bs - (len(chunk) % bs)
            chunk += padding_length * chr(padding_length)
            finished = True
        out_file.write(cipher.encrypt(chunk))
    in_file.close()
    out_file.close()

def decrypt(in_file, out_file, password, key_length=32):
    bs = AES.block_size

    in_file = file(in_file, 'rb')
    out_file = file(out_file, 'wb')
    salt = in_file.read(bs)[len('Salted__'):]
    key, iv = derive_key_and_iv(password, salt, key_length, bs)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    next_chunk = ''
    finished = False
    while not finished:
        chunk, next_chunk = next_chunk, cipher.decrypt(in_file.read(1024 * bs))
        if len(next_chunk) == 0:
            padding_length = ord(chunk[-1])
            if padding_length < 1 or padding_length > bs:
               raise ValueError("bad decrypt pad (%d)" % padding_length)
            # all the pad-bytes must be the same
            if chunk[-padding_length:] != (padding_length * chr(padding_length)):
               # this is similar to the bad decrypt:evp_enc.c from openssl program
               raise ValueError("bad decrypt")
            chunk = chunk[:-padding_length]
            finished = True
        out_file.write(chunk)    
    in_file.close()
    out_file.close()

def encode(in_file, out_file):
    in_file = file(in_file, 'rb')
    out_file = file(out_file, 'wb')
    data = in_file.read()
    out_file.write(base64.b64encode(data))    
    in_file.close()
    out_file.close()

def decode(in_file, out_file):
    in_file = file(in_file, 'rb')
    out_file = file(out_file, 'wb')
    data = in_file.read()
    out_file.write(base64.b64decode(data))    
    in_file.close()
    out_file.close()
Abhisheietk
  • 160
  • 5
  • 1
    This is just code without any hints, maybe it is why nobody upvoted. Could worth adding some prose. – jlandercy Jun 08 '18 at 08:02
  • OMG, you made my day. Your (very spurious) comments did include the tip that the padding must the length of padding – Amfasis Sep 01 '20 at 13:23
  • Please note that padding_length should not be zero, if that would be the case make it the size of the key – Amfasis Sep 02 '20 at 12:39
-1

AES.new().encrypt() and .decrypt() take as both input and output strings whose length is a multiple of 16. You have to fix it in one way or another. For example you can store the real length at the start, and use that to truncate the decrypted string.

Note also that while it's the only restriction for AES, other modules (notably in Crypto.PublicKey) have additional restrictions that comes from their mathematical implementation and that shouldn't (in my opinion) be visible to the end user, but are. For example Crypto.PublicKey.ElGamal will encrypt any short string, but if it starts with null characters, they are lost upon decryption.

Armin Rigo
  • 12,048
  • 37
  • 48
  • Aha, I see. I've read somewhere that the length has to do something with 16byte multiplier, didn't really focus it. Thanks for making it clear. I suppose I will implant some sort of ending identifier in self.data and fill remaining to 16 multiplier with garbage data, than when decrypting delete everything from and including ending identifier on. This way I will eliminate the need to store additional length value. – if __name__ is None Jan 06 '13 at 15:48