
Edits and Updates

3/24/2013:
My output hash from Python now matches the hash from C++ after converting to UTF-16 and stopping before hitting any 'e' or 'm' bytes. However, the decrypted results do not match. I know that my SHA-1 hash is 20 bytes = 160 bits, and RC4 keys can vary in length from 40 to 2048 bits, so perhaps there is some default salting going on in WinCrypt that I will need to mimic. See CryptGetKeyParam with KP_LENGTH or KP_SALT.

3/24/2013:
CryptGetKeyParam KP_LENGTH is telling me that my key length is 128 bits. I'm feeding it a 160-bit hash, so perhaps it's just discarding the last 32 bits (4 bytes). Testing now.

3/24/2013: Yep, that was it. If I discard the last 4 bytes of my SHA-1 hash in Python, I get the same decryption results.
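The confirmed derivation can be sketched with the standard library alone (hashlib here is my substitution; the real code below uses PyCrypto's SHA, and the password bytes are illustrative): SHA-1 the widened password, then keep only the first 16 of the 20 digest bytes.

```python
# Sketch of the confirmed key derivation, stdlib only.
# The input bytes are illustrative, not from the real file header.
import hashlib

wc_password = "Monk".encode("utf-16-le")     # widened, truncated password
digest = hashlib.sha1(wc_password).digest()  # 20 bytes (160 bits)
rc4_key = digest[:16]                        # WinCrypt default: 128 bits
print(len(digest), len(rc4_key))             # 20 16
```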

Quick Info:

I have a C++ program to decrypt a data block. It uses the Windows Cryptographic Service Provider, so it only works on Windows. I would like it to work on other platforms.

Method Overview:

In the Windows Crypto API, an ASCII-encoded byte password is converted to a wide-character representation and then hashed with SHA-1 to make a key for an RC4 stream cipher.

In Python with PyCrypto, an ASCII-encoded byte string is decoded to a Python string. It is truncated based on empirically observed bytes that cause mbstowcs to stop converting in C++. This truncated string is then encoded as UTF-16, effectively padding it with 0x00 bytes between the characters. This new truncated, padded byte string is passed to a SHA-1 hash, and the first 128 bits of the digest are passed to a PyCrypto RC4 object.

Problem [SOLVED]
I can't seem to get the same results with Python 3.x and PyCrypto.

C++ Code Skeleton:

HCRYPTPROV hProv      = 0x00;
HCRYPTHASH hHash      = 0x00;
HCRYPTKEY  hKey       = 0x00;
wchar_t    sBuf[256]  = {0};

CryptAcquireContextW(&hProv, L"FileContainer", L"Microsoft Enhanced RSA and AES Cryptographic Provider", 0x18u, 0);

CryptCreateHash(hProv, 0x8004u, 0, 0, &hHash);
//0x8004 is CALG_SHA1

int len = mbstowcs(sBuf, iRec->desc, sizeof(sBuf));
//iRec is my "Record" class
//iRec->desc is 33 bytes within header of my encrypted file
//this will be used to create the hash key. (So this is the password)

CryptHashData(hHash, (const BYTE*)sBuf, len, 0);

CryptDeriveKey(hProv, 0x6801, hHash, 0, &hKey);
//0x6801 is CALG_RC4

DWORD dataLen = iRec->compLen;
//iRec->compLen is the length of the encrypted data block
//it's also compressed; that's why it's called compLen

CryptDecrypt(hKey, 0, 0, 0, (BYTE*)iRec->decrypt, &dataLen);
//iRec is the record I'm decrypting
//iRec->decrypt is where I store the decrypted data
//&dataLen is how long the encrypted data block is;
//I get this from the file header info

Python Code Skeleton:

from Crypto.Cipher import ARC4
from Crypto.Hash import SHA

#this is the Decipher method from my record class
def Decipher(self):

    #get string representation of 33byte password
    key_string = self.desc.decode('ASCII')

    #so far, these characters fail, possibly others but
    #for now I will make it a list
    stop_chars = ['e','m']

    #slice off anything beyond where mbstowcs will stop
    for char in stop_chars:
        wc_stop = key_string.find(char)
        if wc_stop != -1:
            #slice operation
            key_string = key_string[:wc_stop]

    #make "wide character"
    #this is equivalent to padding bytes with 0x00

    #Slice off the two byte "Byte Order Mark" 0xff 0xfe 
    wc_byte_string = key_string.encode('utf-16')[2:]

    #slice off the trailing 0x00
    wc_byte_string = wc_byte_string[:-1]

    #hash the "wchar" byte string
    #this is the equivalent to sBuf in c++ code above
    #as determined by writing sBuf to file in tests
    my_key = SHA.new(wc_byte_string).digest()

    #create a PyCrypto cipher object
    RC4_Cipher = ARC4.new(my_key[:16])

    #store the decrypted data..these results NOW MATCH
    self.decrypt = RC4_Cipher.decrypt(self.datablock)
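If PyCrypto turns out to be unavailable on some target platform, RC4 itself is small enough to carry along. The following is a textbook RC4 implementation (my addition, not part of the original code) that could stand in for `ARC4.new(key).decrypt(data)`:

```python
def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: KSA then PRGA. Encryption and decryption are identical."""
    # Key-scheduling algorithm (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm (PRGA), XORed with the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# Well-known RC4 test vector: key "Key", plaintext "Plaintext"
assert rc4_crypt(b"Key", b"Plaintext").hex().upper() == "BBF316E8D940AF0AD3"
```

Because RC4 is a plain XOR keystream, calling the same function on the ciphertext with the same key recovers the plaintext.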

Suspected [EDIT: Confirmed] Causes
1. mbstowcs conversion of the password meant that the "original data" fed to the SHA-1 hash was not the same in Python and C++. mbstowcs was stopping conversion at 0x65 ('e') and 0x6D ('m') bytes, so the original data ended up being a wide-char encoding of only part of the original 33-byte password.

2. RC4 can have variable-length keys. In the Microsoft Enhanced Cryptographic Service Provider, the default length is 128 bits. Leaving the key length unspecified meant that the first 128 bits of the 160-bit SHA-1 digest of the "original data" were used as the key.

How I investigated

Edit: Based on my own experimenting and the suggestions of @RolandSmith, I now know that one of my problems was mbstowcs behaving in a way I wasn't expecting. It seems to stop writing to sBuf on "e" (0x65) and "m" (0x6D) (and probably others). So the password "Monkey" in my desc field (ASCII-encoded bytes) would look like "M o n k" in sBuf, because mbstowcs stopped at the "e" and placed 0x00 between the bytes based on the 2-byte wchar_t typedef on my system. I found this by writing the results of the conversion to a text file.
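The observed truncate-then-widen behavior is easy to reproduce on the Python side (the password and stop characters mirror the "Monkey" example above):

```python
# Reproduce the observed mbstowcs behavior: stop at the first 'e' or 'm',
# then widen what's left with interleaved 0x00 bytes (UTF-16-LE).
password = "Monkey"
stop_chars = "em"
positions = [password.find(c) for c in stop_chars if c in password]
cut = min(positions) if positions else len(password)
truncated = password[:cut]               # "Monk"
widened = truncated.encode("utf-16-le")  # b'M\x00o\x00n\x00k\x00'
print(truncated, widened)
```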

BYTE pbHash[256];  //buffer we will store the hash digest in 
DWORD dwHashLen;  //store the length of the hash
DWORD dwCount;
dwCount = sizeof(DWORD);  //how big is a dword on this system?


//see above: "len" is the return value from mbstowcs that tells how
//many multibyte characters were converted from the original
//iRec->desc and placed into sBuf. In some cases it's 3, 7, or 9,
//and it always seems to stop on "e" or "m"

fstream outFile4("C:/desc_mbstowcs.txt", ios::out | ios::trunc | ios::binary);
outFile4.write((const CHAR*)sBuf, int(len));
outFile4.close();

//now get the hash size from CryptGetHashParam
//and get the actual hash from the hash object hHash,
//then write it to a file.
if(CryptGetHashParam(hHash, HP_HASHSIZE, (BYTE *)&dwHashLen, &dwCount, 0)) {
  if(CryptGetHashParam(hHash, 0x0002, pbHash, &dwHashLen, 0)){ //0x0002 is HP_HASHVAL

    fstream outFile3("C:/test_hash.txt", ios::out | ios::trunc | ios::binary);
    outFile3.write((const CHAR*)pbHash, int(dwHashLen));
    outFile3.close();
  }
}
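A Python counterpart of the dump above makes the byte-for-byte comparison easy (the file name and input bytes here are illustrative):

```python
# Write the Python-side SHA-1 digest to a file so it can be diffed
# against the C:/test_hash.txt produced by the C++ code.
import hashlib

wc_byte_string = "Monk".encode("utf-16-le")  # example widened password
digest = hashlib.sha1(wc_byte_string).digest()
with open("test_hash_py.bin", "wb") as f:
    f.write(digest)
```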

References:
  • Wide characters cause problems depending on environment definition
  • Difference in Windows Cryptography Service between VC++ 6.0 and VS 2008
  • Convert a UTF-8 to UTF-16 string
  • Python: converting wide-char strings from a binary file to Python unicode strings
  • PyCrypto RC4 example: https://www.dlitz.net/software/pycrypto/api/current/Crypto.Cipher.ARC4-module.html
  • Hashing a string with SHA-256
  • http://msdn.microsoft.com/en-us/library/windows/desktop/aa379916(v=vs.85).aspx
  • http://msdn.microsoft.com/en-us/library/windows/desktop/aa375599(v=vs.85).aspx

patmo141
  • my first order of business is to write/print out the results from CryptGetHashParam and CryptGetProvParam to inspect the hash and the generated RC4 key. http://msdn.microsoft.com/en-us/library/windows/desktop/aa380196(v=vs.85).aspx http://msdn.microsoft.com/en-us/library/windows/desktop/aa379947(v=vs.85).aspx – patmo141 Mar 22 '13 at 02:57
  • One problem is definitely with mbstowcs. It seems that it's transferring an unpredictable (to me) number of bytes into my buffer to be hashed. Sometimes it only grabs 6 of the 33 desc bytes, other times it grabs 9. Very strange. – patmo141 Mar 24 '13 at 01:03
  • `wchar_t` is usually 2 bytes or more, but it _can_ be only one. For my compiler it is `typedef`-ed as an `int` (4 bytes). You can check what it is for your compiler with `sizeof`. – Roland Smith Mar 24 '13 at 12:30
  • it appears to be 2 bytes for me, as "Roland" in iRec->desc will become "R o l a n d" in sBuf. Not sure why there is no trailing 0x00, but if I do the "same" in Python by decoding the bytes to a string using UTF-8, then re-encoding them in UTF-16 and passing that to SHA-1, I get the same hash. – patmo141 Mar 24 '13 at 14:44
  • Two bytes is standard on Microsoft platforms, IIRC. – Roland Smith Mar 24 '13 at 14:56
  • You could test `mbctowcs` using Python. See edit3 below. – Roland Smith Mar 24 '13 at 15:54

1 Answer


You can test the size of wchar_t with a small test program (in C):

#include <stdio.h> /* for printf */
#include <stddef.h> /* for wchar_t */

int main(int argc, char *argv[]) {
    printf("The size of wchar_t is %zu bytes.\n", sizeof(wchar_t));
    return 0;
}

You could also use printf() calls in your C++ code to write e.g. iRec->desc and the converted result in sBuf to the screen, if you can run the C++ program from a terminal. Otherwise use fprintf() to dump them to a file.

To better mimic the behavior of the C++ program, you could even use ctypes to call mbstowcs() in your Python code.

Edit: You wrote:

One problem is definitely with mbstowcs. It seems that it's transferring an unpredictable (to me) number of bytes into my buffer to be hashed.

Keep in mind that mbstowcs returns the number of wide characters converted. In other words, a 33-byte buffer in a multi-byte encoding can contain anything from 5 (six-byte UTF-8 sequences) up to 33 characters, depending on the encoding used.
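The character-count vs. byte-count distinction is easy to see from Python:

```python
# A multi-byte string's byte length and character count differ as soon
# as any character needs more than one byte in the encoding.
s = "naïve"                 # 5 characters
utf8 = s.encode("utf-8")    # 6 bytes: 'ï' encodes to two bytes
print(len(s), len(utf8))    # 5 6
```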

Edit2: You are using 0 as the dwFlags parameter for CryptDeriveKey. According to its documentation, the upper 16 bits should contain the key length. You should check CryptDeriveKey's return value to see if the call succeeded.
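Since the requested key length lives in the upper 16 bits of dwFlags, asking CryptDeriveKey explicitly for a 128-bit RC4 key would mean passing a flags value like this (shown as Python arithmetic for clarity):

```python
# The key length goes in the upper 16 bits of CryptDeriveKey's dwFlags,
# so explicitly requesting a 128-bit key means:
RC4_KEY_BITS = 128
dwFlags = RC4_KEY_BITS << 16
print(hex(dwFlags))  # 0x800000
```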

Edit3: You could test mbstowcs in Python (I'm using IPython here):

In [1]: from ctypes import *

In [2]: libc = CDLL('libc.so.7')

In [3]: monkey = c_char_p('Monkey')

In [4]: test = c_char_p('This is a test')

In [5]: wo = create_unicode_buffer(256)

In [6]: nref = c_size_t(250)

In [7]: libc.mbstowcs(wo, monkey, nref)
Out[7]: 6

In [8]: print wo.value
Monkey

In [9]: libc.mbstowcs(wo, test, nref)
Out[9]: 14

In [10]: print wo.value
This is a test

Note that in Windows you should probably use libc = cdll.msvcrt instead of libc = CDLL('libc.so.7').
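For completeness, here is a Python 3 version of the same experiment; `ctypes.util.find_library` locates the C runtime on Unix-like systems (on Windows, use `ctypes.cdll.msvcrt` as noted above):

```python
# Python 3 / ctypes version of the mbstowcs experiment.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
buf = ctypes.create_unicode_buffer(256)
n = libc.mbstowcs(buf, b"Monkey", ctypes.c_size_t(250))
print(n, buf.value)  # 6 Monkey
```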

Roland Smith
  • Thanks Roland. This is more or less exactly what I have done, and I can now examine the bytes transferred into sBuf from iRec->desc, and I'm using CryptGetHashParam to compare my C++ and Python hashes. It's interesting you mention ctypes, because I'm using the DLLs I produce with ctypes. My whole goal is to move away from having to compile different DLLs for different platforms. Using ctypes as part of my investigation is a great, easy way to help me test. – patmo141 Mar 24 '13 at 13:30
  • Based on your edit, I realize that my "multibyte" string is just an ASCII encoded string of bytes. I should have been more clear in that description. – patmo141 Mar 24 '13 at 14:51
  • re: Edit 3. Interesting, your results show that mbstowcs has no problem with the "bad bytes" I'm experiencing. I should note that it's the C++ decryption that IS working. So the truncated byte strings and hashes are what my C++ DLL is outputting, and because I am successfully decrypting files made by a third party with this DLL, it's the correct method. – patmo141 Mar 24 '13 at 17:19
  • I'm giving you the bounty since your nudges essentially pushed me to where I found the rest of the problems on my own. Thanks for the help/encouragement. – patmo141 Mar 24 '13 at 19:09