
I have been struggling with this for hours. I have the following production code (pared down for simplicity) that runs fine in Python 2.7:

import hashlib
import hmac

string1 = 'firststring'
string2 = 'secondstring'

digest = hmac.new(key=string1, msg=string2, digestmod=hashlib.sha256).digest()

print('hmac_digest = ' + digest) # digest is a string

The output is a string like so:

hmac_digest = �!�Ni��I.u�����x�l*>a?. �

But when I run this with Python3.7, I get the following error:

Traceback (most recent call last):
  File "/home/xxxx/work/py23.py", line 7, in <module>
    digest = hmac.new(key=string1, msg=string2, digestmod=hashlib.sha256).digest()
  File "/usr/lib/python3.7/hmac.py", line 153, in new
    return HMAC(key, msg, digestmod)
  File "/usr/lib/python3.7/hmac.py", line 49, in __init__
    raise TypeError("key: expected bytes or bytearray, but got %r" % type(key).__name__)
TypeError: key: expected bytes or bytearray, but got 'str'

Process finished with exit code 1

After quite a bit of research, I understood that hmac changed in Python 3.4 and later, so I rewrote my code as follows:

import hashlib
import hmac
import base64

string1 = 'firststring'
string2 = 'secondstring'

digest = hmac.new(key=string1.encode('utf-8'), msg=string2.encode('utf-8'), digestmod=hashlib.sha256).digest()
digest = base64.encodebytes(digest).decode('utf-8') # need to convert to string

print('hmac_digest = ' + digest)

But the output I get is completely different!

hmac_digest = 5CEZhgMDTmmFxkkudbGPxaLSytl4+gdsKj4PYT8uAJk=

How do I correctly port this code to Python 3.7 so that I get exactly the same output as in 2.7?

Thanks in advance!

knoxfire

2 Answers


The issue you're hitting is that in Python 2, strings are effectively just bytes, while in Python 3 strings are Unicode strings and there is a separate bytes data type for raw bytes. You can read more about the issues involved in the Python 3 porting guide (and elsewhere).
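A minimal Python 3 sketch of that distinction: `.encode('utf-8')` turns a `str` into `bytes`, `hmac.new()` rejects a `str` key with `TypeError` (the exact message varies between Python versions, so only the exception type is checked), and `str` and `bytes` cannot be concatenated, which is why a raw digest cannot simply be appended to `'hmac_digest = '` in Python 3.

```python
import hashlib
import hmac

text = 'firststring'          # str: a sequence of Unicode code points
data = text.encode('utf-8')   # bytes: a sequence of integers 0..255

print(type(text).__name__, type(data).__name__)  # str bytes
print(data == b'firststring')                    # True (pure ASCII text)

# hmac.new() accepts bytes but rejects a str key with TypeError
try:
    hmac.new(key=text, msg=b'secondstring', digestmod=hashlib.sha256)
    str_key_ok = True
except TypeError as exc:
    str_key_ok = False
    print('TypeError:', exc)

# Python 3 also refuses to concatenate str and bytes implicitly,
# so 'hmac_digest = ' + digest fails when digest is raw bytes
try:
    _ = 'hmac_digest = ' + data
    concat_ok = True
except TypeError as exc:
    concat_ok = False
    print('TypeError:', exc)
```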

The minimal set of changes to get your code to work is probably something like this:

import hashlib
import hmac

string1 = 'firststring'.encode('utf-8')
string2 = 'secondstring'.encode('utf-8')

digest = hmac.new(key=string1, msg=string2, digestmod=hashlib.sha256).digest()

print('hmac_digest = ' + repr(digest)) # digest is bytes in Python 3, so print its repr()

This will output:

hmac_digest = b'\xe4!\x19\x86\x03\x03Ni\x85\xc6I.u\xb1\x8f\xc5\xa2\xd2\xca\xd9x\xfa\x07l*>\x0fa?.\x00\x99'

We're printing out the repr() of the digest because it's just a collection of bytes; in Python 3, concatenating it directly to a str would raise a TypeError. If you really want to print it out, you would normally convert it to a hex string:

digest = hmac.new(key=string1, msg=string2, digestmod=hashlib.sha256).hexdigest()

Which would result in:

hmac_digest = 'e421198603034e6985c6492e75b18fc5a2d2cad978fa076c2a3e0f613f2e0099'
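`hexdigest()` is simply the raw `digest()` written as lowercase hex, so converting between the two is lossless in either direction. A small sketch (it relies on `bytes.hex()`, available from Python 3.5):

```python
import hashlib
import hmac

mac = hmac.new(b'firststring', b'secondstring', hashlib.sha256)

raw = mac.digest()          # 32 raw bytes
hex_text = mac.hexdigest()  # 64-character str of lowercase hex digits

# Both views carry exactly the same information
print(hex_text == raw.hex())            # True
print(bytes.fromhex(hex_text) == raw)   # True
print(len(raw), len(hex_text))          # 32 64
```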
larsks
  • Thanks for the quick response, but this still doesn't give me the same output as in Python 2.7. How do I get the exact string that Python 2 gives me? Also, hexdigest() gives a completely different string that will not work for the module where it will be used. string1 and string2 are the arguments to this module, and they have to produce the exact output that Python 2 produces. – knoxfire May 03 '20 at 02:53
  • The data should be the same, even though the representation is different. If you care about how the string "looks", you should just ask for the hexdigest, which should be identical in both cases. The raw digest isn't really meant to be printed; a run of "�" characters isn't terribly meaningful. – larsks May 03 '20 at 02:55
  • Yes, it's not terribly useful for printing, but the raw digest feeds into another module where it's parsed (as a string) and converted for further use. I am unable to provide the receiver module for security reasons, but I will find an alternate way to clarify it. Appreciate the fast answers! – knoxfire May 03 '20 at 03:10
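To illustrate that the data is the same and only the representation differs: the Base64 string printed in the question decodes to exactly the 32 bytes shown in hex in the answer. In the sketch below, both literals are copied from the thread. It also shows that `base64.encodebytes()`, used in the question, appends a trailing newline, which `base64.b64encode()` does not.

```python
import base64
import hashlib
import hmac

# Base64 text printed by the Python 3 attempt in the question
printed = '5CEZhgMDTmmFxkkudbGPxaLSytl4+gdsKj4PYT8uAJk='
# Hex digest quoted in the answer
hex_from_answer = 'e421198603034e6985c6492e75b18fc5a2d2cad978fa076c2a3e0f613f2e0099'

raw = base64.b64decode(printed)                # back to raw digest bytes
print(len(raw))                                # 32
print(raw == bytes.fromhex(hex_from_answer))   # True: same bytes, two spellings

# A Base64 round trip on a freshly computed digest loses nothing
digest = hmac.new(b'firststring', b'secondstring', hashlib.sha256).digest()
print(base64.b64decode(base64.b64encode(digest)) == digest)  # True

# encodebytes() appends b'\n'; b64encode() does not
print(base64.encodebytes(digest).endswith(b'\n'))  # True
print(base64.b64encode(digest).endswith(b'\n'))    # False
```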

Thanks to Josh Lee for his answer to "UnicodeDecodeError, invalid continuation byte".

His suggestion about using 'latin-1' to decode the digest output solved the problem for me!

Here's how my code looks in Python 3.7 now and gives me the exact same output as my code in Python 2.7:

import hashlib
import hmac

string1 = 'firststring'.encode('utf-8') # can use 'latin-1'
string2 = 'secondstring'.encode('utf-8') # can use 'latin-1' 

digest = hmac.new(key=string1, msg=string2, digestmod=hashlib.sha256).digest()

print('hmac_digest = ' + digest.decode('latin-1')) # Use 'latin-1' to decode: it maps every byte to one character, whereas 'utf-8' or 'ascii' throw a UnicodeDecodeError on these bytes
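Why 'latin-1' works, as a small sketch: Latin-1 (ISO-8859-1) maps each of the 256 byte values to exactly one code point, so decoding arbitrary bytes never fails and `.encode('latin-1')` gives back the original bytes, while UTF-8 rejects this particular digest. The bytes below are the hex digest quoted in the first answer. One hedged caveat: Python 3's `print()` re-encodes the resulting string with the console's encoding (often UTF-8), so the bytes that reach the terminal can differ from what Python 2 wrote. If raw, byte-for-byte output is what the receiving side needs, writing to `sys.stdout.buffer` is one option, shown at the end.

```python
import sys

# Digest bytes as quoted (in hex) in the first answer
digest = bytes.fromhex(
    'e421198603034e6985c6492e75b18fc5a2d2cad978fa076c2a3e0f613f2e0099')

# latin-1 maps each byte 0x00-0xFF to code point U+0000-U+00FF:
# decoding never fails and encoding restores the exact bytes
as_text = digest.decode('latin-1')
print(len(as_text))                         # 32, one character per byte
print(as_text.encode('latin-1') == digest)  # True

# UTF-8 rejects these bytes: 0xe4 starts a 3-byte sequence,
# but the next byte (0x21) is not a continuation byte
try:
    digest.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print('utf-8 decodable:', utf8_ok)          # False

# Python 2's print wrote the str's raw bytes; a close Python 3 equivalent
# writes bytes to the binary buffer behind sys.stdout, if there is one
out = getattr(sys.stdout, 'buffer', None)   # missing if stdout was replaced
if out is not None:
    sys.stdout.flush()                      # keep earlier text output in order
    out.write(b'hmac_digest = ' + digest + b'\n')
    out.flush()
```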
knoxfire