For a personal project, I'm working on implementing SHA-256 in Python 3, without using the hashlib
module (since that would defeat the purpose of learning how SHA-256 works). I've been working from the Wikipedia pseudocode, but my code gives incorrect output (compared to the hashlib
output). I've been staring at the code for an hour, and besides a headache, I've made no headway on figuring out what I've done wrong.
The code:
#!/usr/bin/env python3
import hashlib
import sys
# ror function taken from http://stackoverflow.com/a/27229191/2508324
def ror(val, r_bits, max_bits=32):
return ((val & (2**max_bits-1)) >> r_bits%max_bits)|(val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))
h = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]
k = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]
s = sys.stdin.read().encode()
msg = [int(x,2) for c in s for x in '{:08b}'.format(c)]
msg.append(1)
while len(msg) % 512 != 448:
msg.append(0)
msg.extend([int(x,2) for x in '{:064b}'.format(len(s))])
for i in range(len(msg)//512):
chunk = msg[512*i:512*(i+1)] # sloth love chunk
w = [0 for _ in range(64)]
for j in range(16):
w[j] = int(''.join(str(x) for x in chunk[32*j:32*(j+1)]),2)
for j in range(16, 64):
s0 = ror(w[j-15], 7) ^ ror(w[j-15], 18) ^ (w[j-15] >> 3)
s1 = ror(w[j-2], 17) ^ ror(w[j-2], 19) ^ (w[j-2] >> 10)
w[j] = (w[j-16] + s0 + w[j-7] + s1) % 2**32
work = h[:]
for j in range(64):
S1 = ror(work[4], 6) ^ ror(work[4], 11) ^ ror(work[4], 25)
ch = (work[4] & work[5]) ^ (~work[4] & work[6])
temp1 = (work[7] + S1 + ch + k[j] + w[j]) % 2**32
S0 = ror(work[0], 2) ^ ror(work[0], 13) ^ ror(work[0], 22)
maj = (work[0] & work[1]) ^ (work[0] & work[2]) ^ (work[1] & work[2])
temp2 = (S0 + maj) % 2**32
work = [(temp1 + temp2) % 2**32] + work[:-1]
work[4] = (work[4] + temp1) % 2**32
h = [(H+W)%2**32 for H,W in zip(h,work)]
print(''.join('{:08x}'.format(H) for H in h))
print(hashlib.sha256(s).hexdigest())
If the implementation was correct, the two outputs would match. Instead, I get this (with input abc
):
$ echo -n abc | ./sha256.py
203b1d9016060802fe5ef80436611159de1868b58d44940e3d3979eab5f4d193
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
I have thoroughly examined the code, but I do not see any differences between it and the Wikipedia pseudocode. I suspect the error is in the compression loop (for j in range(64):
). I've manually debugged and reviewed the state of the program up through initializing the first 16 words of the w
array, and it all checks out.
Any help would be greatly appreciated!