0

For a personal project, I'm working on implementing SHA-256 in Python 3, without using the hashlib module (since that would defeat the purpose of learning how SHA-256 works). I've been working from the Wikipedia pseudocode, but my code gives incorrect output (compared to the hashlib output). I've been staring at the code for an hour, and besides a headache, I've made no headway on figuring out what I've done wrong.

The code:

#!/usr/bin/env python3

import hashlib
import sys

# ror function taken from http://stackoverflow.com/a/27229191/2508324
def ror(val, r_bits, max_bits=32):
    return ((val & (2**max_bits-1)) >> r_bits%max_bits)|(val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1))

h = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]

k = [0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
   0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
   0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
   0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
   0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
   0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
   0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2]

s = sys.stdin.read().encode()
msg = [int(x,2) for c in s for x in '{:08b}'.format(c)]
msg.append(1)
while len(msg) % 512 != 448:
    msg.append(0)
msg.extend([int(x,2) for x in '{:064b}'.format(len(s))])

for i in range(len(msg)//512):
    chunk = msg[512*i:512*(i+1)] # sloth love chunk
    w = [0 for _ in range(64)]
    for j in range(16):
        w[j] = int(''.join(str(x) for x in chunk[32*j:32*(j+1)]),2)
    for j in range(16, 64):
        s0 = ror(w[j-15], 7) ^ ror(w[j-15], 18) ^ (w[j-15] >> 3)
        s1 = ror(w[j-2], 17) ^ ror(w[j-2], 19) ^ (w[j-2] >> 10)
        w[j] = (w[j-16] + s0 + w[j-7] + s1) % 2**32
    work = h[:]
    for j in range(64):
        S1 = ror(work[4], 6) ^ ror(work[4], 11) ^ ror(work[4], 25)
        ch = (work[4] & work[5]) ^ (~work[4] & work[6])
        temp1 = (work[7] + S1 + ch + k[j] + w[j]) % 2**32
        S0 = ror(work[0], 2) ^ ror(work[0], 13) ^ ror(work[0], 22)
        maj = (work[0] & work[1]) ^ (work[0] & work[2]) ^ (work[1] & work[2])
        temp2 = (S0 + maj) % 2**32
        work = [(temp1 + temp2) % 2**32] + work[:-1]
        work[4] = (work[4] + temp1) % 2**32
    h = [(H+W)%2**32 for H,W in zip(h,work)]

print(''.join('{:08x}'.format(H) for H in h))
print(hashlib.sha256(s).hexdigest())

If the implementation was correct, the two outputs would match. Instead, I get this (with input abc):

$ echo -n abc | ./sha256.py
203b1d9016060802fe5ef80436611159de1868b58d44940e3d3979eab5f4d193
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

I have thoroughly examined the code, but I do not see any differences between it and the Wikipedia pseudocode. I suspect the error is in the compression loop (for j in range(64):). I've manually debugged and reviewed the state of the program up through initializing the first 16 words of the w array, and it all checks out.

Any help would be greatly appreciated!

  • 1
    When does the code start to behave differently from what you expect? – leovp Jun 01 '16 at 06:54
  • 1
    @leovp Given that I'm trying to learn the workings of the algorithm, I don't have the knowledge necessary to pinpoint exactly where it is going wrong. –  Jun 01 '16 at 06:58
  • Try the cryptography stackexchange. – Will Jun 01 '16 at 07:31
  • 2
    @Will I do not think this would be any better on Crypto.SE than here. It's not a question about the SHA-256 algorithm; it's a question about trying to implement it, which is more about programming than cryptography. –  Jun 01 '16 at 07:33

1 Answers1

0

SHA1 works on bits, not bytes. Therefore, the 64 bit length at the end of the padding is expressed in bits as well; the mistake is in the line

msg.extend([int(x,2) for x in '{:064b}'.format(len(s))])

which should be

msg.extend([int(x,2) for x in '{:064b}'.format(len(s) * 8)])
phihag
  • 278,196
  • 72
  • 453
  • 469