0

I've got a task from my professor and unfortunately I'm really confused.

The task:

Find a string D1 for which hash(D1) contains 4 first bytes equal 0. So it should look like "0000....."

As I know we cannot just decrypt a hash, and checking them one by one is kind of pointless work.

  • Hint: https://www.blockchain.com/btc/block/0000000000000000000329516ce39fddea75359935d7d3f3ceeac02b159fd8fa (look at the hash). – mti2935 May 05 '21 at 15:48
  • Look into [Rainbow tables](https://en.wikipedia.org/wiki/Rainbow_table). – phbits May 05 '21 at 15:55
  • "checking them one by one is kind of pointless work" - exactly. That's why you should write a script to do the work for you! Or, just look at the bitcoin blockchain. Any block you find will have a hash that has four (plus many more) leading zeros. You can use any of these as a solution to your problem. – mti2935 May 05 '21 at 15:57
  • The Python hash() function is not cryptographically secure and should not be considered irreversible. Also, you can easily brute-force it. – amon May 05 '21 at 16:12
  • @mti2935 - technically aren't those leading zeros padding? A "true" hash should rarely start with `0000...`. – phbits May 05 '21 at 16:22
  • 1
    @phbits Exactly! "Rarely" being the key word there; that's exactly why mining bitcoin is so computationally-intensive: it takes a lot of trial&error to find a value that hashes to a value with the correct number of leading zeros. With bitcoin they vary the required number of leading zeros in order to slow down the rate at which people are publishing blocks. This is a neat homework problem because it will give you a good intuition about the underlying hardness problem that keeps bitcoin secure. – Mike Ounsworth May 05 '21 at 16:56
  • 1
    Well said, @MikeOunsworth. – mti2935 May 05 '21 at 17:13
  • https://crypto.stackexchange.com/a/83227/18298 – kelalaka May 05 '21 at 19:08

3 Answers3

1

I've got a task from my professor...

Find a string D1 for which hash(D1) contains 4 first bytes equal 0. So it should look like "0000....."

As I know we cannot just decrypt a hash, and checking them one by one is kind of pointless work.

In this case it seem like the work is not really "pointless." Rather, you are doing this work because your professor asked you to do it.

Some commenters have mentioned that you could look at the bitcoin blockchain as a source of hashes, but this will only work if your hash of interest is the same one use by bitcoin (double-SHA256!)

The easiest way to figure this out in general is just to brute force it:

Pseudo-code a la python

for x in range(10*2**32):  # Any number bigger than about 4 billion should work
    x_str = str(x)  # Any old method to generate some bytes to hash should work
    x_bytes = x_str.encode('utf-8')
    hash_bytes = hash(x_bytes)  # assuming hash() returns bytes 
    if hash_bytes[0:4] == b'\x00\x00\x00\x00':
        print("Found string: {}".format(x_str))
        break
hft
  • 1,245
  • 10
  • 29
1

I wrote a short python3 script, which repeatedly tries hashing random values until it finds a value whose SHA256 hash has four leading zero bytes:

import secrets
import hashlib

while(True):
    p=secrets.token_bytes(64)
    h=hashlib.sha256(p).hexdigest()
    if(h[0:8]=='00000000'): break

print('SHA256(' + p.hex() + ')=' + h)

After running for a few minutes (on my ancient Dell laptop), it found a value whose SHA256 hash has four leading zero bytes:

SHA256(21368dc16afcb779fdd9afd57168b660b4ed786872ad55cb8355bdeb4ae3b8c9891606dc35d9f17c44219d8ea778d1ee3590b3eb3938a774b2cadc558bdfc8d4)=000000007b3038e968377f887a043c7dc216961c22f8776bbf66599acd78abf6

The following command-line command verifies this result:

echo -n '21368dc16afcb779fdd9afd57168b660b4ed786872ad55cb8355bdeb4ae3b8c9891606dc35d9f17c44219d8ea778d1ee3590b3eb3938a774b2cadc558bdfc8d4' | xxd -r -p | sha256sum

As expected, this produces:

000000007b3038e968377f887a043c7dc216961c22f8776bbf66599acd78abf6

Edit 5/8/21

Optimized version of the script, based on my conversation with kelalaka in the comments below.

import secrets
import hashlib

N=0
p=secrets.token_bytes(32)
while(True):
    h=hashlib.sha256(p).digest()
    N+=1
    if(h.hex()[0:8]=='0'*8): break
    p=h

print('SHA256(' + p.hex() + ')=' + h.hex())
print('N=' + str(N))

Instead of generating a new random number in each iteration of the loop to use as the input to the hash function, this version of the script uses the output of the hash function from the previous iteration as the input to the hash function in the current iteration. On my system, this quadruples the number of iterations per second. It found a match in 1483279719 iterations in a little over 20 minutes:

$ time python3 findhash2.py
SHA256(69def040a417caa422dff20e544e0664cb501d48d50b32e189fba5c8fc2998e1)=00000000d0d49aaaf9f1e5865c8afc40aab36354bc51764ee2f3ba656bd7c187
N=1483279719

real    20m47.445s
user    20m46.126s
sys 0m0.088s
mti2935
  • 11,465
  • 3
  • 29
  • 33
  • How long does it take to find one? [SHA-512 - How difficult is it to find a hash digest beginning with at least twelve zeros?](https://crypto.stackexchange.com/a/89691/18298) – kelalaka May 06 '21 at 20:38
  • Hi @kelalaka. Nice analysis. I ran the above script on my laptop and it took about 5 minutes for it to find the match above. – mti2935 May 06 '21 at 22:11
  • I think you are very lucky to find one. I'm running your and I don't expect to output in a meaningful time. – kelalaka May 06 '21 at 22:16
  • Break it after 6 hours. No output. – kelalaka May 07 '21 at 19:21
  • Interesting. I'm not sure why it seems to take so much longer on your system. I ran it again on mine, and it found another match in about 23 minutes: `SHA256(b8826e9c5b54590ce41bb9b8abd836a72ea9a34cbf425995abc49d2ad4506c2c)=00000000407afc5afc46144075b7f124bd89a32d263d55dd9591d2964f241c00`. I have an older Dell laptop with an Intel i7 processor and 16GB of RAM. – mti2935 May 08 '21 at 00:39
  • Could you count the number of steps? – kelalaka May 08 '21 at 00:41
  • 1
    @kelalaka I added a counter. I also made a small change to optimize the script. The optimized script found a match in 1483279719 iterations in a little over 20 minutes. See edit above for more info. OP, aren't you glad you didn't check them one-by-one by hand? – mti2935 May 09 '21 at 01:02
  • Thanks, I'm running it. Is it possible that you have [SHA extensions](https://en.wikipedia.org/wiki/Intel_SHA_extensions)? In 20 minutes you have reached to 2^31 SHA256 calls. In my self-experiment ( above link) my code was reached 2^30 around 22 minutes. It is expected to have 1/2^32 one will hit. The second run not so far. The first one was definitely a bit lucky to hit in 5 minutes. In that lucky one, you find it approximately in 2^29 try. – kelalaka May 09 '21 at 01:50
  • Interesting. I looked at the article that you linked to on SHA extensions, and I don't believe my processor has these. I posted the output of lscpu here: https://pastebin.com/7hdz0i3y. I also checked to see if the script is running in multiple cores, and it is not. – mti2935 May 09 '21 at 11:12
  • My experiment finally finished. 144m25s and took 8362803000 iterations ~2^32.9. Approximately 5.6 times longer than yours and the is almost equal per hash. An Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz is used in this experiment. As we can see new generations are not so important on these, as long as you have one with hardware support. My time for calculating your 1483279719 iterations is ~20m. And those numbers are close to my experiment, too. – kelalaka May 09 '21 at 11:53
  • If you wonder about the time improvement the second version has can calculate 1M hashes in ~7.42 seconds on average of 5 runs. While the first version can execute this in ~50s. There is a great tool named [hyperfine](https://github.com/sharkdp/) to automize the runs and measure the timings. – kelalaka May 09 '21 at 18:00
  • [Optimized version of my code](https://codereview.stackexchange.com/q/247486/185475) can execute in 12 seconds on the average of 5. – kelalaka May 09 '21 at 18:08
  • @kelalaka, Good stuff. Thanks for sharing! I wish there was a way to do bitwise operations on the bytes object returned by hashlib.sha256().digest() in python. This would be a much cleaner (and probably faster) way to count the number of leading zeroes, instead of converting the bytes objects to hexadecimal strings or integers. But, unfortunately, there doesn't seem to be a direct way to do this, according to https://stackoverflow.com/questions/22593822/doing-a-bitwise-operation-on-bytes – mti2935 May 09 '21 at 23:02
  • It should be 10M not 1M. – kelalaka May 09 '21 at 23:07
0

The sha256 hash of the string $Eo is 0000958bc4dc132ad12abd158073204d838c02b3d580a9947679a6

This was found using the code below which restricts the string to only UTF8 keyboard characters. It cycles through the hashes of each 1 character string (technically it hashes bytes, not strings), then each 2 character string, then each 3 character string, then each 4 character string (it never had to go to 4 characters, so I'm not 100% sure the math for that part of the function is correct).

The 'limit" value is included to prevent the code from running forever in case a match is not found. This ended up not being necessary as a match was found in 29970 iterations and the execution time was nearly instantaneous.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from hashlib import sha256

utf8_chars = list(range(0x21,0x7f))

def make_str(attempt):
    if attempt < 94:
        c0 = [attempt%94]
    elif attempt >= 94 and attempt < 8836:
        c2 = attempt//94
        c1 = attempt%94
        c0 = [c2,c1]
    elif attempt >= 8836 and attempt < 830584:
        c3 = attempt//8836
        c2 = (attempt-8836*c3)//94
        c1 = attempt%94
        c0 = [c3,c2,c1]
    elif  attempt >= 830584 and attempt < 78074896:
        c4 = attempt//830584
        c3 = (attempt-830584*c4)//8836
        c2 = ((attempt-830584*c4)-8836*c3)//94
        c1 = attempt%94
        c0 = [c4,c3,c2,c1]
    return bytes([utf8_chars[i] for i in c0])

target = '0000'
limit = 1200000
attempt = 0
hash_value = sha256()
hash_value.update(make_str(attempt))
while hash_value.hexdigest()[0:4] != target and attempt <= limit:
    hash_value = sha256()
    attempt += 1
    hash_value.update(make_str(attempt))
t = ''.join([chr(i) for i in make_str(attempt)])
print([t, attempt])
fuzzydrawrings
  • 283
  • 2
  • 6