If I invoke the command from Mac

echo hello | shasum -a 256

or from ubuntu

echo hello | sha256sum

Then I get the following result

5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  -

I notice there is a dash at the end.

But when I use Python's hashlib or Java's java.security.MessageDigest, both give me the same result as each other, which differs from the one above:

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

So, could anyone point out where I got it wrong please?

Thanks.


Python:

>>> import hashlib
>>> hashlib.sha256(b"hello").hexdigest()

Java:

MessageDigest md = MessageDigest.getInstance("SHA-256");
String text = "hello";
md.update(text.getBytes("UTF-8"));
byte[] digest = md.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < digest.length; i++) {
    sb.append(String.format("%02x", digest[i] & 0xFF));
}
System.out.println(sb.toString());
Turn
4af2e9eb6
  • You got it wrong when you stopped investigating. The next logical thing to do was something like `echo hello | od -a` on both systems to make sure `sha256sum` was getting the same input. – David Schwartz Dec 23 '15 at 09:13
  • @DavidSchwartz Or `od -A n -t x1` now that we're dealing with hexadecimals anyway. The input of SHA256 is binary, just like the output. – Maarten Bodewes Dec 23 '15 at 09:19
  • `echo -n hello | sha256sum` is 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 – axiopisty Nov 16 '18 at 15:40
  • Does this answer your question? [Generating a SHA-256 hash from the Linux command line](https://stackoverflow.com/questions/3358420/generating-a-sha-256-hash-from-the-linux-command-line) – uak Dec 10 '22 at 00:49

2 Answers

The echo commands are adding a trailing newline to your string. Try:

hashlib.sha256(b"hello\n").hexdigest()
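
To see the effect of the newline side by side, here is a minimal sketch. It assumes Python 3, where the input has to be passed as bytes, and the helper name `sha256_hex` is made up for illustration:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as lowercase hexadecimal text."""
    return hashlib.sha256(data).hexdigest()

# What `echo hello` writes to the pipe: the text plus a trailing newline.
with_newline = sha256_hex(b"hello\n")
# What hashlib and MessageDigest were given in the question: the text only.
without_newline = sha256_hex(b"hello")

print(with_newline)     # the value sha256sum printed in the question
print(without_newline)  # the value Python and Java printed in the question
```

The two inputs differ by a single byte, which is enough to produce completely different digests.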
Turn
  • Thank you! I did not realise it at all. – 4af2e9eb6 Dec 23 '15 at 09:05
  • Good answer, but doesn't explain the dash. – Maarten Bodewes Dec 23 '15 at 09:13
  • @MaartenBodewes The dash is the file name. Since it was stdin, a dash is printed. – Art Dec 23 '15 at 10:10
  • @MaartenBodewes Oh, sorry. Didn't notice. – Art Dec 23 '15 at 11:17
  • I agree @MaartenBodewes. Do you think that was the crux of the question, though? I thought the OP was just suggesting that the dash might be a clue as to why they were getting a different result. – Turn Dec 23 '15 at 17:59
  • Not the crux probably, but i don't assume that aquacava wants to perform a manual compare each time either. But yes you have got a point there. – Maarten Bodewes Dec 23 '15 at 18:11
  • I see. My assumption was that they were just trying different sha2 implementations to satisfy their own curiosity and verify they gave the same result and was wondering why they weren't. Assumptions! :-) – Turn Dec 23 '15 at 18:13

TL;DR: this is an extensive answer explaining character and hex encoding; you can skip it and look at the code below.

The sha256sum and related commands are adding the dash (-) to the output. These commands were made to show hash values of files. A single dash simply means that the input came from the standard input stream (i.e. there is no file name). Unfortunately I don't see an option to suppress the dash, so you have to remove it yourself to get to the actual hash value.
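
If the output is processed by a program rather than by `tr`, the digest and the name can be split apart. The sketch below assumes the usual "hex digest, whitespace, name" layout shown in the question; `parse_sha256sum_line` is a hypothetical helper, not part of any library:

```python
def parse_sha256sum_line(line: str):
    """Split one line of sha256sum-style output into (hex_digest, name).

    Assumes the layout "<hex digest><whitespace><name>", where the name
    is "-" when the data was read from standard input.
    """
    hex_digest, name = line.strip().split(maxsplit=1)
    return hex_digest, name

line = "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  -\n"
hex_digest, name = parse_sha256sum_line(line)
print(hex_digest)  # only the hash value
print(name)        # "-" because the input came from stdin
```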

So the hash utilities do not return only the hash value. A SHA-256 hash value simply consists of 32 bytes. As humans cannot read binary, the bytes are displayed as hexadecimals, but the actual value should still be thought of as bytes. The hexadecimal characters are just a representation of those bytes.
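
The distinction between the bytes and their hexadecimal representation can be made concrete with a short sketch (Python 3 assumed):

```python
import hashlib

digest = hashlib.sha256(b"hello").digest()       # the raw hash value (bytes)
hex_text = hashlib.sha256(b"hello").hexdigest()  # its hexadecimal representation (text)

print(len(digest))    # 32 bytes
print(len(hex_text))  # 64 characters, two per byte
# Decoding the hex text gives back exactly the raw bytes.
print(bytes.fromhex(hex_text) == digest)
```

This matters whenever the hash is re-encoded, for example to base64: encoding the 64 hex characters gives a different result from encoding the 32 raw bytes.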

The input of a hash function consists of bits, or rather bytes, as well. This means that any difference in how the text is encoded will produce a different hash value. This is especially tricky when it comes to whitespace and end-of-line encoding. In the case of "hello", instead of adding a trailing newline on the Python side, it is probably better to suppress it with the -n command line option of the echo command.

Beware that hexadecimals themselves can also be displayed in different ways; you should make sure that no whitespace is present and that the comparison is case-insensitive, or that the representation of the bytes always uses the same case.
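
A comparison along those lines could look like the following sketch; `same_digest` is a made-up helper name, and the normalization shown (drop all whitespace, lowercase) is one reasonable choice rather than a standard:

```python
def same_digest(a: str, b: str) -> bool:
    """Compare two hex digests, ignoring letter case and any whitespace."""
    def normalize(text: str) -> str:
        # split() with no arguments splits on every run of whitespace,
        # so joining the pieces removes spaces, tabs and newlines.
        return "".join(text.split()).lower()
    return normalize(a) == normalize(b)

upper_spaced = "2CF24DBA 5FB0A30E 26E83B2A C5B9E29E 1B161E5C 1FA7425E 73043362 938B9824\n"
lower_plain = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
print(same_digest(upper_spaced, lower_plain))
```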

Shell code

Using sha256sum:

echo -n "hello" | sha256sum | tr -d "[:space:]-"

Using OpenSSL command line:

echo -n hello | openssl sha256 -binary | od -An -tx1 | tr -d "[:space:]"

Here od -An -tx1 shows each byte separately instead of grouping them, as grouping may lead to problems with endianness.

tr -d "[:space:]" will remove spaces from the hexadecimals as well as the trailing newline. For sha256sum the dash file indicator is also removed (note the - at the end of the set). This way it is possible to perform a textual (case-insensitive) comparison.

Python code

In Python without the trailing end of line:

import hashlib
print(hashlib.sha256(b"hello").hexdigest(), end="")

Java code

In the case of Java you should also make sure that the text encoding matches the system default encoding or you may get into trouble. So you should change:

md.update(text.getBytes("UTF-8"));

to

md.update(text.getBytes());

to get the platform character encoding. If you don't, the comparison will fail whenever the platform encoding is not compatible with UTF-8 for the string you want to compare.
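
Why the encoding matters can be sketched in Python as well. The helper `sha256_hex_of_text` is hypothetical, and Latin-1 merely stands in for some platform encoding that is not UTF-8:

```python
import hashlib

def sha256_hex_of_text(text: str, encoding: str) -> str:
    """Encode text with the given encoding, then hash the resulting bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

# Pure ASCII text encodes to the same bytes in UTF-8 and Latin-1,
# so the digests are identical.
ascii_match = sha256_hex_of_text("hello", "utf-8") == sha256_hex_of_text("hello", "latin-1")

# "\u00e9" (e with acute accent) is two bytes in UTF-8 but one byte in Latin-1,
# so the hashed bytes, and therefore the digests, differ.
accent_match = sha256_hex_of_text("h\u00e9llo", "utf-8") == sha256_hex_of_text("h\u00e9llo", "latin-1")

print(ascii_match)
print(accent_match)
```

For a string like "hello" the choice of encoding happens to make no difference, which is why the mismatch in the question came from the newline and not from the character encoding.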

Maarten Bodewes
  • Funny, that's the first time that *stripping* the character encoding from the Java program actually makes sense I think. Usually I have to remind developers to add the character encoding... – Maarten Bodewes Dec 23 '15 at 09:12
  • There might be something wrong with me .. I read "TL;DR ... you can skip this and look at the code below" and smiled. I knew, having written such disclaimers so many times, that I *had* to keep reading, while most people wouldn't. Thank you for your detailed answer. – CodeShane Jan 22 '22 at 08:44
  • This recently had me bugged out while figuring out why aws s3 rejected my checksum. By base64 encoding the output I completely skipped noticing that sha256sum encoded the hash in hexadecimal while openssl kept it in binary – myrsnipe Oct 21 '22 at 12:09